We need to benchmark how much slowdown we experience when we add the optimizer trace to the server.
We would need the comparison for:
1) server without the optimizer trace (current 10.4)
2) server with the optimizer trace added but disabled (currently in branch 10.4-mdev6111)
The optimizer trace is currently disabled by default; it can be disabled explicitly with:
{code:sql}
set optimizer_trace='enabled=off';
{code}
Please run sysbench.
Varun Gupta (Inactive)
added a comment - Results are here in the google doc
https://docs.google.com/document/d/11RUOHxXNbUO4oIu4iN9cpdh_mseMVZvAXp77m4kkDNU/edit?usp=sharing
||Test name||% of TPS by which trace=off is worse than 10.4-vanilla||
|oltp_write_only.lua|-3.23|
|oltp_delete.lua|-6.92|
|oltp_update_index.lua|0.56|
|select_random_points.lua|9.32|
|oltp_point_select.lua|10.36|
|oltp_read_write.lua|4.45|
|oltp_insert.lua|-2.11|
|select_random_ranges.lua|17.31|
|oltp_read_only.lua|4.58|
|oltp_update_non_index.lua|2.71|
Sergei Petrunia
added a comment - edited - Analyzing the data...
||Test name||% of TPS by which trace=off is worse than 10.4-vanilla||
|oltp_write_only.lua|-3.23|
|oltp_delete.lua|-6.92|
|oltp_update_index.lua|0.56|
|select_random_points.lua|9.32|
|oltp_point_select.lua|10.36|
|oltp_read_write.lua|4.45|
|oltp_insert.lua|-2.11|
|select_random_ranges.lua|17.31|
|oltp_read_only.lua|4.58|
|oltp_update_non_index.lua|2.71|
select_random_ranges.lua is 17% worse. According to varun, this is a flaw in the optimizer trace code which can be fixed.
Sergei Petrunia
added a comment - It is also not clear why enabling the trace makes oltp_update_non_index 33% slower, when we don't have tracing for DML statements yet?
Sergei Petrunia
added a comment - A bit of tuning to make it more CPU-bound:
innodb_buffer_pool_size=4G
innodb_flush_log_at_trx_commit=0
10.4-optimizer-trace-orig:
transactions: 2549706 (42493.02 per sec.)
10.4-optimizer-trace:
transactions: 2527543 (42123.71 per sec.)
The slowdown is now (42493.02 - 42123.71) / 42493.02 = 0.87%.
Sergei Petrunia
added a comment - mdev18172-optimizer-trace-benchmark-jun14.ods - ran a single-threaded benchmark on an Intel Core i9 box. Compared the "good" branch (just before the trace push) with the "bad" branch (just after the trace push).
Not very scientific conclusions:
- The trace makes the queries slower.
- There is also run-to-run variance in speed, which is greater than the slowdown.
- However, variance in execution speed is not an excuse for adding slowdowns.
Sergei Petrunia
added a comment - Looks like the above are a stabler metric to measure the overhead?
varun, could you try to reproduce the above? Do you get the same #cycles / #instructions difference?
Also, perf now shows optimizer trace code. Why weren't the Json_writer_array and Json_writer_object constructors/destructors inlined? (Because their code is in a .cpp file and not in a .h file?)
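(A minimal sketch of the suspected cause, using a made-up Trace_object_scope class rather than the real Json_writer_object interface: when the constructor/destructor bodies are visible in the header, the compiler can inline the cheap "tracing disabled" test at every call site; if they are defined only in a .cpp file, calls from other translation units cannot be inlined without LTO, which would explain them showing up in perf.)
{code:cpp}
// Hypothetical illustration only -- the real MariaDB Json_writer_object /
// Json_writer_array classes have a different interface; all names below
// are made up for the sketch.

// ---- trace_writer.h --------------------------------------------------
struct Trace_writer;                      // opaque writer; details irrelevant

class Trace_object_scope
{
  Trace_writer *writer;                   // nullptr when tracing is disabled
public:
  // The bodies are visible in the header, so every translation unit that
  // opens a trace scope can inline them; with tracing off the whole scope
  // costs roughly one pointer test on entry and one on exit.
  explicit Trace_object_scope(Trace_writer *w) : writer(w)
  {
    if (writer)
      start_object();                     // slow path stays out of line
  }
  ~Trace_object_scope()
  {
    if (writer)
      end_object();
  }
private:
  // Defined in the .cpp file.  If the constructor/destructor themselves
  // were defined there instead, calls from other .cc files could not be
  // inlined without LTO, and perf would show them as separate functions.
  void start_object();
  void end_object();
};

// ---- trace_writer.cc (stubs so the sketch builds as one file) ---------
struct Trace_writer { int depth; };
void Trace_object_scope::start_object() { ++writer->depth; }
void Trace_object_scope::end_object()   { --writer->depth; }

// ---- usage -------------------------------------------------------------
int main()
{
  Trace_writer w{0};
  {
    Trace_object_scope scope(&w);         // "tracing on": writes to w
  }
  Trace_object_scope off_scope(nullptr);  // "tracing off": just pointer tests
  return 0;
}
{code}
If that is the explanation, moving the constructor/destructor bodies into the header (or building with link-time optimization) should make them disappear from the profile.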
Sergei Petrunia
added a comment - edited - Thinking about the potential overhead that the optimizer trace should have...
Measuring the CPU instruction counts from Varun's benchmark, we can see that the optimizer trace patch has caused an extra 1557 CPU instructions to be executed per query.
Let's assume 10 instructions per "if (optimizer_trace)" check. This means 156 checks are made per query.
Looking at an example trace: https://mariadb.com/kb/en/library/basic-optimizer-trace-example/
one can see 86 (value written + array start + object start) operations and ~50 (array_end + object_end) operations.
This means ~150 "if (optimizer_trace is on)" checks in total, which agrees with the 156 we got above.
(The most arbitrary number is the 10 instructions per check. This looks like a reasonable upper bound, but could one get it lower?)
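(A rough sketch of where the per-check cost comes from, again with made-up names rather than the real server code: every value write and every array/object start or end is guarded by an "is tracing enabled" test, so a query whose trace would contain that many operations performs that many checks even when tracing is off.)
{code:cpp}
#include <cstdio>

// Hypothetical guard, for illustration only -- not the actual server code.
// Each traced operation tests a flag first; when tracing is off, this test
// (load flag + compare + branch) is all the work that remains, which is the
// kind of cost the "~10 instructions per check" estimate refers to.
static bool optimizer_trace_enabled = false;   // off by default

static inline void trace_add_ll(const char *key, long long val)
{
  if (!optimizer_trace_enabled)        // the per-operation check being counted
    return;                            // fast path when tracing is off
  std::printf("  \"%s\": %lld,\n", key, val);   // stand-in for the JSON writer
}

int main()
{
  // A query whose trace would contain ~150 write/start/end operations
  // performs ~150 such checks even with tracing disabled:
  for (int i = 0; i < 150; i++)
    trace_add_ll("rows_estimate", i);
  return 0;
}
{code}
Whether one such test really costs ~10 instructions depends on how the check is spelled in the actual code; the sketch only shows the shape of the fast path.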
Before running the benchmark, please check if branch 10.4-optimizer-trace should be used instead.