[MDEV-18172] Benchmarking 10.4 for optimizer trace Created: 2019-01-08 Updated: 2021-01-28 Resolved: 2021-01-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Fix Version/s: | N/A |
| Type: | Task | Priority: | Major |
| Reporter: | Varun Gupta (Inactive) | Assignee: | Varun Gupta (Inactive) |
| Resolution: | Done | Votes: | 1 |
| Labels: | benchmarking | ||
| Description |
|
We need to benchmark how much slowdown we experience when we add the optimizer trace to the server. We would need the comparison for: The optimizer trace is currently disabled by default. |
| Comments |
| Comment by Sergei Petrunia [ 2019-01-18 ] | |||||||||||||||||||||
|
Before running the benchmark, please check if branch 10.4-optimizer-trace should be used instead. | |||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2019-01-29 ] | |||||||||||||||||||||
|
Results are here in the google doc | |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-01-29 ] | |||||||||||||||||||||
|
Analyzing the data...
| |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-01-29 ] | |||||||||||||||||||||
|
select_random_ranges.lua is 17% worse. According to Varun, this is a flaw in the optimizer trace code which can be fixed. | |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-01-29 ] | |||||||||||||||||||||
|
Enabling the trace makes some tests faster, like oltp_delete or oltp_update_index? | |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-01-29 ] | |||||||||||||||||||||
|
It is also not clear why enabling the trace makes oltp_update_non_index 33% slower, when we don't have tracing for DML statements yet. | |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-02-05 ] | |||||||||||||||||||||
|
oltp_delete run on my machine. All default settings,
10.4-optimizer-trace tree:
10.4-optimizer-trace-orig tree:
so it was 1.5% slower (on a single run). | |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-02-05 ] | |||||||||||||||||||||
|
A bit of tuning to make it more CPU-bound:
10.4-optimizer-trace-orig:
10.4-optimizer-trace:
The slowdown is now 0.87% | |||||||||||||||||||||
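The slowdown percentages quoted in the two comments above compare the throughput of the two trees. A minimal sketch of that arithmetic, assuming throughput is reported as transactions per second; the helper name is hypothetical and the numbers in the usage note are illustrative, not the measured values:

```cpp
#include <cassert>

// Relative slowdown of a patched build versus a baseline build, in percent.
// Inputs are throughput figures (for example sysbench transactions per
// second); a positive result means the patched build is slower.
// Hypothetical helper for illustration, not part of the MariaDB tree.
double slowdown_percent(double baseline_tps, double patched_tps)
{
  assert(baseline_tps > 0.0);   // a zero baseline makes the ratio meaningless
  return (baseline_tps - patched_tps) / baseline_tps * 100.0;
}
```

For instance, `slowdown_percent(1000.0, 985.0)` evaluates to about 1.5, i.e. the patched build is roughly 1.5% slower.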
| Comment by Sergei Petrunia [ 2019-06-15 ] | |||||||||||||||||||||
|
mdev18172-optimizer-trace-benchmark-jun14.ods Not very scientific conclusions:
variety in execution speed is not an excuse for adding slowdowns. | |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-06-15 ] | |||||||||||||||||||||
|
Take-aways from discussion with svoj: Profiling with `perf top -p $MYSQLD_PID` shows this:
The extra load adds up to 0.87%. Some time later:
1.23% in total. The workload is the same as above: sysbench oltp_point_select.lua, 1 thread, 1 table, 100000 records | |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-06-15 ] | |||||||||||||||||||||
|
Trying another metric. Let's count the cpu-cycles and instructions it took to run 1M queries. The workload is the same as above, single-threaded:
CPU Cycles
CPU Instructions:
| |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-06-15 ] | |||||||||||||||||||||
|
Looks like the above are a more stable metric to measure the overhead? Also, perf now shows optimizer trace code. Why weren't the Json_writer_array and Json_writer_object ctor/dtor inlined? (Because their code is in a .cpp file and not a .h file?) | |||||||||||||||||||||
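The inlining question above can be illustrated with a small sketch. It assumes a simplified trace context and writer class; `Trace_context`, `Writer_object_sketch`, `start_object_slow`, `end_object_slow` and `traced_step` are hypothetical names, not the real Json_writer_object code. The idea is that a constructor and destructor defined in the class body (as they would be in a header) are visible at every call site, so the compiler can inline the "is tracing on?" test and only call out of line when tracing is enabled:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the per-connection trace state; the real
// MariaDB classes and members differ.
struct Trace_context
{
  bool enabled= false;   // tracing is off by default
  int depth= 0;          // current nesting depth of open objects
  std::string out;       // accumulated trace text
};

// Slow path, which would normally live in the .cpp file. It does the
// actual writing and is only reached when tracing is on.
void start_object_slow(Trace_context *ctx);
void end_object_slow(Trace_context *ctx);

// RAII writer whose constructor and destructor are defined in the class
// body, so the enabled-check can be inlined at each call site.
class Writer_object_sketch
{
  Trace_context *ctx;
public:
  explicit Writer_object_sketch(Trace_context *c) : ctx(c)
  {
    if (ctx->enabled)            // the only work done when tracing is off
      start_object_slow(ctx);
  }
  ~Writer_object_sketch()
  {
    if (ctx->enabled)
      end_object_slow(ctx);
  }
};

void start_object_slow(Trace_context *ctx)
{
  ctx->out+= '{';
  ctx->depth++;
}

void end_object_slow(Trace_context *ctx)
{
  assert(ctx->depth > 0);        // every end must match a start
  ctx->depth--;
  ctx->out+= '}';
}

// Example of an optimizer step instrumented with a trace object.
int traced_step(Trace_context *ctx)
{
  Writer_object_sketch obj(ctx);
  return 42;                     // placeholder for the real work
}
```

With `enabled` left at `false`, `traced_step` leaves `out` empty and pays only for the two flag tests; with `enabled` set to `true`, `out` ends up as `{}`. The sketch assumes the flag does not change while a writer object is alive.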
| Comment by Sergei Petrunia [ 2019-06-15 ] | |||||||||||||||||||||
|
The only tuning I did (except for port# and paths):
| |||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-06-19 ] | |||||||||||||||||||||
|
Thinking about the potential overhead that optimizer trace should have... Measuring the CPU instruction counts from Varun's benchmark, we can see that the optimizer trace patch has caused an extra 1557 CPU instructions to be executed per query. Looking at an example trace: https://mariadb.com/kb/en/library/basic-optimizer-trace-example/ This means ~150 "if (optimizer_trace is on)" checks in total, which agrees with the 156 we got above. (The most arbitrary number is the 10 instructions per check. This looks like a reasonable upper bound, but could one get it lower?) |
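The estimate above can be written out as a short sketch. It assumes that the 156 figure comes from dividing the 1557 extra instructions per query by the assumed 10 instructions per check; the function names are hypothetical, and the total counts in the usage note are made-up inputs, not the measured perf numbers:

```cpp
#include <cassert>

// Extra instructions per query, from two total instruction counts taken
// over the same number of queries (baseline tree versus patched tree).
long extra_per_query(long long baseline_total, long long patched_total,
                     long long queries)
{
  assert(queries > 0);
  return static_cast<long>((patched_total - baseline_total) / queries);
}

// Estimated number of "is the optimizer trace on?" checks per query,
// given the extra instructions per query and an assumed cost per check.
// The result is rounded to the nearest whole check.
constexpr long estimated_checks(long extra_instructions_per_query,
                                long instructions_per_check)
{
  return (extra_instructions_per_query + instructions_per_check / 2) /
         instructions_per_check;
}
```

`estimated_checks(1557, 10)` gives 156, in line with the roughly 150 checks counted from the example trace. With made-up totals, `extra_per_query(10000000000LL, 11557000000LL, 1000000LL)` gives 1557 for a run of 1M queries.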