We need to benchmark how much slowdown we experience when we add the optimizer trace to the server.
We would need the comparison for:
1) server without the optimizer trace (current 10.4)
2) server with the optimizer trace added but disabled (currently in branch 10.4-mdev6111)
The optimizer trace is currently disabled by default; it can be disabled explicitly with:
{code:sql}
set optimizer_trace='enabled=off';
{code}
Please run sysbench.
Varun Gupta (Inactive)
added a comment - Results are here in the google doc
https://docs.google.com/document/d/11RUOHxXNbUO4oIu4iN9cpdh_mseMVZvAXp77m4kkDNU/edit?usp=sharing
||Test name||% of TPS by which trace=off is worse than 10.4-vanilla||
|oltp_write_only.lua|-3.23|
|oltp_delete.lua|-6.92|
|oltp_update_index.lua|0.56|
|select_random_points.lua|9.32|
|oltp_point_select.lua|10.36|
|oltp_read_write.lua|4.45|
|oltp_insert.lua|-2.11|
|select_random_ranges.lua|17.31|
|oltp_read_only.lua|4.58|
|oltp_update_non_index.lua|2.71|
Sergei Petrunia
added a comment - edited - Analyzing the data...
||Test name||% of TPS by which trace=off is worse than 10.4-vanilla||
|oltp_write_only.lua|-3.23|
|oltp_delete.lua|-6.92|
|oltp_update_index.lua|0.56|
|select_random_points.lua|9.32|
|oltp_point_select.lua|10.36|
|oltp_read_write.lua|4.45|
|oltp_insert.lua|-2.11|
|select_random_ranges.lua|17.31|
|oltp_read_only.lua|4.58|
|oltp_update_non_index.lua|2.71|
select_random_ranges.lua is 17% worse. According to varun, this is a flaw in the optimizer trace code which can be fixed.
Sergei Petrunia
added a comment - It is also not clear why enabling the trace makes oltp_update_non_index 33% slower, when we don't have tracing for DML statements yet?
Sergei Petrunia
added a comment - A bit of tuning to make it more CPU-bound:
innodb_buffer_pool_size=4G
innodb_flush_log_at_trx_commit=0
10.4-optimizer-trace-orig:
transactions: 2549706 (42493.02 per sec.)
10.4-optimizer-trace:
transactions: 2527543 (42123.71 per sec.)
The slowdown is now (42493.02 - 42123.71) / 42493.02 = 0.87%.
Sergei Petrunia
added a comment - mdev18172-optimizer-trace-benchmark-jun14.ods - ran a single-threaded benchmark on an Intel Core i9 box. Compared the "good" branch (just before the trace push) with the "bad" branch (just after the trace push).
Not very scientific conclusions:
- The trace makes the queries slower.
- There is also run-to-run variance in speed, which is greater than the slowdown.
- However, variance in execution speed is not an excuse for adding slowdowns.
Sergei Petrunia
added a comment - Looks like the above are a stabler metric to measure the overhead?
varun, could you try to reproduce the above? Do you get the same #cycles / #instructions difference?
Also, perf now shows optimizer trace code. Why weren't the Json_writer_array and Json_writer_object constructors/destructors inlined? (Because their code is in a .cpp file and not in a .h file?)
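(A minimal sketch of the suspected cause, using a made-up Trace_object_scope class rather than the real Json_writer_object interface: when the constructor/destructor bodies are visible in the header, the compiler can inline the cheap "tracing disabled" test at every call site; if they are defined only in a .cpp file, calls from other translation units cannot be inlined without LTO, which would explain them showing up in perf.)
{code:cpp}
// Hypothetical illustration only -- the real MariaDB Json_writer_object /
// Json_writer_array classes have a different interface; all names below
// are made up for the sketch.

// ---- trace_writer.h --------------------------------------------------
struct Trace_writer;                      // opaque writer; details irrelevant

class Trace_object_scope
{
  Trace_writer *writer;                   // nullptr when tracing is disabled
public:
  // The bodies are visible in the header, so every translation unit that
  // opens a trace scope can inline them; with tracing off the whole scope
  // costs roughly one pointer test on entry and one on exit.
  explicit Trace_object_scope(Trace_writer *w) : writer(w)
  {
    if (writer)
      start_object();                     // slow path stays out of line
  }
  ~Trace_object_scope()
  {
    if (writer)
      end_object();
  }
private:
  // Defined in the .cpp file.  If the constructor/destructor themselves
  // were defined there instead, calls from other .cc files could not be
  // inlined without LTO, and perf would show them as separate functions.
  void start_object();
  void end_object();
};

// ---- trace_writer.cc (stubs so the sketch builds as one file) ---------
struct Trace_writer { int depth; };
void Trace_object_scope::start_object() { ++writer->depth; }
void Trace_object_scope::end_object()   { --writer->depth; }

// ---- usage -------------------------------------------------------------
int main()
{
  Trace_writer w{0};
  {
    Trace_object_scope scope(&w);         // "tracing on": writes to w
  }
  Trace_object_scope off_scope(nullptr);  // "tracing off": just pointer tests
  return 0;
}
{code}
If that is the explanation, moving the constructor/destructor bodies into the header (or building with link-time optimization) should make them disappear from the profile.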
Sergei Petrunia
added a comment - edited - Thinking about the potential overhead that the optimizer trace should have...
Measuring the CPU instruction counts from Varun's benchmark, we can see that the optimizer trace patch has caused an extra 1557 CPU instructions to be executed per query.
Let's assume 10 instructions per "if (optimizer_trace)" check. This means 156 checks are made per query.
Looking at an example trace: https://mariadb.com/kb/en/library/basic-optimizer-trace-example/
one can see 86 (value written + array start + object start) operations and ~50 (array_end + object_end) operations.
This means ~150 "if (optimizer_trace is on)" checks in total, which agrees with the 156 we got above.
(The most arbitrary number is the 10 instructions per check. This looks like a reasonable upper bound, but could one get it lower?)
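(A rough sketch of where the per-check cost comes from, again with made-up names rather than the real server code: every value write and every array/object start or end is guarded by an "is tracing enabled" test, so a query whose trace would contain that many operations performs that many checks even when tracing is off.)
{code:cpp}
#include <cstdio>

// Hypothetical guard, for illustration only -- not the actual server code.
// Each traced operation tests a flag first; when tracing is off, this test
// (load flag + compare + branch) is all the work that remains, which is the
// kind of cost the "~10 instructions per check" estimate refers to.
static bool optimizer_trace_enabled = false;   // off by default

static inline void trace_add_ll(const char *key, long long val)
{
  if (!optimizer_trace_enabled)        // the per-operation check being counted
    return;                            // fast path when tracing is off
  std::printf("  \"%s\": %lld,\n", key, val);   // stand-in for the JSON writer
}

int main()
{
  // A query whose trace would contain ~150 write/start/end operations
  // performs ~150 such checks even with tracing disabled:
  for (int i = 0; i < 150; i++)
    trace_add_ll("rows_estimate", i);
  return 0;
}
{code}
Whether one such test really costs ~10 instructions depends on how the check is spelled in the actual code; the sketch only shows the shape of the fast path.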
Before running the benchmark, please check if branch 10.4-optimizer-trace should be used instead.