Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version: 10.3 (EOL)
Fix Version: None
Description
I noticed reduced performance when running a somewhat large update_no_index sysbench benchmark, comparing 10.2 to 10.3.
Here is the setup
- Windows Azure VM with 16 virtual CPUs, 32 GB RAM and SSD storage (I used the local temporary disk)
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Sockets: 1
Virtual processors: 16
- MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
The relevant part of the my.ini file is as follows:
[mysqld]
back_log=500
max_connections=4096
max_prepared_stmt_count=500000
table_open_cache=10000
transaction_isolation=REPEATABLE-READ
innodb_status_file=0
innodb_buffer_pool_size=20G
innodb_log_file_size=15G
innodb_log_buffer_size=1000M
innodb-io-capacity=4000
innodb-io-capacity-max=5000
innodb_doublewrite=0
innodb-page-cleaners=1
innodb-buffer-pool-instances=1
innodb_adaptive_flushing=0
innodb_adaptive_flushing_lwm=10
- sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)
The update_no_index test runs on a single large table (50 million rows), with the number of users ranging from 1 to 4096, in powers of 2.
Loading table, sysbench prepare
sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
sysbench run
sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096
The test takes a nap of 80 seconds between runs, and also between the load and the first test; as far as I can see, that is enough for all InnoDB background IO activity to finish (MB/sec goes down to 0 on the disk where the test is run). Due to innodb-flush-log-at-trx-commit being at its default (1), the test does not appear to be very CPU bound (sysbench and mysqld together use 70% of all available 16 CPUs in the Windows measurement, i.e. over 10 CPUs are busy).
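For reference, here is a minimal sketch of the driver loop as a Windows batch file (a hypothetical reconstruction, not the actual script used): it repeats the "sysbench run" command above once per user count, pausing 80 seconds before each run. The %N% in the text corresponds to %%N inside a batch file.
@echo off
rem Hypothetical driver sketch: one sysbench run per user count, with an 80-second pause before each run
for %%N in (1 2 4 8 16 32 64 128 256 512 1024 2048 4096) do (
  timeout /t 80 /nobreak >nul
  sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%%N run > update_no_index_%%N.txt
)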
Below are the results from the runs, for 10.2.14 and the recent 10.3.7 build.
UPDATE: marko mentioned git commit 1a62c8a39647db00283b4e35fb6db3c7bc31e3ca as the version right before 10.3 switched to the lock-free implementation, so I added a column for it too. It appears to be better than 10.3.7, but worse than 10.2.14.
Users | TPS 10.2.14 | TPS 10.3.7 | TPS 10.3-pre-lockless |
---|---|---|---|
1 | 1549.92 | 1547.75 | 1551.07 |
2 | 2373.89 | 2209.05 | 2334.80 |
4 | 4466.03 | 4276.46 | 4446.64 |
8 | 9069.82 | 8587.59 | 9098.95 |
16 | 16631.45 | 15719.70 | 16231.39 |
32 | 28989.79 | 27482.44 | 27786.32 |
64 | 36996.12 | 34843.23 | 35089.44 |
128 | 37287.05 | 35172.49 | 36223.68 |
256 | 38038.01 | 35896.52 | 36818.22 |
512 | 38360.71 | 36567.07 | 37195.47 |
1024 | 38265.25 | 36328.62 | 37253.98 |
2048 | 39990.85 | 36328.62 | 38572.81 |
4096 | 41988.71 | 39032.83 | 39606.38 |
(also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
The regression is not as big as I thought initially (somehow I counted 15%, but it is more like 5%), yet it is consistent from 64 users all the way up to 4096. So it needs investigating.
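(Worked check of the ~5% figure, using the table above: at 64 users the drop is (36996.12 - 34843.23) / 36996.12 ≈ 5.8%, and at 256 users it is (38038.01 - 35896.52) / 38038.01 ≈ 5.6%.)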
It is odd, because the test itself is neither CPU bound nor IO bound; the bottleneck, in my understanding, is waiting for the flush in log_write_up_to(). I did not try to make it CPU bound by relaxing durability; that might be another exercise, although with increased TPS purging can become more of a bottleneck and disturb the picture.
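For reference only, relaxing durability would mean a my.ini change along the following lines (hypothetical, not part of the configuration used in these runs); with it, commits no longer wait for the redo log flush:
# hypothetical change, not used in the runs above: flush the redo log only about once per second
innodb_flush_log_at_trx_commit=2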
I tried to do some initial profiling (based on CPU sampling), and this is what showed up.
Below, in all places, baseline refers to 10.2.14 and comparison refers to 10.3.7.
Exclusive samples comparison
Individual functions (exclusive sample percentage, with a 1% difference threshold)
Function | Delta | Baseline Value | Comparison Value |
---|---|---|---|
ut_delay | 4.02 | 16.05 | 20.07 |
l_find | 2.44 | 0.16 | 2.60 |
PolicyMutex<TTASEventMutex<GenericPolicy> >::enter | 2.42 | 0.18 | 2.60 |
TTASEventMutex<GenericPolicy>::enter | -4.42 | 4.42 | 0.00 |
SleepConditionVariableCS | -5.41 | 28.70 | 23.29 |
In short: more busy wait (ut_delay), less lazy wait (SleepConditionVariableCS). There is a new, quite visible function, l_find (now the 4th most expensive individual function), which seems to come from replacing the STL hash table with the custom lock-free hash table in InnoDB.
Inclusive samples comparison
Comparing inclusive samples with a 1% difference threshold gives the table below. It gives me a vague idea that something is more expensive in purge, and again that a lazy wait was replaced with a busy wait somewhere: background threads, for example the purge "coordinator" thread, probably use more CPU, while foreground threads use less of it (as shown by the decreased time in do_command, for example).
Function | Delta | Baseline Value | Comparison Value |
---|---|---|---|
PolicyMutex<TTASEventMutex<GenericPolicy> >::enter | 28.69 | 2.53 | 31.22 |
ut_delay | 4.02 | 16.05 | 20.07 |
trx_purge | 3.81 | 5.67 | 9.48 |
srv_do_purge | 3.80 | 5.70 | 9.49 |
row_purge | 3.66 | 0.00 | 3.67 |
row_purge_step | 3.47 | 0.71 | 4.18 |
que_thr_step | 3.31 | 0.94 | 4.25 |
srv_purge_coordinator_thread | 3.25 | 6.37 | 9.63 |
trx_sys_t::clone_oldest_view | 3.18 | 0.00 | 3.18 |
btr_cur_search_to_nth_level_func | 3.12 | 0.00 | 3.12 |
row_purge_record_func | 2.94 | 0.00 | 2.94 |
row_purge_upd_exist_or_extern_func | 2.77 | 0.00 | 2.77 |
row_purge_reset_trx_id | 2.72 | 0.00 | 2.72 |
que_run_threads_low | 2.70 | 1.73 | 4.43 |
que_run_threads | 2.69 | 1.75 | 4.45 |
l_find | 2.46 | 0.18 | 2.64 |
ReadView::snapshot | 2.30 | 0.00 | 2.30 |
rw_trx_hash_t::iterate | 2.27 | 0.00 | 2.27 |
lf_hash_iterate | 2.25 | 0.00 | 2.25 |
srv_task_execute | 2.12 | 1.95 | 4.08 |
row_purge_reposition_pcur | 1.53 | 0.00 | 1.53 |
row_search_on_row_ref | 1.33 | 0.00 | 1.33 |
btr_pcur_open_low | 1.31 | 0.00 | 1.31 |
btr_search_guess_on_hash | 1.31 | 1.54 | 2.85 |
trx_purge_wait_for_workers_to_complete | 1.10 | 1.52 | 2.62 |
trx_undo_assign_low | 1.07 | 0.00 | 1.07 |
mtr_t::commit | 1.05 | 2.57 | 3.62 |
mtr_t::Command::execute | 1.01 | 2.42 | 3.43 |
srv_worker_thread | -1.09 | 9.84 | 8.75 |
sync_array_wait_event | -1.29 | 9.69 | 8.40 |
trx_write_serialisation_history | -1.34 | 2.62 | 1.28 |
trx_commit_low | -1.39 | 3.15 | 1.76 |
innobase_commit | -1.75 | 47.36 | 45.61 |
trx_commit | -1.75 | 4.79 | 3.04 |
ha_commit_one_phase | -1.76 | 47.55 | 45.78 |
commit_one_phase_2 | -1.76 | 47.54 | 45.77 |
trans_commit_stmt | -1.80 | 48.68 | 46.88 |
ha_commit_trans | -1.80 | 48.65 | 46.84 |
btr_cur_search_to_nth_level | -1.85 | 1.85 | 0.00 |
innobase_commit_ordered_2 | -2.04 | 5.17 | 3.13 |
trx_commit_for_mysql | -2.05 | 5.10 | 3.05 |
innobase_commit_low | -2.05 | 5.11 | 3.06 |
Prepared_statement::execute_loop | -2.05 | 74.58 | 72.52 |
mysql_stmt_execute_common | -2.14 | 74.85 | 72.71 |
mysqld_stmt_execute | -2.15 | 74.89 | 72.74 |
mysql_execute_command | -2.21 | 73.03 | 70.81 |
trx_undo_assign_undo | -2.24 | 2.24 | 0.00 |
Prepared_statement::execute | -2.24 | 74.03 | 71.79 |
threadpool_process_request | -2.32 | 81.71 | 79.38 |
dispatch_command | -2.33 | 79.21 | 76.88 |
do_command | -2.34 | 81.33 | 79.00 |
tp_callback | -2.38 | 82.66 | 80.28 |
srv_resume_thread | -3.33 | 7.94 | 4.61 |
os_event::wait_low | -4.98 | 28.41 | 23.43 |
SleepConditionVariableCS | -5.41 | 28.70 | 23.29 |
TTASEventMutex<GenericPolicy>::enter | -30.49 | 30.49 | 0.00 |
Attachments
Issue Links
- is duplicated by
  - MDEV-17410 mariabackup prepare crashes in Innodb recovery (Closed)
- relates to
  - MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC (Closed)
  - MDEV-14425 Change the InnoDB redo log format to reduce write amplification (Closed)
  - MDEV-16232 Use fewer mini-transactions (Stalled)
  - MDEV-17003 service_manager_extend_timeout() being called too often (Closed)
  - MDEV-19845 Adaptive spin loops (Closed)
  - MDEV-17353 pinpoint perf regressions in early 10.3 to commits (Open)
I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture. I tried to do some initial profiling (based on CPU sampling, and this is what showed up) Individual functions (or, exclusive sample percentage with threshold 1% difference) ||Comparison Column|| Delta|| Baseline Value|| Comparison Value|| |ut_delay| {color:red}4.02{color}| 16.05| 20.07| |l_find| {color:red}2.44{color}| 0.16 |2.60| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60| |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00| |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29| In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb. Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there is now more CPU spent in purge threads, and not in foreground ones ||Comparison Column ||Delta||Baseline Value|| Comparison Value|| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22 |ut_delay|4.02|16.05|20.07 |trx_purge|3.81|5.67|9.48 |srv_do_purge|3.80|5.70|9.49 |row_purge|3.66|0.00|3.67 |row_purge_step|3.47|0.71|4.18 |que_thr_step|3.31|0.94|4.25 |srv_purge_coordinator_thread|3.25|6.37|9.63 |trx_sys_t::clone_oldest_view|3.18|0.00|3.18 |btr_cur_search_to_nth_level_func|3.12|0.00|3.12 |row_purge_record_func|2.94|0.00|2.94 |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77 |row_purge_reset_trx_id|2.72|0.00|2.72 |que_run_threads_low|2.70|1.73|4.43 |que_run_threads|2.69|1.75|4.45 |l_find|2.46|0.18|2.64 |ReadView::snapshot|2.30|0.00|2.30 |rw_trx_hash_t::iterate|2.27|0.00|2.27 |lf_hash_iterate|2.25|0.00|2.25 |srv_task_execute|2.12|1.95|4.08 |row_purge_reposition_pcur|1.53|0.00|1.53 |row_search_on_row_ref|1.33|0.00|1.33 |btr_pcur_open_low|1.31|0.00|1.31 |btr_search_guess_on_hash|1.31|1.54|2.85 |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62 |trx_undo_assign_low|1.07|0.00|1.07 |mtr_t::commit|1.05|2.57|3.62 |mtr_t::Command::execute|1.01|2.42|3.43 |srv_worker_thread|-1.09|9.84|8.75 |sync_array_wait_event|-1.29|9.69|8.40 |trx_write_serialisation_history|-1.34|2.62|1.28 |trx_commit_low|-1.39|3.15|1.76 |innobase_commit|-1.75|47.36|45.61 |trx_commit|-1.75|4.79|3.04 |ha_commit_one_phase|-1.76|47.55|45.78 |commit_one_phase_2|-1.76|47.54|45.77 |trans_commit_stmt|-1.80|48.68|46.88 |ha_commit_trans|-1.80|48.65|46.84 |btr_cur_search_to_nth_level|-1.85|1.85|0.00 |innobase_commit_ordered_2|-2.04|5.17|3.13 |trx_commit_for_mysql|-2.05|5.10|3.05 |innobase_commit_low|-2.05|5.11|3.06 |Prepared_statement::execute_loop|-2.05|74.58|72.52 |mysql_stmt_execute_common|-2.14|74.85|72.71 |mysqld_stmt_execute|-2.15|74.89|72.74 |mysql_execute_command|-2.21|73.03|70.81 |trx_undo_assign_undo|-2.24|2.24|0.00 |Prepared_statement::execute|-2.24|74.03|71.79 |threadpool_process_request|-2.32|81.71|79.38 |dispatch_command|-2.33|79.21|76.88 |do_command|-2.34|81.33|79.00 |tp_callback|-2.38|82.66|80.28 |srv_resume_thread|-3.33|7.94|4.61 |os_event::wait_low|-4.98|28.41|23.43 |SleepConditionVariableCS|-5.41|28.70|23.29 |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00 |
Description |
I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
Here is the setup * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk) * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9) the relevant part of my.ini file is as follows {noformat} [mysqld] back_log=500 max_connections=4096 max_prepared_stmt_count=500000 table_open_cache=10000 transaction_isolation=REPEATABLE-READ innodb_status_file=0 innodb_buffer_pool_size=20G innodb_log_file_size=15G innodb_log_buffer_size=1000M innodb-io-capacity=4000 innodb-io-capacity-max=5000 innodb_doublewrite=0 innodb-page-cleaners=1 innodb-buffer-pool-instances=1 innodb_adaptive_flushing=0 innodb_adaptive_flushing_lwm=10 {noformat} * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions) The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2 *Loading table, sysbench prepare* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare {noformat} *sysbench run* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run {noformat} where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096 The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy) Below are the results from the runs, for 10.2.4 and recent 10.3.7 ||Users|| TPS 10.2.14|| TPS 10.3.7 |1|1549.92|1547.75| |2|2373.89|2209.05| |4|4466.03|4276.46| |8|9069.82|8587.59| |16|16631.45|15719.70| |32|28989.79|27482.44| |64|36996.12|34843.23| |128|37287.05|35172.49| |256|38038.01|35896.52| |512|38360.71|36567.07| |1024|38265.25|36328.62| |2048|39990.85|36328.62| |4096|41988.71|39032.83| (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0) It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating. It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . 
I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture. I tried to do some initial profiling (based on CPU sampling, and this is what showed up) Individual functions (or, exclusive sample percentage with threshold 1% difference) ||Comparison Column|| Delta|| Baseline Value|| Comparison Value|| |ut_delay| {color:red}4.02{color}| 16.05| 20.07| |l_find| {color:red}2.44{color}| 0.16 |2.60| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60| |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00| |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29| In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb. Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there is now more CPU spent in purge threads, and not in foreground ones ||Comparison Column ||Delta||Baseline Value|| Comparison Value|| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22 |ut_delay|4.02|16.05|20.07 |trx_purge|3.81|5.67|9.48 |srv_do_purge|3.80|5.70|9.49 |row_purge|3.66|0.00|3.67 |row_purge_step|3.47|0.71|4.18 |que_thr_step|3.31|0.94|4.25 |srv_purge_coordinator_thread|3.25|6.37|9.63 |trx_sys_t::clone_oldest_view|3.18|0.00|3.18 |btr_cur_search_to_nth_level_func|3.12|0.00|3.12 |row_purge_record_func|2.94|0.00|2.94 |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77 |row_purge_reset_trx_id|2.72|0.00|2.72 |que_run_threads_low|2.70|1.73|4.43 |que_run_threads|2.69|1.75|4.45 |l_find|2.46|0.18|2.64 |ReadView::snapshot|2.30|0.00|2.30 |rw_trx_hash_t::iterate|2.27|0.00|2.27 |lf_hash_iterate|2.25|0.00|2.25 |srv_task_execute|2.12|1.95|4.08 |row_purge_reposition_pcur|1.53|0.00|1.53 |row_search_on_row_ref|1.33|0.00|1.33 |btr_pcur_open_low|1.31|0.00|1.31 |btr_search_guess_on_hash|1.31|1.54|2.85 |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62 |trx_undo_assign_low|1.07|0.00|1.07 |mtr_t::commit|1.05|2.57|3.62 |mtr_t::Command::execute|1.01|2.42|3.43 |srv_worker_thread|-1.09|9.84|8.75 |sync_array_wait_event|-1.29|9.69|8.40 |trx_write_serialisation_history|-1.34|2.62|1.28 |trx_commit_low|-1.39|3.15|1.76 |innobase_commit|-1.75|47.36|45.61 |trx_commit|-1.75|4.79|3.04 |ha_commit_one_phase|-1.76|47.55|45.78 |commit_one_phase_2|-1.76|47.54|45.77 |trans_commit_stmt|-1.80|48.68|46.88 |ha_commit_trans|-1.80|48.65|46.84 |btr_cur_search_to_nth_level|-1.85|1.85|0.00 |innobase_commit_ordered_2|-2.04|5.17|3.13 |trx_commit_for_mysql|-2.05|5.10|3.05 |innobase_commit_low|-2.05|5.11|3.06 |Prepared_statement::execute_loop|-2.05|74.58|72.52 |mysql_stmt_execute_common|-2.14|74.85|72.71 |mysqld_stmt_execute|-2.15|74.89|72.74 |mysql_execute_command|-2.21|73.03|70.81 |trx_undo_assign_undo|-2.24|2.24|0.00 |Prepared_statement::execute|-2.24|74.03|71.79 |threadpool_process_request|-2.32|81.71|79.38 |dispatch_command|-2.33|79.21|76.88 |do_command|-2.34|81.33|79.00 |tp_callback|-2.38|82.66|80.28 |srv_resume_thread|-3.33|7.94|4.61 |os_event::wait_low|-4.98|28.41|23.43 |SleepConditionVariableCS|-5.41|28.70|23.29 |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00 |
I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
Here is the setup * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk) * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9) the relevant part of my.ini file is as follows {noformat} [mysqld] back_log=500 max_connections=4096 max_prepared_stmt_count=500000 table_open_cache=10000 transaction_isolation=REPEATABLE-READ innodb_status_file=0 innodb_buffer_pool_size=20G innodb_log_file_size=15G innodb_log_buffer_size=1000M innodb-io-capacity=4000 innodb-io-capacity-max=5000 innodb_doublewrite=0 innodb-page-cleaners=1 innodb-buffer-pool-instances=1 innodb_adaptive_flushing=0 innodb_adaptive_flushing_lwm=10 {noformat} * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions) The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2 *Loading table, sysbench prepare* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare {noformat} *sysbench run* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run {noformat} where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096 The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy) Below are the results from the runs, for 10.2.4 and recent 10.3.7 ||Users|| TPS 10.2.14|| TPS 10.3.7 |1|1549.92|1547.75| |2|2373.89|2209.05| |4|4466.03|4276.46| |8|9069.82|8587.59| |16|16631.45|15719.70| |32|28989.79|27482.44| |64|36996.12|34843.23| |128|37287.05|35172.49| |256|38038.01|35896.52| |512|38360.71|36567.07| |1024|38265.25|36328.62| |2048|39990.85|36328.62| |4096|41988.71|39032.83| (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0) It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating. It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . 
I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture. I tried to do some initial profiling (based on CPU sampling, and this is what showed up) Individual functions (or, exclusive sample percentage with threshold 1% difference) ||Comparison Column|| Delta|| Baseline Value|| Comparison Value|| |ut_delay| {color:red}4.02{color}| 16.05| 20.07| |l_find| {color:red}2.44{color}| 0.16 |2.60| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60| |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00| |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29| In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb. Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , because foreground use less of it (decreased time in do_command for example) ||Comparison Column ||Delta||Baseline Value|| Comparison Value|| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22 |ut_delay|4.02|16.05|20.07 |trx_purge|3.81|5.67|9.48 |srv_do_purge|3.80|5.70|9.49 |row_purge|3.66|0.00|3.67 |row_purge_step|3.47|0.71|4.18 |que_thr_step|3.31|0.94|4.25 |srv_purge_coordinator_thread|3.25|6.37|9.63 |trx_sys_t::clone_oldest_view|3.18|0.00|3.18 |btr_cur_search_to_nth_level_func|3.12|0.00|3.12 |row_purge_record_func|2.94|0.00|2.94 |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77 |row_purge_reset_trx_id|2.72|0.00|2.72 |que_run_threads_low|2.70|1.73|4.43 |que_run_threads|2.69|1.75|4.45 |l_find|2.46|0.18|2.64 |ReadView::snapshot|2.30|0.00|2.30 |rw_trx_hash_t::iterate|2.27|0.00|2.27 |lf_hash_iterate|2.25|0.00|2.25 |srv_task_execute|2.12|1.95|4.08 |row_purge_reposition_pcur|1.53|0.00|1.53 |row_search_on_row_ref|1.33|0.00|1.33 |btr_pcur_open_low|1.31|0.00|1.31 |btr_search_guess_on_hash|1.31|1.54|2.85 |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62 |trx_undo_assign_low|1.07|0.00|1.07 |mtr_t::commit|1.05|2.57|3.62 |mtr_t::Command::execute|1.01|2.42|3.43 |srv_worker_thread|-1.09|9.84|8.75 |sync_array_wait_event|-1.29|9.69|8.40 |trx_write_serialisation_history|-1.34|2.62|1.28 |trx_commit_low|-1.39|3.15|1.76 |innobase_commit|-1.75|47.36|45.61 |trx_commit|-1.75|4.79|3.04 |ha_commit_one_phase|-1.76|47.55|45.78 |commit_one_phase_2|-1.76|47.54|45.77 |trans_commit_stmt|-1.80|48.68|46.88 |ha_commit_trans|-1.80|48.65|46.84 |btr_cur_search_to_nth_level|-1.85|1.85|0.00 |innobase_commit_ordered_2|-2.04|5.17|3.13 |trx_commit_for_mysql|-2.05|5.10|3.05 |innobase_commit_low|-2.05|5.11|3.06 |Prepared_statement::execute_loop|-2.05|74.58|72.52 |mysql_stmt_execute_common|-2.14|74.85|72.71 |mysqld_stmt_execute|-2.15|74.89|72.74 |mysql_execute_command|-2.21|73.03|70.81 |trx_undo_assign_undo|-2.24|2.24|0.00 |Prepared_statement::execute|-2.24|74.03|71.79 |threadpool_process_request|-2.32|81.71|79.38 |dispatch_command|-2.33|79.21|76.88 |do_command|-2.34|81.33|79.00 |tp_callback|-2.38|82.66|80.28 |srv_resume_thread|-3.33|7.94|4.61 |os_event::wait_low|-4.98|28.41|23.43 |SleepConditionVariableCS|-5.41|28.70|23.29 
|TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00 |
Summary: Performance regression on sysbench write benchmarks from 10.2 to 10.3
Assignee: Marko Mäkelä (marko)
Description
I noticed reduced performance when running a fairly large update_no_index sysbench benchmark, comparing 10.2 to 10.3.
Here is the setup:
* Windows Azure box with 16 virtual CPUs, 32 GB RAM and SSD storage (I used the local temporary disk)
* MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
* sysbench 0.4 (I mention it because current versions of sysbench no longer run on Windows, but we do not need anything from the current versions)

The relevant part of the my.ini file is as follows:
{noformat}
[mysqld]
back_log=500
max_connections=4096
max_prepared_stmt_count=500000
table_open_cache=10000
transaction_isolation=REPEATABLE-READ
innodb_status_file=0
innodb_buffer_pool_size=20G
innodb_log_file_size=15G
innodb_log_buffer_size=1000M
innodb-io-capacity=4000
innodb-io-capacity-max=5000
innodb_doublewrite=0
innodb-page-cleaners=1
innodb-buffer-pool-instances=1
innodb_adaptive_flushing=0
innodb_adaptive_flushing_lwm=10
{noformat}

The update-no-index test runs against a single large table (50 million rows), with the number of users ranging from 1 to 4096 in powers of 2.

*Loading the table, sysbench prepare*
{noformat}
sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
{noformat}

*sysbench run*
{noformat}
sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
{noformat}
where %N% is 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096.

The test sleeps for 80 seconds between runs, and also between the load and the first run; as far as I can see this is enough for all InnoDB background I/O activity to finish (MB/sec drops to 0 on the disk the test runs on). Because innodb_flush_log_at_trx_commit is at its default (1), the test does not appear to be very CPU bound (sysbench and mysqld together use about 70% of the 16 available CPUs according to Windows, i.e. over 10 CPUs are busy).

Below are the results from the runs, for 10.2.14 and the recent 10.3.7 build:
||Users||TPS 10.2.14||TPS 10.3.7||
|1|1549.92|1547.75|
|2|2373.89|2209.05|
|4|4466.03|4276.46|
|8|9069.82|8587.59|
|16|16631.45|15719.70|
|32|28989.79|27482.44|
|64|36996.12|34843.23|
|128|37287.05|35172.49|
|256|38038.01|35896.52|
|512|38360.71|36567.07|
|1024|38265.25|36328.62|
|2048|39990.85|36328.62|
|4096|41988.71|39032.83|

(also see the graph: https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)

The regression is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent from 64 users all the way up to 4K users, so it needs investigating. It is odd, because the test itself is neither CPU bound nor I/O bound; the bottleneck, as I understand it, is waiting for the flush in log_write_up_to().
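To illustrate why waiting in log_write_up_to() can cap throughput without saturating CPU or disk, here is a minimal sketch of the underlying pattern. It is not InnoDB code; RedoLog, wait_flushed and flush_to are invented names. The point is only that every committing thread blocks on a shared flushed LSN, and a single flush releases a whole group of waiters:
{noformat}
// Minimal sketch, NOT InnoDB source: commits serialize on a shared flushed LSN.
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

struct RedoLog {
    std::mutex mtx;
    std::condition_variable flushed_cv;
    uint64_t flushed_lsn = 0;   // highest LSN known to be durable

    // Committing transaction: block until its LSN has been flushed.
    void wait_flushed(uint64_t my_lsn) {
        std::unique_lock<std::mutex> lk(mtx);
        flushed_cv.wait(lk, [&] { return flushed_lsn >= my_lsn; });
    }

    // Flusher: one fsync advances the durable LSN for everyone.
    void flush_to(uint64_t lsn) {
        {
            std::lock_guard<std::mutex> lk(mtx);
            if (lsn > flushed_lsn) flushed_lsn = lsn;
        }
        flushed_cv.notify_all();   // wake every waiter at or below lsn
    }
};

int main() {
    RedoLog log;
    std::vector<std::thread> committers;
    for (uint64_t lsn = 1; lsn <= 4; ++lsn)
        committers.emplace_back([&log, lsn] { log.wait_flushed(lsn); });

    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // pretend fsync latency
    log.flush_to(4);               // one flush satisfies all four commits at once

    for (auto& t : committers) t.join();
    std::puts("all commits durable");
}
{noformat}
With innodb_flush_log_at_trx_commit=1 every commit has to pay roughly one such flush wait, which would match the observation above that adding users beyond ~64 stops adding TPS while the CPUs stay partly idle.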
I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture. I tried to do some initial profiling (based on CPU sampling), and this is what showed up. Below., in all places, *baseline* refers to 10.2.14, and *comparison" is 10.3.7 Individual functions (or, exclusive sample percentage with threshold 1% difference) ||Comparison Column|| Delta|| Baseline Value|| Comparison Value|| |ut_delay| {color:red}4.02{color}| 16.05| 20.07| |l_find| {color:red}2.44{color}| 0.16 |2.60| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60| |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00| |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29| In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb. Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example) ||Comparison Column ||Delta||Baseline Value|| Comparison Value|| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22 |ut_delay|4.02|16.05|20.07 |trx_purge|3.81|5.67|9.48 |srv_do_purge|3.80|5.70|9.49 |row_purge|3.66|0.00|3.67 |row_purge_step|3.47|0.71|4.18 |que_thr_step|3.31|0.94|4.25 |srv_purge_coordinator_thread|3.25|6.37|9.63 |trx_sys_t::clone_oldest_view|3.18|0.00|3.18 |btr_cur_search_to_nth_level_func|3.12|0.00|3.12 |row_purge_record_func|2.94|0.00|2.94 |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77 |row_purge_reset_trx_id|2.72|0.00|2.72 |que_run_threads_low|2.70|1.73|4.43 |que_run_threads|2.69|1.75|4.45 |l_find|2.46|0.18|2.64 |ReadView::snapshot|2.30|0.00|2.30 |rw_trx_hash_t::iterate|2.27|0.00|2.27 |lf_hash_iterate|2.25|0.00|2.25 |srv_task_execute|2.12|1.95|4.08 |row_purge_reposition_pcur|1.53|0.00|1.53 |row_search_on_row_ref|1.33|0.00|1.33 |btr_pcur_open_low|1.31|0.00|1.31 |btr_search_guess_on_hash|1.31|1.54|2.85 |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62 |trx_undo_assign_low|1.07|0.00|1.07 |mtr_t::commit|1.05|2.57|3.62 |mtr_t::Command::execute|1.01|2.42|3.43 |srv_worker_thread|-1.09|9.84|8.75 |sync_array_wait_event|-1.29|9.69|8.40 |trx_write_serialisation_history|-1.34|2.62|1.28 |trx_commit_low|-1.39|3.15|1.76 |innobase_commit|-1.75|47.36|45.61 |trx_commit|-1.75|4.79|3.04 |ha_commit_one_phase|-1.76|47.55|45.78 |commit_one_phase_2|-1.76|47.54|45.77 |trans_commit_stmt|-1.80|48.68|46.88 |ha_commit_trans|-1.80|48.65|46.84 |btr_cur_search_to_nth_level|-1.85|1.85|0.00 |innobase_commit_ordered_2|-2.04|5.17|3.13 |trx_commit_for_mysql|-2.05|5.10|3.05 |innobase_commit_low|-2.05|5.11|3.06 |Prepared_statement::execute_loop|-2.05|74.58|72.52 |mysql_stmt_execute_common|-2.14|74.85|72.71 |mysqld_stmt_execute|-2.15|74.89|72.74 |mysql_execute_command|-2.21|73.03|70.81 |trx_undo_assign_undo|-2.24|2.24|0.00 |Prepared_statement::execute|-2.24|74.03|71.79 |threadpool_process_request|-2.32|81.71|79.38 |dispatch_command|-2.33|79.21|76.88 |do_command|-2.34|81.33|79.00 |tp_callback|-2.38|82.66|80.28 |srv_resume_thread|-3.33|7.94|4.61 
|os_event::wait_low|-4.98|28.41|23.43 |SleepConditionVariableCS|-5.41|28.70|23.29 |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00 |
I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
Here is the setup * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk) * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9) the relevant part of my.ini file is as follows {noformat} [mysqld] back_log=500 max_connections=4096 max_prepared_stmt_count=500000 table_open_cache=10000 transaction_isolation=REPEATABLE-READ innodb_status_file=0 innodb_buffer_pool_size=20G innodb_log_file_size=15G innodb_log_buffer_size=1000M innodb-io-capacity=4000 innodb-io-capacity-max=5000 innodb_doublewrite=0 innodb-page-cleaners=1 innodb-buffer-pool-instances=1 innodb_adaptive_flushing=0 innodb_adaptive_flushing_lwm=10 {noformat} * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions) The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2 *Loading table, sysbench prepare* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare {noformat} *sysbench run* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run {noformat} where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096 The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy) Below are the results from the runs, for 10.2.4 and recent 10.3.7 ||Users|| TPS 10.2.14|| TPS 10.3.7 |1|1549.92|1547.75| |2|2373.89|2209.05| |4|4466.03|4276.46| |8|9069.82|8587.59| |16|16631.45|15719.70| |32|28989.79|27482.44| |64|36996.12|34843.23| |128|37287.05|35172.49| |256|38038.01|35896.52| |512|38360.71|36567.07| |1024|38265.25|36328.62| |2048|39990.85|36328.62| |4096|41988.71|39032.83| (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0) It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating. It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . 
I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture. I tried to do some initial profiling (based on CPU sampling), and this is what showed up. Below., in all places, *baseline* refers to 10.2.14, and *comparison" is 10.3.7 *Exclusive samples comparison* Individual functions (or, exclusive sample percentage with threshold 1% difference) ||Comparison Column|| Delta|| Baseline Value|| Comparison Value|| |ut_delay| {color:red}4.02{color}| 16.05| 20.07| |l_find| {color:red}2.44{color}| 0.16 |2.60| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60| |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00| |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29| In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb. * Inclusive samples comparison* Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example) ||Comparison Column ||Delta||Baseline Value|| Comparison Value|| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22 |ut_delay|4.02|16.05|20.07 |trx_purge|3.81|5.67|9.48 |srv_do_purge|3.80|5.70|9.49 |row_purge|3.66|0.00|3.67 |row_purge_step|3.47|0.71|4.18 |que_thr_step|3.31|0.94|4.25 |srv_purge_coordinator_thread|3.25|6.37|9.63 |trx_sys_t::clone_oldest_view|3.18|0.00|3.18 |btr_cur_search_to_nth_level_func|3.12|0.00|3.12 |row_purge_record_func|2.94|0.00|2.94 |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77 |row_purge_reset_trx_id|2.72|0.00|2.72 |que_run_threads_low|2.70|1.73|4.43 |que_run_threads|2.69|1.75|4.45 |l_find|2.46|0.18|2.64 |ReadView::snapshot|2.30|0.00|2.30 |rw_trx_hash_t::iterate|2.27|0.00|2.27 |lf_hash_iterate|2.25|0.00|2.25 |srv_task_execute|2.12|1.95|4.08 |row_purge_reposition_pcur|1.53|0.00|1.53 |row_search_on_row_ref|1.33|0.00|1.33 |btr_pcur_open_low|1.31|0.00|1.31 |btr_search_guess_on_hash|1.31|1.54|2.85 |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62 |trx_undo_assign_low|1.07|0.00|1.07 |mtr_t::commit|1.05|2.57|3.62 |mtr_t::Command::execute|1.01|2.42|3.43 |srv_worker_thread|-1.09|9.84|8.75 |sync_array_wait_event|-1.29|9.69|8.40 |trx_write_serialisation_history|-1.34|2.62|1.28 |trx_commit_low|-1.39|3.15|1.76 |innobase_commit|-1.75|47.36|45.61 |trx_commit|-1.75|4.79|3.04 |ha_commit_one_phase|-1.76|47.55|45.78 |commit_one_phase_2|-1.76|47.54|45.77 |trans_commit_stmt|-1.80|48.68|46.88 |ha_commit_trans|-1.80|48.65|46.84 |btr_cur_search_to_nth_level|-1.85|1.85|0.00 |innobase_commit_ordered_2|-2.04|5.17|3.13 |trx_commit_for_mysql|-2.05|5.10|3.05 |innobase_commit_low|-2.05|5.11|3.06 |Prepared_statement::execute_loop|-2.05|74.58|72.52 |mysql_stmt_execute_common|-2.14|74.85|72.71 |mysqld_stmt_execute|-2.15|74.89|72.74 |mysql_execute_command|-2.21|73.03|70.81 |trx_undo_assign_undo|-2.24|2.24|0.00 |Prepared_statement::execute|-2.24|74.03|71.79 |threadpool_process_request|-2.32|81.71|79.38 |dispatch_command|-2.33|79.21|76.88 |do_command|-2.34|81.33|79.00 
|tp_callback|-2.38|82.66|80.28 |srv_resume_thread|-3.33|7.94|4.61 |os_event::wait_low|-4.98|28.41|23.43 |SleepConditionVariableCS|-5.41|28.70|23.29 |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00 |
Description |
I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
Here is the setup * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk) * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9) the relevant part of my.ini file is as follows {noformat} [mysqld] back_log=500 max_connections=4096 max_prepared_stmt_count=500000 table_open_cache=10000 transaction_isolation=REPEATABLE-READ innodb_status_file=0 innodb_buffer_pool_size=20G innodb_log_file_size=15G innodb_log_buffer_size=1000M innodb-io-capacity=4000 innodb-io-capacity-max=5000 innodb_doublewrite=0 innodb-page-cleaners=1 innodb-buffer-pool-instances=1 innodb_adaptive_flushing=0 innodb_adaptive_flushing_lwm=10 {noformat} * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions) The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2 *Loading table, sysbench prepare* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare {noformat} *sysbench run* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run {noformat} where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096 The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy) Below are the results from the runs, for 10.2.4 and recent 10.3.7 ||Users|| TPS 10.2.14|| TPS 10.3.7 |1|1549.92|1547.75| |2|2373.89|2209.05| |4|4466.03|4276.46| |8|9069.82|8587.59| |16|16631.45|15719.70| |32|28989.79|27482.44| |64|36996.12|34843.23| |128|37287.05|35172.49| |256|38038.01|35896.52| |512|38360.71|36567.07| |1024|38265.25|36328.62| |2048|39990.85|36328.62| |4096|41988.71|39032.83| (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0) It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating. It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . 
I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture. I tried to do some initial profiling (based on CPU sampling), and this is what showed up. Below., in all places, *baseline* refers to 10.2.14, and *comparison" is 10.3.7 *Exclusive samples comparison* Individual functions (or, exclusive sample percentage with threshold 1% difference) ||Comparison Column|| Delta|| Baseline Value|| Comparison Value|| |ut_delay| {color:red}4.02{color}| 16.05| 20.07| |l_find| {color:red}2.44{color}| 0.16 |2.60| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60| |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00| |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29| In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb. * Inclusive samples comparison* Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example) ||Comparison Column ||Delta||Baseline Value|| Comparison Value|| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22 |ut_delay|4.02|16.05|20.07 |trx_purge|3.81|5.67|9.48 |srv_do_purge|3.80|5.70|9.49 |row_purge|3.66|0.00|3.67 |row_purge_step|3.47|0.71|4.18 |que_thr_step|3.31|0.94|4.25 |srv_purge_coordinator_thread|3.25|6.37|9.63 |trx_sys_t::clone_oldest_view|3.18|0.00|3.18 |btr_cur_search_to_nth_level_func|3.12|0.00|3.12 |row_purge_record_func|2.94|0.00|2.94 |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77 |row_purge_reset_trx_id|2.72|0.00|2.72 |que_run_threads_low|2.70|1.73|4.43 |que_run_threads|2.69|1.75|4.45 |l_find|2.46|0.18|2.64 |ReadView::snapshot|2.30|0.00|2.30 |rw_trx_hash_t::iterate|2.27|0.00|2.27 |lf_hash_iterate|2.25|0.00|2.25 |srv_task_execute|2.12|1.95|4.08 |row_purge_reposition_pcur|1.53|0.00|1.53 |row_search_on_row_ref|1.33|0.00|1.33 |btr_pcur_open_low|1.31|0.00|1.31 |btr_search_guess_on_hash|1.31|1.54|2.85 |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62 |trx_undo_assign_low|1.07|0.00|1.07 |mtr_t::commit|1.05|2.57|3.62 |mtr_t::Command::execute|1.01|2.42|3.43 |srv_worker_thread|-1.09|9.84|8.75 |sync_array_wait_event|-1.29|9.69|8.40 |trx_write_serialisation_history|-1.34|2.62|1.28 |trx_commit_low|-1.39|3.15|1.76 |innobase_commit|-1.75|47.36|45.61 |trx_commit|-1.75|4.79|3.04 |ha_commit_one_phase|-1.76|47.55|45.78 |commit_one_phase_2|-1.76|47.54|45.77 |trans_commit_stmt|-1.80|48.68|46.88 |ha_commit_trans|-1.80|48.65|46.84 |btr_cur_search_to_nth_level|-1.85|1.85|0.00 |innobase_commit_ordered_2|-2.04|5.17|3.13 |trx_commit_for_mysql|-2.05|5.10|3.05 |innobase_commit_low|-2.05|5.11|3.06 |Prepared_statement::execute_loop|-2.05|74.58|72.52 |mysql_stmt_execute_common|-2.14|74.85|72.71 |mysqld_stmt_execute|-2.15|74.89|72.74 |mysql_execute_command|-2.21|73.03|70.81 |trx_undo_assign_undo|-2.24|2.24|0.00 |Prepared_statement::execute|-2.24|74.03|71.79 |threadpool_process_request|-2.32|81.71|79.38 |dispatch_command|-2.33|79.21|76.88 |do_command|-2.34|81.33|79.00 
|tp_callback|-2.38|82.66|80.28 |srv_resume_thread|-3.33|7.94|4.61 |os_event::wait_low|-4.98|28.41|23.43 |SleepConditionVariableCS|-5.41|28.70|23.29 |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00 |
I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
Here is the setup * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk) * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9) the relevant part of my.ini file is as follows {noformat} [mysqld] back_log=500 max_connections=4096 max_prepared_stmt_count=500000 table_open_cache=10000 transaction_isolation=REPEATABLE-READ innodb_status_file=0 innodb_buffer_pool_size=20G innodb_log_file_size=15G innodb_log_buffer_size=1000M innodb-io-capacity=4000 innodb-io-capacity-max=5000 innodb_doublewrite=0 innodb-page-cleaners=1 innodb-buffer-pool-instances=1 innodb_adaptive_flushing=0 innodb_adaptive_flushing_lwm=10 {noformat} * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions) The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2 *Loading table, sysbench prepare* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare {noformat} *sysbench run* {noformat} sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run {noformat} where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096 The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy) Below are the results from the runs, for 10.2.4 and recent 10.3.7 ||Users|| TPS 10.2.14|| TPS 10.3.7 |1|1549.92|1547.75| |2|2373.89|2209.05| |4|4466.03|4276.46| |8|9069.82|8587.59| |16|16631.45|15719.70| |32|28989.79|27482.44| |64|36996.12|34843.23| |128|37287.05|35172.49| |256|38038.01|35896.52| |512|38360.71|36567.07| |1024|38265.25|36328.62| |2048|39990.85|36328.62| |4096|41988.71|39032.83| (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0) It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating. It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . 
I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture. I tried to do some initial profiling (based on CPU sampling), and this is what showed up. Below., in all places, *baseline* refers to 10.2.14, and *comparison" is 10.3.7 *Exclusive samples comparison* Individual functions (or, exclusive sample percentage with threshold 1% difference) ||Comparison Column|| Delta|| Baseline Value|| Comparison Value|| |ut_delay| {color:red}4.02{color}| 16.05| 20.07| |l_find| {color:red}2.44{color}| 0.16 |2.60| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60| |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00| |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29| In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb. *Inclusive samples comparison* Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example) ||Comparison Column ||Delta||Baseline Value|| Comparison Value|| |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22 |ut_delay|4.02|16.05|20.07 |trx_purge|3.81|5.67|9.48 |srv_do_purge|3.80|5.70|9.49 |row_purge|3.66|0.00|3.67 |row_purge_step|3.47|0.71|4.18 |que_thr_step|3.31|0.94|4.25 |srv_purge_coordinator_thread|3.25|6.37|9.63 |trx_sys_t::clone_oldest_view|3.18|0.00|3.18 |btr_cur_search_to_nth_level_func|3.12|0.00|3.12 |row_purge_record_func|2.94|0.00|2.94 |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77 |row_purge_reset_trx_id|2.72|0.00|2.72 |que_run_threads_low|2.70|1.73|4.43 |que_run_threads|2.69|1.75|4.45 |l_find|2.46|0.18|2.64 |ReadView::snapshot|2.30|0.00|2.30 |rw_trx_hash_t::iterate|2.27|0.00|2.27 |lf_hash_iterate|2.25|0.00|2.25 |srv_task_execute|2.12|1.95|4.08 |row_purge_reposition_pcur|1.53|0.00|1.53 |row_search_on_row_ref|1.33|0.00|1.33 |btr_pcur_open_low|1.31|0.00|1.31 |btr_search_guess_on_hash|1.31|1.54|2.85 |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62 |trx_undo_assign_low|1.07|0.00|1.07 |mtr_t::commit|1.05|2.57|3.62 |mtr_t::Command::execute|1.01|2.42|3.43 |srv_worker_thread|-1.09|9.84|8.75 |sync_array_wait_event|-1.29|9.69|8.40 |trx_write_serialisation_history|-1.34|2.62|1.28 |trx_commit_low|-1.39|3.15|1.76 |innobase_commit|-1.75|47.36|45.61 |trx_commit|-1.75|4.79|3.04 |ha_commit_one_phase|-1.76|47.55|45.78 |commit_one_phase_2|-1.76|47.54|45.77 |trans_commit_stmt|-1.80|48.68|46.88 |ha_commit_trans|-1.80|48.65|46.84 |btr_cur_search_to_nth_level|-1.85|1.85|0.00 |innobase_commit_ordered_2|-2.04|5.17|3.13 |trx_commit_for_mysql|-2.05|5.10|3.05 |innobase_commit_low|-2.05|5.11|3.06 |Prepared_statement::execute_loop|-2.05|74.58|72.52 |mysql_stmt_execute_common|-2.14|74.85|72.71 |mysqld_stmt_execute|-2.15|74.89|72.74 |mysql_execute_command|-2.21|73.03|70.81 |trx_undo_assign_undo|-2.24|2.24|0.00 |Prepared_statement::execute|-2.24|74.03|71.79 |threadpool_process_request|-2.32|81.71|79.38 |dispatch_command|-2.33|79.21|76.88 |do_command|-2.34|81.33|79.00 
|tp_callback|-2.38|82.66|80.28 |srv_resume_thread|-3.33|7.94|4.61 |os_event::wait_low|-4.98|28.41|23.43 |SleepConditionVariableCS|-5.41|28.70|23.29 |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00 |
Attachment | 1bp-1pc-10.2.14.txt [ 45632 ]
Attachment | 1bp-1pc-10.3.7.txt [ 45633 ] |
Attachment | 10.3.pre-lockless.txt [ 45641 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link | This issue relates to |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Assignee | Marko Mäkelä [ marko ] | Sergey Vojtovich [ svoj ] |
Attachment | winpmp_10_3_7.txt [ 45665 ] |
Attachment | winpmp_10_2_14.txt [ 45666 ] |
Link | This issue relates to MDEV-16232 [ MDEV-16232 ] |
Link | This issue relates to |
Affects Version/s | 10.3 [ 22126 ] |
Fix Version/s | 10.3 [ 22126 ] |
Link | This issue relates to |
Link | This issue relates to MDEV-17353 [ MDEV-17353 ] |
Link | This issue is duplicated by |
Fix Version/s | 10.4 [ 22408 ] |
Link | This issue relates to |
Assignee | Sergey Vojtovich [ svoj ] | Marko Mäkelä [ marko ] |
issue.field.resolutiondate | 2020-11-17 13:34:08.0 | 2020-11-17 13:34:08.664 |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Won't Fix [ 2 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Workflow | MariaDB v3 [ 87225 ] | MariaDB v4 [ 154359 ] |
Assigning to marko, the InnoDB guru, by default. svoj, feel free to jump in if you want to repeat the benchmark or have any idea how to fix this. axel, same for you.