MariaDB Server / MDEV-16168

Performance regression on sysbench write benchmarks from 10.2 to 10.3

Details

    Description

      I noticed reduced performance when running a somewhat large update_no_index sysbench benchmark, comparing 10.2 to 10.3.
      Here is the setup:

      • Windows Azure VM with 16 virtual CPUs, 32GB RAM and SSD storage (I used the local temporary disk)

      Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
      Sockets: 1
      Virtual processors: 16

      • MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
        The relevant part of the my.ini file is as follows:

        [mysqld]
        back_log=500
        max_connections=4096
        max_prepared_stmt_count=500000
        table_open_cache=10000
        transaction_isolation=REPEATABLE-READ
        innodb_status_file=0
        innodb_buffer_pool_size=20G
        innodb_log_file_size=15G
        innodb_log_buffer_size=1000M 
        innodb-io-capacity=4000
        innodb-io-capacity-max=5000
        innodb_doublewrite=0
        innodb-page-cleaners=1
        innodb-buffer-pool-instances=1
        innodb_adaptive_flushing=0
        innodb_adaptive_flushing_lwm=10
        

      • sysbench 0.4 (I mention it because current versions of sysbench no longer run on Windows, but we do not need anything from the current versions)

      The update_no_index test runs on a single large table (50 million rows), with the number of users ranging from 1 to 4096, in powers of 2.

      Loading table, sysbench prepare

      sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1  --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL  --num-threads=1 prepare
      

      sysbench run

      sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1  --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL  --num-threads=%N% run
      

      where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

      The test takes a nap of 80 seconds between runs, and also between the load and the first test; as far as I can see this is enough for all InnoDB background IO activity to finish (MB/sec goes down to 0 on the disk where the test is run). Because innodb-flush-log-at-trx-commit is at its default (1), the test does not appear to be very CPU bound (sysbench and mysql together use 70% of all available 16 CPUs according to Windows measurements, i.e. over 10 CPUs are busy).

      Below are the results from the runs, for 10.2.14 and the recent 10.3.7.
      UPDATE: marko mentioned git commit 1a62c8a39647db00283b4e35fb6db3c7bc31e3ca as the version right before 10.3 switched to lockless, so
      I added a column for it too. It appears to be better than 10.3.7, but worse than 10.2.14.

      Users   TPS 10.2.14   TPS 10.3.7   TPS 10.3-pre-lockless
      1          1549.92      1547.75      1551.07
      2          2373.89      2209.05      2334.80
      4          4466.03      4276.46      4446.64
      8          9069.82      8587.59      9098.95
      16        16631.45     15719.70     16231.39
      32        28989.79     27482.44     27786.32
      64        36996.12     34843.23     35089.44
      128       37287.05     35172.49     36223.68
      256       38038.01     35896.52     36818.22
      512       38360.71     36567.07     37195.47
      1024      38265.25     36328.62     37253.98
      2048      39990.85     36328.62     38572.81
      4096      41988.71     39032.83     39606.38

      (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
      The difference is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users and persists all the way up to 4K users. So it needs investigating.
      It is odd because the test itself is neither CPU bound nor IO bound; the bottleneck, in my understanding, is waiting for the flush in log_write_up_to(). I did not try to make it CPU bound by relaxing durability; that might be another exercise, however with increased TPS purging can become more of a bottleneck and disturb the picture.
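
      To make the suspected bottleneck a bit more concrete: with innodb-flush-log-at-trx-commit=1 every commit has to wait until the redo log is flushed up to its commit LSN, and a single flush can release many waiting commits (group commit). The sketch below is only a minimal model of that wait; the names (redo_log, note_flushed, wait_for_flush) are made up for illustration and this is not the actual InnoDB code around log_write_up_to().
      {noformat}
      #include <condition_variable>
      #include <cstdint>
      #include <mutex>

      // Hypothetical, simplified model of the commit-path wait performed by
      // log_write_up_to() when innodb-flush-log-at-trx-commit=1.
      // Not the actual InnoDB implementation.
      struct redo_log {
        std::mutex mtx;
        std::condition_variable flushed_cv;
        uint64_t flushed_lsn = 0;     // the log is durable up to this LSN

        // Called after fsync() of the log file returns (by whoever did the
        // flush); one flush can satisfy many waiting commits at once.
        void note_flushed(uint64_t lsn) {
          {
            std::lock_guard<std::mutex> g(mtx);
            if (lsn > flushed_lsn) flushed_lsn = lsn;
          }
          flushed_cv.notify_all();
        }

        // Commit path: block until everything up to commit_lsn is durable.
        // This wait, rather than CPU or data-file IO, is what bounds TPS here.
        void wait_for_flush(uint64_t commit_lsn) {
          std::unique_lock<std::mutex> g(mtx);
          flushed_cv.wait(g, [&] { return flushed_lsn >= commit_lsn; });
        }
      };
      {noformat}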

      I tried to do some initial profiling (based on CPU sampling), and this is what showed up.

      Below, in all places, baseline refers to 10.2.14 and comparison to 10.3.7.

      Exclusive samples comparison
      Individual functions (exclusive sample percentage, with a 1% difference threshold)

      Function                                              Delta   Baseline (%)   Comparison (%)
      ut_delay                                               4.02          16.05            20.07
      l_find                                                 2.44           0.16             2.60
      PolicyMutex<TTASEventMutex<GenericPolicy> >::enter     2.42           0.18             2.60
      TTASEventMutex<GenericPolicy>::enter                  -4.42           4.42             0.00
      SleepConditionVariableCS                              -5.41          28.70            23.29

      In short, there is more busy wait in ut_delay and less lazy wait in SleepConditionVariableCS. There is also a new, quite visible function (now the 4th most expensive individual function), l_find, which seems to come from replacing the STL hash with a custom hash table in InnoDB.
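
      For context on the ut_delay / SleepConditionVariableCS shift: InnoDB's event mutexes spin for a bounded number of rounds (burning cycles in ut_delay) before parking the thread on an OS wait. The following is a rough, generic sketch of that spin-then-block pattern, with made-up constants and a crude stand-in for ut_delay; it is not the TTASEventMutex code itself, it only illustrates why samples move out of the blocking wait and into the busy wait when mutexes are more contended or held longer.
      {noformat}
      #include <atomic>
      #include <condition_variable>
      #include <mutex>

      // Crude stand-in for ut_delay(): burn a few cycles without yielding.
      static inline void short_busy_delay(unsigned rounds) {
        for (volatile unsigned i = 0; i < rounds * 50; ++i) { /* spin */ }
      }

      // Generic test-and-test-and-set mutex with a bounded spin phase,
      // illustrating (not reproducing) the enter() pattern: spin first,
      // then fall back to a blocking OS wait.
      class spin_then_block_mutex {
        std::atomic<bool> locked{false};
        std::mutex os_mtx;            // stand-in for the sync array / OS event
        std::condition_variable cv;

        bool try_lock() {
          // cheap "test" before the expensive atomic "test-and-set"
          return !locked.load(std::memory_order_relaxed) &&
                 !locked.exchange(true, std::memory_order_acquire);
        }

      public:
        void lock(unsigned spin_rounds = 30, unsigned delay = 6) {
          for (unsigned r = 0; r < spin_rounds; ++r) {
            if (try_lock()) return;          // acquired while spinning: "ut_delay" time
            short_busy_delay(delay);
          }
          // Spin budget exhausted: park on the OS primitive. On Windows this
          // is where SleepConditionVariableCS time shows up in the profile.
          std::unique_lock<std::mutex> g(os_mtx);
          cv.wait(g, [&] { return try_lock(); });
        }

        void unlock() {
          {
            std::lock_guard<std::mutex> g(os_mtx);
            locked.store(false, std::memory_order_release);
          }
          cv.notify_one();
        }
      };
      {noformat}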
      Inclusive samples comparison
      Comparing inclusive samples with a 1% difference threshold gives the table below. It gives me a vague idea that something is more expensive in purge, and again that a lazy wait was replaced with a busy wait somewhere; background threads probably use more CPU, for example the "coordinator" purge thread, while foreground threads use less of it (as shown by the decreased time in do_command, for example). A simplified sketch of the purge-side view-clone scan follows after the table.

      Function                              Delta   Baseline (%)   Comparison (%)
      PolicyMutex<TTASEventMutex<GenericPolicy> >::enter 28.69 2.53 31.22
      ut_delay 4.02 16.05 20.07
      trx_purge 3.81 5.67 9.48
      srv_do_purge 3.80 5.70 9.49
      row_purge 3.66 0.00 3.67
      row_purge_step 3.47 0.71 4.18
      que_thr_step 3.31 0.94 4.25
      srv_purge_coordinator_thread 3.25 6.37 9.63
      trx_sys_t::clone_oldest_view 3.18 0.00 3.18
      btr_cur_search_to_nth_level_func 3.12 0.00 3.12
      row_purge_record_func 2.94 0.00 2.94
      row_purge_upd_exist_or_extern_func 2.77 0.00 2.77
      row_purge_reset_trx_id 2.72 0.00 2.72
      que_run_threads_low 2.70 1.73 4.43
      que_run_threads 2.69 1.75 4.45
      l_find 2.46 0.18 2.64
      ReadView::snapshot 2.30 0.00 2.30
      rw_trx_hash_t::iterate 2.27 0.00 2.27
      lf_hash_iterate 2.25 0.00 2.25
      srv_task_execute 2.12 1.95 4.08
      row_purge_reposition_pcur 1.53 0.00 1.53
      row_search_on_row_ref 1.33 0.00 1.33
      btr_pcur_open_low 1.31 0.00 1.31
      btr_search_guess_on_hash 1.31 1.54 2.85
      trx_purge_wait_for_workers_to_complete 1.10 1.52 2.62
      trx_undo_assign_low 1.07 0.00 1.07
      mtr_t::commit 1.05 2.57 3.62
      mtr_t::Command::execute 1.01 2.42 3.43
      srv_worker_thread -1.09 9.84 8.75
      sync_array_wait_event -1.29 9.69 8.40
      trx_write_serialisation_history -1.34 2.62 1.28
      trx_commit_low -1.39 3.15 1.76
      innobase_commit -1.75 47.36 45.61
      trx_commit -1.75 4.79 3.04
      ha_commit_one_phase -1.76 47.55 45.78
      commit_one_phase_2 -1.76 47.54 45.77
      trans_commit_stmt -1.80 48.68 46.88
      ha_commit_trans -1.80 48.65 46.84
      btr_cur_search_to_nth_level -1.85 1.85 0.00
      innobase_commit_ordered_2 -2.04 5.17 3.13
      trx_commit_for_mysql -2.05 5.10 3.05
      innobase_commit_low -2.05 5.11 3.06
      Prepared_statement::execute_loop -2.05 74.58 72.52
      mysql_stmt_execute_common -2.14 74.85 72.71
      mysqld_stmt_execute -2.15 74.89 72.74
      mysql_execute_command -2.21 73.03 70.81
      trx_undo_assign_undo -2.24 2.24 0.00
      Prepared_statement::execute -2.24 74.03 71.79
      threadpool_process_request -2.32 81.71 79.38
      dispatch_command -2.33 79.21 76.88
      do_command -2.34 81.33 79.00
      tp_callback -2.38 82.66 80.28
      srv_resume_thread -3.33 7.94 4.61
      os_event::wait_low -4.98 28.41 23.43
      SleepConditionVariableCS -5.41 28.70 23.29
      TTASEventMutex<GenericPolicy>::enter -30.49 30.49 0.00
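
      Here is the simplified sketch of the view-clone scan mentioned above, relating to the new trx_sys_t::clone_oldest_view / rw_trx_hash_t::iterate / lf_hash_iterate entries. The real 10.3 structure is a lock-free hash (lf_hash, which is where l_find comes from) and the real ReadView is richer; a mutex plus a plain map is used here only to show the shape of the per-purge-batch work, namely a full scan of the active transactions, not its concurrency properties.
      {noformat}
      #include <cstdint>
      #include <limits>
      #include <mutex>
      #include <unordered_map>

      // Toy stand-in for the 10.3 transaction registry. The real
      // rw_trx_hash_t is a lock-free hash; a mutex-protected map is used
      // here purely to show what work the purge coordinator repeats.
      struct trx_registry {
        std::mutex mtx;
        std::unordered_map<uint64_t, uint64_t> active;  // trx id -> start id

        // Analogue of "clone the oldest view": scan every active transaction
        // to find the oldest one. The cost grows with the number of
        // concurrent transactions and is paid on every purge batch, which
        // matches trx_sys_t::clone_oldest_view showing up in the profile.
        uint64_t oldest_active_start_id() {
          std::lock_guard<std::mutex> g(mtx);
          uint64_t oldest = std::numeric_limits<uint64_t>::max();
          for (const auto &t : active)
            if (t.second < oldest) oldest = t.second;
          return oldest;
        }
      };
      {noformat}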

      Attachments

        1. 10.3.pre-lockless.txt (29 kB)
        2. 1bp-1pc-10.2.14.txt (29 kB)
        3. 1bp-1pc-10.3.7.txt (29 kB)
        4. winpmp_10_2_14.txt (317 kB)
        5. winpmp_10_3_7.txt (265 kB)

        Issue Links

          Activity

            wlad Vladislav Vaintroub created issue -
            wlad Vladislav Vaintroub made changes -
            Field Original Value New Value
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build)
            the my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_connect_errors=5000
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            enable-named-pipe
            character_set_server=latin1
            collation_server=latin1_swedish_ci
            skip-character-set-client-handshake
            innodb_undo_log_truncate=off
            innodb_doublewrite=0
            #innodb-max-purge-lag=1000000
            #innodb_log_files_in_group=10
            #innodb_max_purge_lag_delay=300000
            #innodb_max_purge_lag=0
            #innodb-write-io-threads=32
            #innodb-page-size=4K
            [mariadb]
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            #innodb-undo-tablespaces=2
            #innodb_buffer_pool_dump_pct=100
            #innodb_max_dirty_pages_pct=99
            #innodb_max_dirty_pages_pct_lwm=80
            innodb_adaptive_flushing_lwm=10

            [client]
            protocol=pipe
            * sysbench 0.4
             
            wlad Vladislav Vaintroub made changes -
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build)
            the my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_connect_errors=5000
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            enable-named-pipe
            character_set_server=latin1
            collation_server=latin1_swedish_ci
            skip-character-set-client-handshake
            innodb_undo_log_truncate=off
            innodb_doublewrite=0
            #innodb-max-purge-lag=1000000
            #innodb_log_files_in_group=10
            #innodb_max_purge_lag_delay=300000
            #innodb_max_purge_lag=0
            #innodb-write-io-threads=32
            #innodb-page-size=4K
            [mariadb]
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            #innodb-undo-tablespaces=2
            #innodb_buffer_pool_dump_pct=100
            #innodb_max_dirty_pages_pct=99
            #innodb_max_dirty_pages_pct_lwm=80
            innodb_adaptive_flushing_lwm=10

            [client]
            protocol=pipe
            * sysbench 0.4
             
            I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users||10.2.14||10.3.7
            |1|
            |2|
            |
            wlad Vladislav Vaintroub made changes -
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users||10.2.14||10.3.7
            |1|
            |2|
            |
            |I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|
            wlad Vladislav Vaintroub made changes -
            Description |I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|
            I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling and

            wlad Vladislav Vaintroub made changes -
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling and

            I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling, and this is what showed up)

            Individual functions
            Comparison Column Delta Baseline Value Comparison Value
            ut_delay 4.02 16.05 20.07
            l_find 2.44 0.16 2.60
            PolicyMutex<TTASEventMutex<GenericPolicy> >::enter 2.42 0.18 2.60
            TTASEventMutex<GenericPolicy>::enter -4.42 4.42 0.00
            SleepConditionVariableCS -5.41 28.70 23.29

            wlad Vladislav Vaintroub made changes -
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling, and this is what showed up)

            Individual functions
            Comparison Column Delta Baseline Value Comparison Value
            ut_delay 4.02 16.05 20.07
            l_find 2.44 0.16 2.60
            PolicyMutex<TTASEventMutex<GenericPolicy> >::enter 2.42 0.18 2.60
            TTASEventMutex<GenericPolicy>::enter -4.42 4.42 0.00
            SleepConditionVariableCS -5.41 28.70 23.29

            I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling, and this is what showed up)

            Individual functions (threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| 4.02| 16.05| 20.07|
            |l_find| 2.44| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| 2.42|0.18| 2.60|
            TTASEventMutex<GenericPolicy>::enter -4.42 4.42 0.00
            SleepConditionVariableCS -5.41 28.70 23.29

            wlad Vladislav Vaintroub made changes -
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling, and this is what showed up)

            Individual functions (threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| 4.02| 16.05| 20.07|
            |l_find| 2.44| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| 2.42|0.18| 2.60|
            TTASEventMutex<GenericPolicy>::enter -4.42 4.42 0.00
            SleepConditionVariableCS -5.41 28.70 23.29

            I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between the load and the first test; as far as I can see that is enough for all InnoDB background I/O activity to finish (MB/sec goes down to 0 on the disk where the test is run). Due to innodb-flush-log-at-trx-commit being at its default (1), the test does not appear to be very CPU bound (sysbench and mysqld together use 70% of all available 16 CPUs in the Windows measurement, i.e. over 10 CPUs are busy).
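            For completeness, the whole sweep is driven by a trivial outer loop. A minimal sketch of such a driver batch file is below; the loop structure, the per-run output files and the timeout calls are my assumptions here, only the sysbench arguments are the ones actually used:
            {noformat}
            @echo off
            rem Minimal sketch of the benchmark driver (not the actual harness).
            rem One prepare, then one run per thread count, with an 80 second pause
            rem after each step so that InnoDB background I/O can settle.
            set ARGS=--test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL

            sysbench %ARGS% --num-threads=1 prepare
            timeout /t 80 /nobreak

            for %%N in (1 2 4 8 16 32 64 128 256 512 1024 2048 4096) do (
                sysbench %ARGS% --num-threads=%%N run >> results_%%N.txt
                timeout /t 80 /nobreak
            )
            {noformat}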

            Below are the results from the runs, for 10.2.14 and the recent 10.3.7 build.

            ||Users|| TPS 10.2.14|| TPS 10.3.7||
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            The regression is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users and stays there all the way up to 4K users. So it needs investigating.
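            For example, from the table above the drop at 64 users is (36996.12 - 34843.23) / 36996.12 ≈ 5.8%, and at 512 users (38360.71 - 36567.07) / 38360.71 ≈ 4.7%.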
            It is odd because the test itself is neither CPU bound nor I/O bound; the bottleneck, in my understanding, is waiting for the flush in log_write_up_to(). I did not try to make it CPU bound by relaxing durability (e.g. innodb_flush_log_at_trx_commit=2); that might be another exercise. However, with increased TPS, purging can become more of a bottleneck and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling), and this is what showed up.

            Individual functions (exclusive sample percentage, with a 1% difference threshold):

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| {color:red}4.02{color}| 16.05| 20.07|
            |l_find| {color:red}2.44{color}| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60|
            |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00|
            |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29|

            In short, there is more busy wait (_ut_delay_) and less lazy wait (_SleepConditionVariableCS_). There is also a new, quite visible function (now the 4th most expensive individual function), l_find, which seems to come from replacing the STL-based hash table with the custom lock-free hash table in InnoDB.


            Comparing inclusive samples with a 1% difference threshold gives the table below, which gives me a vague idea that something is more expensive in purge, and again that a lazy wait was replaced with a busy wait somewhere. Background threads probably use more CPU, for example the purge "coordinator" thread, while foreground threads use less of it (as shown by the decreased time in do_command, for example).

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22
            |ut_delay|4.02|16.05|20.07
            |trx_purge|3.81|5.67|9.48
            |srv_do_purge|3.80|5.70|9.49
            |row_purge|3.66|0.00|3.67
            |row_purge_step|3.47|0.71|4.18
            |que_thr_step|3.31|0.94|4.25
            |srv_purge_coordinator_thread|3.25|6.37|9.63
            |trx_sys_t::clone_oldest_view|3.18|0.00|3.18
            |btr_cur_search_to_nth_level_func|3.12|0.00|3.12
            |row_purge_record_func|2.94|0.00|2.94
            |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77
            |row_purge_reset_trx_id|2.72|0.00|2.72
            |que_run_threads_low|2.70|1.73|4.43
            |que_run_threads|2.69|1.75|4.45
            |l_find|2.46|0.18|2.64
            |ReadView::snapshot|2.30|0.00|2.30
            |rw_trx_hash_t::iterate|2.27|0.00|2.27
            |lf_hash_iterate|2.25|0.00|2.25
            |srv_task_execute|2.12|1.95|4.08
            |row_purge_reposition_pcur|1.53|0.00|1.53
            |row_search_on_row_ref|1.33|0.00|1.33
            |btr_pcur_open_low|1.31|0.00|1.31
            |btr_search_guess_on_hash|1.31|1.54|2.85
            |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62
            |trx_undo_assign_low|1.07|0.00|1.07
            |mtr_t::commit|1.05|2.57|3.62
            |mtr_t::Command::execute|1.01|2.42|3.43
            |srv_worker_thread|-1.09|9.84|8.75
            |sync_array_wait_event|-1.29|9.69|8.40
            |trx_write_serialisation_history|-1.34|2.62|1.28
            |trx_commit_low|-1.39|3.15|1.76
            |innobase_commit|-1.75|47.36|45.61
            |trx_commit|-1.75|4.79|3.04
            |ha_commit_one_phase|-1.76|47.55|45.78
            |commit_one_phase_2|-1.76|47.54|45.77
            |trans_commit_stmt|-1.80|48.68|46.88
            |ha_commit_trans|-1.80|48.65|46.84
            |btr_cur_search_to_nth_level|-1.85|1.85|0.00
            |innobase_commit_ordered_2|-2.04|5.17|3.13
            |trx_commit_for_mysql|-2.05|5.10|3.05
            |innobase_commit_low|-2.05|5.11|3.06
            |Prepared_statement::execute_loop|-2.05|74.58|72.52
            |mysql_stmt_execute_common|-2.14|74.85|72.71
            |mysqld_stmt_execute|-2.15|74.89|72.74
            |mysql_execute_command|-2.21|73.03|70.81
            |trx_undo_assign_undo|-2.24|2.24|0.00
            |Prepared_statement::execute|-2.24|74.03|71.79
            |threadpool_process_request|-2.32|81.71|79.38
            |dispatch_command|-2.33|79.21|76.88
            |do_command|-2.34|81.33|79.00
            |tp_callback|-2.38|82.66|80.28
            |srv_resume_thread|-3.33|7.94|4.61
            |os_event::wait_low|-4.98|28.41|23.43
            |SleepConditionVariableCS|-5.41|28.70|23.29
            |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00


            wlad Vladislav Vaintroub made changes -
            Summary: [Draft]Performance regression on sysbench write benchmarks from 10.2 to 10.3 → Performance regression on sysbench write benchmarks from 10.2 to 10.3
            wlad Vladislav Vaintroub made changes -
            Assignee Marko Mäkelä [ marko ]
            wlad Vladislav Vaintroub made changes -
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling, and this is what showed up)

            Individual functions (or, exclusive sample percentage with threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| {color:red}4.02{color}| 16.05| 20.07|
            |l_find| {color:red}2.44{color}| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60|
            |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00|
            |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29|

            In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb.


            Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , because foreground use less of it (decreased time in do_command for example)

            ||Comparison Column ||Delta||Baseline Value|| Comparison Value||
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22
            |ut_delay|4.02|16.05|20.07
            |trx_purge|3.81|5.67|9.48
            |srv_do_purge|3.80|5.70|9.49
            |row_purge|3.66|0.00|3.67
            |row_purge_step|3.47|0.71|4.18
            |que_thr_step|3.31|0.94|4.25
            |srv_purge_coordinator_thread|3.25|6.37|9.63
            |trx_sys_t::clone_oldest_view|3.18|0.00|3.18
            |btr_cur_search_to_nth_level_func|3.12|0.00|3.12
            |row_purge_record_func|2.94|0.00|2.94
            |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77
            |row_purge_reset_trx_id|2.72|0.00|2.72
            |que_run_threads_low|2.70|1.73|4.43
            |que_run_threads|2.69|1.75|4.45
            |l_find|2.46|0.18|2.64
            |ReadView::snapshot|2.30|0.00|2.30
            |rw_trx_hash_t::iterate|2.27|0.00|2.27
            |lf_hash_iterate|2.25|0.00|2.25
            |srv_task_execute|2.12|1.95|4.08
            |row_purge_reposition_pcur|1.53|0.00|1.53
            |row_search_on_row_ref|1.33|0.00|1.33
            |btr_pcur_open_low|1.31|0.00|1.31
            |btr_search_guess_on_hash|1.31|1.54|2.85
            |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62
            |trx_undo_assign_low|1.07|0.00|1.07
            |mtr_t::commit|1.05|2.57|3.62
            |mtr_t::Command::execute|1.01|2.42|3.43
            |srv_worker_thread|-1.09|9.84|8.75
            |sync_array_wait_event|-1.29|9.69|8.40
            |trx_write_serialisation_history|-1.34|2.62|1.28
            |trx_commit_low|-1.39|3.15|1.76
            |innobase_commit|-1.75|47.36|45.61
            |trx_commit|-1.75|4.79|3.04
            |ha_commit_one_phase|-1.76|47.55|45.78
            |commit_one_phase_2|-1.76|47.54|45.77
            |trans_commit_stmt|-1.80|48.68|46.88
            |ha_commit_trans|-1.80|48.65|46.84
            |btr_cur_search_to_nth_level|-1.85|1.85|0.00
            |innobase_commit_ordered_2|-2.04|5.17|3.13
            |trx_commit_for_mysql|-2.05|5.10|3.05
            |innobase_commit_low|-2.05|5.11|3.06
            |Prepared_statement::execute_loop|-2.05|74.58|72.52
            |mysql_stmt_execute_common|-2.14|74.85|72.71
            |mysqld_stmt_execute|-2.15|74.89|72.74
            |mysql_execute_command|-2.21|73.03|70.81
            |trx_undo_assign_undo|-2.24|2.24|0.00
            |Prepared_statement::execute|-2.24|74.03|71.79
            |threadpool_process_request|-2.32|81.71|79.38
            |dispatch_command|-2.33|79.21|76.88
            |do_command|-2.34|81.33|79.00
            |tp_callback|-2.38|82.66|80.28
            |srv_resume_thread|-3.33|7.94|4.61
            |os_event::wait_low|-4.98|28.41|23.43
            |SleepConditionVariableCS|-5.41|28.70|23.29
            |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00

            I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling, and this is what showed up)

            Individual functions (or, exclusive sample percentage with threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| {color:red}4.02{color}| 16.05| 20.07|
            |l_find| {color:red}2.44{color}| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60|
            |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00|
            |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29|

            In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb.


            Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example)

            ||Comparison Column ||Delta||Baseline Value|| Comparison Value||
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22
            |ut_delay|4.02|16.05|20.07
            |trx_purge|3.81|5.67|9.48
            |srv_do_purge|3.80|5.70|9.49
            |row_purge|3.66|0.00|3.67
            |row_purge_step|3.47|0.71|4.18
            |que_thr_step|3.31|0.94|4.25
            |srv_purge_coordinator_thread|3.25|6.37|9.63
            |trx_sys_t::clone_oldest_view|3.18|0.00|3.18
            |btr_cur_search_to_nth_level_func|3.12|0.00|3.12
            |row_purge_record_func|2.94|0.00|2.94
            |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77
            |row_purge_reset_trx_id|2.72|0.00|2.72
            |que_run_threads_low|2.70|1.73|4.43
            |que_run_threads|2.69|1.75|4.45
            |l_find|2.46|0.18|2.64
            |ReadView::snapshot|2.30|0.00|2.30
            |rw_trx_hash_t::iterate|2.27|0.00|2.27
            |lf_hash_iterate|2.25|0.00|2.25
            |srv_task_execute|2.12|1.95|4.08
            |row_purge_reposition_pcur|1.53|0.00|1.53
            |row_search_on_row_ref|1.33|0.00|1.33
            |btr_pcur_open_low|1.31|0.00|1.31
            |btr_search_guess_on_hash|1.31|1.54|2.85
            |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62
            |trx_undo_assign_low|1.07|0.00|1.07
            |mtr_t::commit|1.05|2.57|3.62
            |mtr_t::Command::execute|1.01|2.42|3.43
            |srv_worker_thread|-1.09|9.84|8.75
            |sync_array_wait_event|-1.29|9.69|8.40
            |trx_write_serialisation_history|-1.34|2.62|1.28
            |trx_commit_low|-1.39|3.15|1.76
            |innobase_commit|-1.75|47.36|45.61
            |trx_commit|-1.75|4.79|3.04
            |ha_commit_one_phase|-1.76|47.55|45.78
            |commit_one_phase_2|-1.76|47.54|45.77
            |trans_commit_stmt|-1.80|48.68|46.88
            |ha_commit_trans|-1.80|48.65|46.84
            |btr_cur_search_to_nth_level|-1.85|1.85|0.00
            |innobase_commit_ordered_2|-2.04|5.17|3.13
            |trx_commit_for_mysql|-2.05|5.10|3.05
            |innobase_commit_low|-2.05|5.11|3.06
            |Prepared_statement::execute_loop|-2.05|74.58|72.52
            |mysql_stmt_execute_common|-2.14|74.85|72.71
            |mysqld_stmt_execute|-2.15|74.89|72.74
            |mysql_execute_command|-2.21|73.03|70.81
            |trx_undo_assign_undo|-2.24|2.24|0.00
            |Prepared_statement::execute|-2.24|74.03|71.79
            |threadpool_process_request|-2.32|81.71|79.38
            |dispatch_command|-2.33|79.21|76.88
            |do_command|-2.34|81.33|79.00
            |tp_callback|-2.38|82.66|80.28
            |srv_resume_thread|-3.33|7.94|4.61
            |os_event::wait_low|-4.98|28.41|23.43
            |SleepConditionVariableCS|-5.41|28.70|23.29
            |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00
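
            The new entries ReadView::snapshot, trx_sys_t::clone_oldest_view, rw_trx_hash_t::iterate and lf_hash_iterate seem to fit the 10.3 switch to a lock-free transaction registry: a read view now appears to be built by walking the hash of active transactions instead of copying an already maintained list under the trx_sys mutex. Below is only a rough standalone sketch of that difference (assumed shapes and names, not the real InnoDB data structures), to show where extra per-snapshot work can come from.
            {noformat}
            // Rough sketch only (assumed shapes, not the real InnoDB structures):
            // two ways a read view / oldest view can be produced.
            #include <algorithm>
            #include <cstdint>
            #include <mutex>
            #include <unordered_set>
            #include <vector>

            // 10.2-style idea: active transaction ids are maintained in a list at
            // trx start/commit; taking a snapshot is a plain copy under a mutex.
            struct TrxRegistryList {
              std::mutex m;
              std::vector<uint64_t> active;

              std::vector<uint64_t> snapshot() {
                std::lock_guard<std::mutex> g(m);
                return active;  // O(active) copy, nothing else to do
              }
            };

            // 10.3-style idea: no maintained list; every snapshot walks the whole
            // hash of active transactions (an ordinary container stands in for the
            // lock-free hash here) and orders the result itself. In the profile that
            // walk would correspond to lf_hash_iterate / l_find.
            struct TrxRegistryHash {
              std::unordered_set<uint64_t> active;  // stand-in for the lock-free hash

              std::vector<uint64_t> snapshot() {
                std::vector<uint64_t> ids(active.begin(), active.end());  // full walk
                std::sort(ids.begin(), ids.end());                        // extra work
                return ids;
              }
            };

            int main() {
              TrxRegistryHash reg;
              for (uint64_t id = 1; id <= 1000; ++id) reg.active.insert(id);
              return reg.snapshot().size() == 1000 ? 0 : 1;
            }
            {noformat}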

            wlad Vladislav Vaintroub made changes -
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling, and this is what showed up)

            Individual functions (or, exclusive sample percentage with threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| {color:red}4.02{color}| 16.05| 20.07|
            |l_find| {color:red}2.44{color}| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60|
            |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00|
            |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29|

            In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb.


            Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example)

            ||Comparison Column ||Delta||Baseline Value|| Comparison Value||
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22
            |ut_delay|4.02|16.05|20.07
            |trx_purge|3.81|5.67|9.48
            |srv_do_purge|3.80|5.70|9.49
            |row_purge|3.66|0.00|3.67
            |row_purge_step|3.47|0.71|4.18
            |que_thr_step|3.31|0.94|4.25
            |srv_purge_coordinator_thread|3.25|6.37|9.63
            |trx_sys_t::clone_oldest_view|3.18|0.00|3.18
            |btr_cur_search_to_nth_level_func|3.12|0.00|3.12
            |row_purge_record_func|2.94|0.00|2.94
            |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77
            |row_purge_reset_trx_id|2.72|0.00|2.72
            |que_run_threads_low|2.70|1.73|4.43
            |que_run_threads|2.69|1.75|4.45
            |l_find|2.46|0.18|2.64
            |ReadView::snapshot|2.30|0.00|2.30
            |rw_trx_hash_t::iterate|2.27|0.00|2.27
            |lf_hash_iterate|2.25|0.00|2.25
            |srv_task_execute|2.12|1.95|4.08
            |row_purge_reposition_pcur|1.53|0.00|1.53
            |row_search_on_row_ref|1.33|0.00|1.33
            |btr_pcur_open_low|1.31|0.00|1.31
            |btr_search_guess_on_hash|1.31|1.54|2.85
            |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62
            |trx_undo_assign_low|1.07|0.00|1.07
            |mtr_t::commit|1.05|2.57|3.62
            |mtr_t::Command::execute|1.01|2.42|3.43
            |srv_worker_thread|-1.09|9.84|8.75
            |sync_array_wait_event|-1.29|9.69|8.40
            |trx_write_serialisation_history|-1.34|2.62|1.28
            |trx_commit_low|-1.39|3.15|1.76
            |innobase_commit|-1.75|47.36|45.61
            |trx_commit|-1.75|4.79|3.04
            |ha_commit_one_phase|-1.76|47.55|45.78
            |commit_one_phase_2|-1.76|47.54|45.77
            |trans_commit_stmt|-1.80|48.68|46.88
            |ha_commit_trans|-1.80|48.65|46.84
            |btr_cur_search_to_nth_level|-1.85|1.85|0.00
            |innobase_commit_ordered_2|-2.04|5.17|3.13
            |trx_commit_for_mysql|-2.05|5.10|3.05
            |innobase_commit_low|-2.05|5.11|3.06
            |Prepared_statement::execute_loop|-2.05|74.58|72.52
            |mysql_stmt_execute_common|-2.14|74.85|72.71
            |mysqld_stmt_execute|-2.15|74.89|72.74
            |mysql_execute_command|-2.21|73.03|70.81
            |trx_undo_assign_undo|-2.24|2.24|0.00
            |Prepared_statement::execute|-2.24|74.03|71.79
            |threadpool_process_request|-2.32|81.71|79.38
            |dispatch_command|-2.33|79.21|76.88
            |do_command|-2.34|81.33|79.00
            |tp_callback|-2.38|82.66|80.28
            |srv_resume_thread|-3.33|7.94|4.61
            |os_event::wait_low|-4.98|28.41|23.43
            |SleepConditionVariableCS|-5.41|28.70|23.29
            |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00

            I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling), and this is what showed up.

            Below., in all places, *baseline* refers to 10.2.14, and *comparison" is 10.3.7

            Individual functions (or, exclusive sample percentage with threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| {color:red}4.02{color}| 16.05| 20.07|
            |l_find| {color:red}2.44{color}| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60|
            |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00|
            |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29|

            In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb.


            Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example)

            ||Comparison Column ||Delta||Baseline Value|| Comparison Value||
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22
            |ut_delay|4.02|16.05|20.07
            |trx_purge|3.81|5.67|9.48
            |srv_do_purge|3.80|5.70|9.49
            |row_purge|3.66|0.00|3.67
            |row_purge_step|3.47|0.71|4.18
            |que_thr_step|3.31|0.94|4.25
            |srv_purge_coordinator_thread|3.25|6.37|9.63
            |trx_sys_t::clone_oldest_view|3.18|0.00|3.18
            |btr_cur_search_to_nth_level_func|3.12|0.00|3.12
            |row_purge_record_func|2.94|0.00|2.94
            |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77
            |row_purge_reset_trx_id|2.72|0.00|2.72
            |que_run_threads_low|2.70|1.73|4.43
            |que_run_threads|2.69|1.75|4.45
            |l_find|2.46|0.18|2.64
            |ReadView::snapshot|2.30|0.00|2.30
            |rw_trx_hash_t::iterate|2.27|0.00|2.27
            |lf_hash_iterate|2.25|0.00|2.25
            |srv_task_execute|2.12|1.95|4.08
            |row_purge_reposition_pcur|1.53|0.00|1.53
            |row_search_on_row_ref|1.33|0.00|1.33
            |btr_pcur_open_low|1.31|0.00|1.31
            |btr_search_guess_on_hash|1.31|1.54|2.85
            |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62
            |trx_undo_assign_low|1.07|0.00|1.07
            |mtr_t::commit|1.05|2.57|3.62
            |mtr_t::Command::execute|1.01|2.42|3.43
            |srv_worker_thread|-1.09|9.84|8.75
            |sync_array_wait_event|-1.29|9.69|8.40
            |trx_write_serialisation_history|-1.34|2.62|1.28
            |trx_commit_low|-1.39|3.15|1.76
            |innobase_commit|-1.75|47.36|45.61
            |trx_commit|-1.75|4.79|3.04
            |ha_commit_one_phase|-1.76|47.55|45.78
            |commit_one_phase_2|-1.76|47.54|45.77
            |trans_commit_stmt|-1.80|48.68|46.88
            |ha_commit_trans|-1.80|48.65|46.84
            |btr_cur_search_to_nth_level|-1.85|1.85|0.00
            |innobase_commit_ordered_2|-2.04|5.17|3.13
            |trx_commit_for_mysql|-2.05|5.10|3.05
            |innobase_commit_low|-2.05|5.11|3.06
            |Prepared_statement::execute_loop|-2.05|74.58|72.52
            |mysql_stmt_execute_common|-2.14|74.85|72.71
            |mysqld_stmt_execute|-2.15|74.89|72.74
            |mysql_execute_command|-2.21|73.03|70.81
            |trx_undo_assign_undo|-2.24|2.24|0.00
            |Prepared_statement::execute|-2.24|74.03|71.79
            |threadpool_process_request|-2.32|81.71|79.38
            |dispatch_command|-2.33|79.21|76.88
            |do_command|-2.34|81.33|79.00
            |tp_callback|-2.38|82.66|80.28
            |srv_resume_thread|-3.33|7.94|4.61
            |os_event::wait_low|-4.98|28.41|23.43
            |SleepConditionVariableCS|-5.41|28.70|23.29
            |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00

            wlad Vladislav Vaintroub made changes -
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling), and this is what showed up.

            Below., in all places, *baseline* refers to 10.2.14, and *comparison" is 10.3.7

            Individual functions (or, exclusive sample percentage with threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| {color:red}4.02{color}| 16.05| 20.07|
            |l_find| {color:red}2.44{color}| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60|
            |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00|
            |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29|

            In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb.


            Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example)

            ||Comparison Column ||Delta||Baseline Value|| Comparison Value||
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22
            |ut_delay|4.02|16.05|20.07
            |trx_purge|3.81|5.67|9.48
            |srv_do_purge|3.80|5.70|9.49
            |row_purge|3.66|0.00|3.67
            |row_purge_step|3.47|0.71|4.18
            |que_thr_step|3.31|0.94|4.25
            |srv_purge_coordinator_thread|3.25|6.37|9.63
            |trx_sys_t::clone_oldest_view|3.18|0.00|3.18
            |btr_cur_search_to_nth_level_func|3.12|0.00|3.12
            |row_purge_record_func|2.94|0.00|2.94
            |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77
            |row_purge_reset_trx_id|2.72|0.00|2.72
            |que_run_threads_low|2.70|1.73|4.43
            |que_run_threads|2.69|1.75|4.45
            |l_find|2.46|0.18|2.64
            |ReadView::snapshot|2.30|0.00|2.30
            |rw_trx_hash_t::iterate|2.27|0.00|2.27
            |lf_hash_iterate|2.25|0.00|2.25
            |srv_task_execute|2.12|1.95|4.08
            |row_purge_reposition_pcur|1.53|0.00|1.53
            |row_search_on_row_ref|1.33|0.00|1.33
            |btr_pcur_open_low|1.31|0.00|1.31
            |btr_search_guess_on_hash|1.31|1.54|2.85
            |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62
            |trx_undo_assign_low|1.07|0.00|1.07
            |mtr_t::commit|1.05|2.57|3.62
            |mtr_t::Command::execute|1.01|2.42|3.43
            |srv_worker_thread|-1.09|9.84|8.75
            |sync_array_wait_event|-1.29|9.69|8.40
            |trx_write_serialisation_history|-1.34|2.62|1.28
            |trx_commit_low|-1.39|3.15|1.76
            |innobase_commit|-1.75|47.36|45.61
            |trx_commit|-1.75|4.79|3.04
            |ha_commit_one_phase|-1.76|47.55|45.78
            |commit_one_phase_2|-1.76|47.54|45.77
            |trans_commit_stmt|-1.80|48.68|46.88
            |ha_commit_trans|-1.80|48.65|46.84
            |btr_cur_search_to_nth_level|-1.85|1.85|0.00
            |innobase_commit_ordered_2|-2.04|5.17|3.13
            |trx_commit_for_mysql|-2.05|5.10|3.05
            |innobase_commit_low|-2.05|5.11|3.06
            |Prepared_statement::execute_loop|-2.05|74.58|72.52
            |mysql_stmt_execute_common|-2.14|74.85|72.71
            |mysqld_stmt_execute|-2.15|74.89|72.74
            |mysql_execute_command|-2.21|73.03|70.81
            |trx_undo_assign_undo|-2.24|2.24|0.00
            |Prepared_statement::execute|-2.24|74.03|71.79
            |threadpool_process_request|-2.32|81.71|79.38
            |dispatch_command|-2.33|79.21|76.88
            |do_command|-2.34|81.33|79.00
            |tp_callback|-2.38|82.66|80.28
            |srv_resume_thread|-3.33|7.94|4.61
            |os_event::wait_low|-4.98|28.41|23.43
            |SleepConditionVariableCS|-5.41|28.70|23.29
            |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00

            I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling), and this is what showed up.

            Below., in all places, *baseline* refers to 10.2.14, and *comparison" is 10.3.7

            *Exclusive samples comparison*
            Individual functions (or, exclusive sample percentage with threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| {color:red}4.02{color}| 16.05| 20.07|
            |l_find| {color:red}2.44{color}| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60|
            |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00|
            |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29|

            In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb.
            *
            Inclusive samples comparison*
            Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example)

            ||Comparison Column ||Delta||Baseline Value|| Comparison Value||
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22
            |ut_delay|4.02|16.05|20.07
            |trx_purge|3.81|5.67|9.48
            |srv_do_purge|3.80|5.70|9.49
            |row_purge|3.66|0.00|3.67
            |row_purge_step|3.47|0.71|4.18
            |que_thr_step|3.31|0.94|4.25
            |srv_purge_coordinator_thread|3.25|6.37|9.63
            |trx_sys_t::clone_oldest_view|3.18|0.00|3.18
            |btr_cur_search_to_nth_level_func|3.12|0.00|3.12
            |row_purge_record_func|2.94|0.00|2.94
            |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77
            |row_purge_reset_trx_id|2.72|0.00|2.72
            |que_run_threads_low|2.70|1.73|4.43
            |que_run_threads|2.69|1.75|4.45
            |l_find|2.46|0.18|2.64
            |ReadView::snapshot|2.30|0.00|2.30
            |rw_trx_hash_t::iterate|2.27|0.00|2.27
            |lf_hash_iterate|2.25|0.00|2.25
            |srv_task_execute|2.12|1.95|4.08
            |row_purge_reposition_pcur|1.53|0.00|1.53
            |row_search_on_row_ref|1.33|0.00|1.33
            |btr_pcur_open_low|1.31|0.00|1.31
            |btr_search_guess_on_hash|1.31|1.54|2.85
            |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62
            |trx_undo_assign_low|1.07|0.00|1.07
            |mtr_t::commit|1.05|2.57|3.62
            |mtr_t::Command::execute|1.01|2.42|3.43
            |srv_worker_thread|-1.09|9.84|8.75
            |sync_array_wait_event|-1.29|9.69|8.40
            |trx_write_serialisation_history|-1.34|2.62|1.28
            |trx_commit_low|-1.39|3.15|1.76
            |innobase_commit|-1.75|47.36|45.61
            |trx_commit|-1.75|4.79|3.04
            |ha_commit_one_phase|-1.76|47.55|45.78
            |commit_one_phase_2|-1.76|47.54|45.77
            |trans_commit_stmt|-1.80|48.68|46.88
            |ha_commit_trans|-1.80|48.65|46.84
            |btr_cur_search_to_nth_level|-1.85|1.85|0.00
            |innobase_commit_ordered_2|-2.04|5.17|3.13
            |trx_commit_for_mysql|-2.05|5.10|3.05
            |innobase_commit_low|-2.05|5.11|3.06
            |Prepared_statement::execute_loop|-2.05|74.58|72.52
            |mysql_stmt_execute_common|-2.14|74.85|72.71
            |mysqld_stmt_execute|-2.15|74.89|72.74
            |mysql_execute_command|-2.21|73.03|70.81
            |trx_undo_assign_undo|-2.24|2.24|0.00
            |Prepared_statement::execute|-2.24|74.03|71.79
            |threadpool_process_request|-2.32|81.71|79.38
            |dispatch_command|-2.33|79.21|76.88
            |do_command|-2.34|81.33|79.00
            |tp_callback|-2.38|82.66|80.28
            |srv_resume_thread|-3.33|7.94|4.61
            |os_event::wait_low|-4.98|28.41|23.43
            |SleepConditionVariableCS|-5.41|28.70|23.29
            |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00

            wlad Vladislav Vaintroub made changes -
            Description I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling), and this is what showed up.

            Below., in all places, *baseline* refers to 10.2.14, and *comparison" is 10.3.7

            *Exclusive samples comparison*
            Individual functions (or, exclusive sample percentage with threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| {color:red}4.02{color}| 16.05| 20.07|
            |l_find| {color:red}2.44{color}| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60|
            |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00|
            |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29|

            In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb.
            *
            Inclusive samples comparison*
            Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example)

            ||Comparison Column ||Delta||Baseline Value|| Comparison Value||
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22
            |ut_delay|4.02|16.05|20.07
            |trx_purge|3.81|5.67|9.48
            |srv_do_purge|3.80|5.70|9.49
            |row_purge|3.66|0.00|3.67
            |row_purge_step|3.47|0.71|4.18
            |que_thr_step|3.31|0.94|4.25
            |srv_purge_coordinator_thread|3.25|6.37|9.63
            |trx_sys_t::clone_oldest_view|3.18|0.00|3.18
            |btr_cur_search_to_nth_level_func|3.12|0.00|3.12
            |row_purge_record_func|2.94|0.00|2.94
            |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77
            |row_purge_reset_trx_id|2.72|0.00|2.72
            |que_run_threads_low|2.70|1.73|4.43
            |que_run_threads|2.69|1.75|4.45
            |l_find|2.46|0.18|2.64
            |ReadView::snapshot|2.30|0.00|2.30
            |rw_trx_hash_t::iterate|2.27|0.00|2.27
            |lf_hash_iterate|2.25|0.00|2.25
            |srv_task_execute|2.12|1.95|4.08
            |row_purge_reposition_pcur|1.53|0.00|1.53
            |row_search_on_row_ref|1.33|0.00|1.33
            |btr_pcur_open_low|1.31|0.00|1.31
            |btr_search_guess_on_hash|1.31|1.54|2.85
            |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62
            |trx_undo_assign_low|1.07|0.00|1.07
            |mtr_t::commit|1.05|2.57|3.62
            |mtr_t::Command::execute|1.01|2.42|3.43
            |srv_worker_thread|-1.09|9.84|8.75
            |sync_array_wait_event|-1.29|9.69|8.40
            |trx_write_serialisation_history|-1.34|2.62|1.28
            |trx_commit_low|-1.39|3.15|1.76
            |innobase_commit|-1.75|47.36|45.61
            |trx_commit|-1.75|4.79|3.04
            |ha_commit_one_phase|-1.76|47.55|45.78
            |commit_one_phase_2|-1.76|47.54|45.77
            |trans_commit_stmt|-1.80|48.68|46.88
            |ha_commit_trans|-1.80|48.65|46.84
            |btr_cur_search_to_nth_level|-1.85|1.85|0.00
            |innobase_commit_ordered_2|-2.04|5.17|3.13
            |trx_commit_for_mysql|-2.05|5.10|3.05
            |innobase_commit_low|-2.05|5.11|3.06
            |Prepared_statement::execute_loop|-2.05|74.58|72.52
            |mysql_stmt_execute_common|-2.14|74.85|72.71
            |mysqld_stmt_execute|-2.15|74.89|72.74
            |mysql_execute_command|-2.21|73.03|70.81
            |trx_undo_assign_undo|-2.24|2.24|0.00
            |Prepared_statement::execute|-2.24|74.03|71.79
            |threadpool_process_request|-2.32|81.71|79.38
            |dispatch_command|-2.33|79.21|76.88
            |do_command|-2.34|81.33|79.00
            |tp_callback|-2.38|82.66|80.28
            |srv_resume_thread|-3.33|7.94|4.61
            |os_event::wait_low|-4.98|28.41|23.43
            |SleepConditionVariableCS|-5.41|28.70|23.29
            |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00

            I noticed reduced performance numbers when running somewhat large update_no_index sysbench benchmark, when comparing 10.2 to 10.3
            Here is the setup

            * Windows Azure box with 16 virtual CPUs, 32GB RAM and SSD storage (I used Local temporary disk)
            * MariaDB 10.2.14 vs 10.3.7 (recent build of 4a5e23e257e229b548599133dbed5162af9df6d9)
            the relevant part of my.ini file is as follows
            {noformat}
            [mysqld]
            back_log=500
            max_connections=4096
            max_prepared_stmt_count=500000
            table_open_cache=10000
            transaction_isolation=REPEATABLE-READ
            innodb_status_file=0
            innodb_buffer_pool_size=20G
            innodb_log_file_size=15G
            innodb_log_buffer_size=1000M
            innodb-io-capacity=4000
            innodb-io-capacity-max=5000
            innodb_doublewrite=0
            innodb-page-cleaners=1
            innodb-buffer-pool-instances=1
            innodb_adaptive_flushing=0
            innodb_adaptive_flushing_lwm=10
            {noformat}
            * sysbench 0.4 (I mention it because current versions of sysbench do not run on Windows anymore, but we do not need anything from the current versions)

            The update-no-index run on a single large table (50 mio rows), with number of users ranging from 1 to 4096, in powers of 2

            *Loading table, sysbench prepare*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=1 prepare
            {noformat}
             
            *sysbench run*
            {noformat}
            sysbench --test=oltp --oltp-table-size=50000000 --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password= --db-driver=mysql --mysql-table-engine=innodb --max-time=300 --oltp-test-mode=complex --oltp-read-only=off --max-requests=0 --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-delete-inserts=0 --oltp-non-index-updates=1 --oltp-skip-trx=on --oltp-dist-type=uniform --mysql-socket=MySQL --num-threads=%N% run
            {noformat}

            where %N% is 1,2,4,8,16,32,64,128,256,512,1024,2048,4096

            The test takes a nap of 80 seconds between runs, and also between load and first test, and as far as I can see it is enough for all Innodb backgroup IO activity to finish (MB/sec goes down to 0 on a disk where test is ran). Due to innodb-flush-log-at-trx-commit being default (1), the test does not appear to be very CPU bound (sysbench and mysql use 70% of all available 16 CPUs, in windows measurement, i.e over 10 CPUs are busy)

            Below are the results from the runs, for 10.2.4 and recent 10.3.7

            ||Users|| TPS 10.2.14|| TPS 10.3.7
            |1|1549.92|1547.75|
            |2|2373.89|2209.05|
            |4|4466.03|4276.46|
            |8|9069.82|8587.59|
            |16|16631.45|15719.70|
            |32|28989.79|27482.44|
            |64|36996.12|34843.23|
            |128|37287.05|35172.49|
            |256|38038.01|35896.52|
            |512|38360.71|36567.07|
            |1024|38265.25|36328.62|
            |2048|39990.85|36328.62|
            |4096|41988.71|39032.83|

            (also see the graph https://docs.google.com/spreadsheets/d/1VqRYCwF4QATCKEKwxKuCcvRXKm2GWnICCdv4fSo4_IU/edit#gid=0)
            It is not as big as I thought initially (somehow I counted 15%, and it is more like 5%), but it is consistent starting from 64 users, and keeping until the 4K . So it needs investigating.
            It is odd because the test itself is not CPU bound, nor IO bound, the bottleneck in my understanding is waiting for flush in log_write_up_to() . I did not try to make it CPU bound by relaxing durability, this might be another exercise, however with increased TPS purging can become more of a bottleneck, and disturb the picture.

            I tried to do some initial profiling (based on CPU sampling), and this is what showed up.

            Below., in all places, *baseline* refers to 10.2.14, and *comparison" is 10.3.7

            *Exclusive samples comparison*
            Individual functions (or, exclusive sample percentage with threshold 1% difference)

            ||Comparison Column|| Delta|| Baseline Value|| Comparison Value||
            |ut_delay| {color:red}4.02{color}| 16.05| 20.07|
            |l_find| {color:red}2.44{color}| 0.16 |2.60|
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter| {color:red}2.42{color}|0.18| 2.60|
            |TTASEventMutex<GenericPolicy>::enter| {color:green}-4.42{color}| 4.42| 0.00|
            |SleepConditionVariableCS|{color:green}-5.41{color}| 28.70| 23.29|

            In short, more busy wait _ut_delay_, less lazy wait _SleepConditionVariableCS_ . There is a new quite visible function (4th most expensive individual functions now), l_find , which seems to come from replacing stl with custom hashtable in innodb.
            *Inclusive samples comparison*
            Comparing inclusive samples with 1% difference threshold, gives the below table, which gives me a vague idea that there something is more expensive in purge, and again that lazy wait was replaced with busy wait somewhere, background threads probably use more CPU , for example "coordinator" purge thread. foreground use less of it (as shown by decreased time in do_command for example)

            ||Comparison Column ||Delta||Baseline Value|| Comparison Value||
            |PolicyMutex<TTASEventMutex<GenericPolicy> >::enter|28.69|2.53|31.22
            |ut_delay|4.02|16.05|20.07
            |trx_purge|3.81|5.67|9.48
            |srv_do_purge|3.80|5.70|9.49
            |row_purge|3.66|0.00|3.67
            |row_purge_step|3.47|0.71|4.18
            |que_thr_step|3.31|0.94|4.25
            |srv_purge_coordinator_thread|3.25|6.37|9.63
            |trx_sys_t::clone_oldest_view|3.18|0.00|3.18
            |btr_cur_search_to_nth_level_func|3.12|0.00|3.12
            |row_purge_record_func|2.94|0.00|2.94
            |row_purge_upd_exist_or_extern_func|2.77|0.00|2.77
            |row_purge_reset_trx_id|2.72|0.00|2.72
            |que_run_threads_low|2.70|1.73|4.43
            |que_run_threads|2.69|1.75|4.45
            |l_find|2.46|0.18|2.64
            |ReadView::snapshot|2.30|0.00|2.30
            |rw_trx_hash_t::iterate|2.27|0.00|2.27
            |lf_hash_iterate|2.25|0.00|2.25
            |srv_task_execute|2.12|1.95|4.08
            |row_purge_reposition_pcur|1.53|0.00|1.53
            |row_search_on_row_ref|1.33|0.00|1.33
            |btr_pcur_open_low|1.31|0.00|1.31
            |btr_search_guess_on_hash|1.31|1.54|2.85
            |trx_purge_wait_for_workers_to_complete|1.10|1.52|2.62
            |trx_undo_assign_low|1.07|0.00|1.07
            |mtr_t::commit|1.05|2.57|3.62
            |mtr_t::Command::execute|1.01|2.42|3.43
            |srv_worker_thread|-1.09|9.84|8.75
            |sync_array_wait_event|-1.29|9.69|8.40
            |trx_write_serialisation_history|-1.34|2.62|1.28
            |trx_commit_low|-1.39|3.15|1.76
            |innobase_commit|-1.75|47.36|45.61
            |trx_commit|-1.75|4.79|3.04
            |ha_commit_one_phase|-1.76|47.55|45.78
            |commit_one_phase_2|-1.76|47.54|45.77
            |trans_commit_stmt|-1.80|48.68|46.88
            |ha_commit_trans|-1.80|48.65|46.84
            |btr_cur_search_to_nth_level|-1.85|1.85|0.00
            |innobase_commit_ordered_2|-2.04|5.17|3.13
            |trx_commit_for_mysql|-2.05|5.10|3.05
            |innobase_commit_low|-2.05|5.11|3.06
            |Prepared_statement::execute_loop|-2.05|74.58|72.52
            |mysql_stmt_execute_common|-2.14|74.85|72.71
            |mysqld_stmt_execute|-2.15|74.89|72.74
            |mysql_execute_command|-2.21|73.03|70.81
            |trx_undo_assign_undo|-2.24|2.24|0.00
            |Prepared_statement::execute|-2.24|74.03|71.79
            |threadpool_process_request|-2.32|81.71|79.38
            |dispatch_command|-2.33|79.21|76.88
            |do_command|-2.34|81.33|79.00
            |tp_callback|-2.38|82.66|80.28
            |srv_resume_thread|-3.33|7.94|4.61
            |os_event::wait_low|-4.98|28.41|23.43
            |SleepConditionVariableCS|-5.41|28.70|23.29
            |TTASEventMutex<GenericPolicy>::enter|-30.49|30.49|0.00

            wlad Vladislav Vaintroub made changes -
            Attachment 1bp-1pc-10.2.14.txt [ 45632 ]
            Attachment 1bp-1pc-10.3.7.txt [ 45633 ]
            wlad Vladislav Vaintroub made changes -
            Attachment 10.3.pre-lockless.txt [ 45641 ]
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Sergey Vojtovich [ svoj ]
            wlad Vladislav Vaintroub made changes -
            Attachment winpmp_10_3_7.txt [ 45665 ]
            wlad Vladislav Vaintroub made changes -
            Attachment winpmp_10_2_14.txt [ 45666 ]
            serg Sergei Golubchik made changes -
            Affects Version/s 10.3 [ 22126 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.3 [ 22126 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.4 [ 22408 ]
            julien.fritsch Julien Fritsch made changes -
            Assignee Sergey Vojtovich [ svoj ] Marko Mäkelä [ marko ]
            wlad Vladislav Vaintroub made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Won't Fix [ 2 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            People

              Assignee: Marko Mäkelä
              Reporter: Vladislav Vaintroub
              Votes: 1
              Watchers: 8

