Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21724

Optimize page_cur_insert_rec_low() redo logging

Details

    Description

      MDEV-12353 introduces a purely physical redo log format. Some effort was made to optimize page_cur_insert_rec_low(), but it is still writing unnecessarily much log. Here is an example:

      --source include/innodb_page_size_small.inc
      CREATE TABLE t1 (a BIGINT PRIMARY KEY, b LONGBLOB) ENGINE=InnoDB;
      create table t2 (f1 bigint primary key, f2 longblob, f3 longblob,
        index(f2(10), f3(10))) engine=innodb;
      create table t3 (f1 bigint primary key, f2 longblob, f3 longblob,
        index(f2(10), f3(10))) engine=innodb row_format=compressed;
      let $shutdown_timeout=0;
      --source include/restart_mysqld.inc
      DROP TABLE t1,t2,t3;
      

      This test case, which used to fail on the 83rd round when running with the ,4k combination (because it would wrongly apply a mini-transaction whose LSN matches the FIL_PAGE_LSN on the page 0:13, or the only page of SYS_FIELDS) , shows that we are emitting suboptimal redo log records for a particular mini-transaction:

      recv_group_scan_log_recs: ib_log: scan 946527: rec 37 len 8 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec b9 len 10 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec b4 len 5 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec 34 len 5 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec d5 len 6 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec b6 len 7 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec d4 len 5 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec b0 len 21 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec 34 len 5 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec d4 len 5 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec b3 len 4 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec 35 len 6 page 0:13
      recv_group_scan_log_recs: ib_log: scan 946527: rec 34 len 5 page 0:13
      

      The decoded form is:

      WRITE(0x28, {0x6, 0xaf, 0x0, 0x28})
      WRITE(0x30, {0x6, 0x94, 0x0, 0x2, 0x0, 0x6, 0x0, 0x1c})
      WRITE(0x4f0, {0x6, 0x94})
      WRITE(0x6e, {0x9})
      MEMMOVE(0x689, {0x6, 0x82, 0xc3}) = from(0x4e7) {0x1b, 0x19, 0x12, 0xc, 0x8, 0x0}
      WRITE(0x68f, {0x1, 0x38, 0xb, 0x0, 0x74})
      MEMMOVE(0x69a, {0x3, 0x82, 0xc3}) = from(0x4f8) {0x1, 0xb2, 0x0}
      WRITE(0x69d, {0x1, 0x0, 0xa, 0x0, 0x0, 0x0, 0x0, 0x6, 0x7e, 0x8d, 0x0, 0x0, 0x3, 0x8a, 0x1, 0xd2, 0x66, 0x33})
      WRITE(0x27, {0x8})
      MEMMOVE(0xfe8, {0x2, 0x2}) = from (0xfea) {0x4, 0x80}
      WRITE(0xfea, {0x4, 0x80})
      WRITE(0x47a, {0x4})
      WRITE(0x6e, {0x5})
      

      • We update the n_owned of the supremum record at 0x6e multiple times.
      • The last MEMMOVE of 2 bytes and WRITE of 2 subsequent bytes could be combined to a single 4-byte WRITE of the page directory, possibly using 4+2 bytes instead of 5+4 bytes.
      • The second-last MEMMOVE of 3 bytes and WRITE of subsequent 18 bytes could be combined to a single record of 21+3 bytes, instead of two records of 5+21 bytes.
      • The WRITE offsets should be in ascending order, so that the same_page encoding can be used.
      • The WRITE to byte 0x27 should be combined with the very first WRITE record to 0x28, using 8+1 bytes instead of 8+5 bytes.

      Attachments

        Issue Links

          Activity

            The following test case, derived from main.sum_distinct-big, is showing a significant regression due to MDEV-12353. It seems to be mostly explained by the increased redo log volume:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            show status like 'innodb_lsn_current';
            CREATE TABLE t2 ENGINE=InnoDB SELECT * FROM seq_1_to_2097152;
            show status like 'innodb_lsn_current';
            DROP TABLE t2;
            

            revision LSN1 LSN2-LSN1 time
            10.5 ac51bcfd8d3b08b62d88350fd9de8ae43f7b1637 (before) 62,998 128,417,353 15.9s
            10.5 f8a9f906679e1d1ab026c245f7d24c652050d8b3 (after MDEV-12353) 53,077 214,956,466 25.7s

            Because the perf report output was ‘polluted’ by PERFORMANCE_SCHEMA functions, I decided to rebuild with cmake -DPLUGIN_PERFSCHEMA_NO. I also disabled the concurrent purge activity (MDEV-12288) to remove one source of nondeterminism:

            perf record ./mtr --mysqld=--innodb-force-recovery=2 main.ib
            

            revision LSN1 LSN2-LSN1 time
            10.5 ac51bcfd8d3b08b62d88350fd9de8ae43f7b1637 (before) 60,516 128,417,675 14.9s
            10.5 f8a9f906679e1d1ab026c245f7d24c652050d8b3 (after MDEV-12353) 51,149 215,006,904 23.9s

            The execution times include the perf record overhead.

            Here is the perf report output (evaluate_join_record() and anything that consumed more time):

            10.5 ac51bcfd8d3b08b62d88350fd9de8ae43f7b1637 (before)

              12.02%  mysqld  [.] rec_get_offsets_func
               9.36%  mysqld  [.] buf_page_get_gen
               6.32%  mysqld  [.] cmp_dtuple_rec_with_match_low
               6.31%  mysqld  [.] page_cur_search_with_match
               6.30%  mysqld  [.] btr_cur_search_to_nth_level_func
               2.99%  mysqld  [.] cmp_data
               2.04%  mysqld  [.] row_ins_step
               1.92%  mysqld  [.] mtr_t::commit
               1.88%  mysqld  [.] trx_undo_report_row_operation
               1.68%  mysqld  [.] page_cur_insert_rec_low
               1.63%  mysqld  [.] row_ins_clust_index_entry_low
               1.57%  mysqld  [.] rw_lock_x_lock_func
               1.51%  mysqld  [.] btr_cur_optimistic_insert
               1.39%  mysqld  [.] page_cur_insert_rec_write_log
               1.26%  mysqld  [.] rw_lock_x_lock_wait_func
               1.25%  mysqld  [.] rec_get_converted_size_comp
               1.24%  mysqld  [.] ReleaseLatches::operator()
               1.10%  mysqld  [.] mlog_open_and_write_index
               1.00%  mysqld  [.] row_insert_for_mysql
               1.00%  mysqld  [.] mtr_t::memo_push
               1.00%  mysqld  [.] TTASEventMutex<GenericPolicy>::enter
               0.98%  mysqld  [.] rec_convert_dtuple_to_rec_comp<false>
               0.94%  mysqld  [.] ut_crc32_hw
               0.83%  mysqld  [.] rw_lock_x_lock_low
               0.82%  mysqld  [.] ReleaseBlocks::add_dirty_page_to_flush_list
               0.82%  mysqld  [.] log_margin_checkpoint_age
               0.78%  mysqld  [.] mtr_t::prepare_write
               0.72%  mysqld  [.] TTASEventMutex<BlockMutexPolicy>::enter
               0.70%  mysqld  [.] buf_LRU_remove_block
               0.69%  mysqld  [.] mtr_t::release_block_at_savepoint
               0.66%  mysqld  [.] lock_rec_insert_check_and_lock
               0.66%  mysqld  [.] btr_node_ptr_max_size
               0.63%  mysqld  [.] ha_seq::index_next
               0.62%  mysqld  [.] mtr_t::finish_write
               0.60%  mysqld  [.] handler::ha_index_next
               0.59%  mysqld  [.] btr_cur_ins_lock_and_undo
               0.58%  mysqld  [.] fill_record
               0.57%  mysqld  [.] write_record
               0.56%  mysqld  [.] page_cur_tuple_insert
               0.53%  mysqld  [.] handler::ha_write_row
               0.52%  mysqld  [.] row_mysql_store_col_in_innobase_format
               0.52%  mysqld  [.] ha_innobase::write_row
               0.50%  mysqld  [.] evaluate_join_record
            

            10.5 f8a9f906679e1d1ab026c245f7d24c652050d8b3 (after)

              10.03%  mysqld  [.] rec_get_offsets_func
               9.41%  mysqld  [.] buf_page_get_gen
               7.35%  mysqld  [.] page_cur_search_with_match
               5.60%  mysqld  [.] btr_cur_search_to_nth_level_func
               5.46%  mysqld  [.] cmp_dtuple_rec_with_match_low
               3.44%  mysqld  [.] page_cur_insert_rec_low
               3.29%  mysqld  [.] mtr_t::log_write<(unsigned char)48>
               2.64%  mysqld  [.] cmp_data
               2.01%  mysqld  [.] mtr_t::commit
               1.94%  mysqld  [.] row_ins_step
               1.92%  mysqld  [.] trx_undo_report_row_operation
               1.84%  mysqld  [.] rec_get_converted_size_comp
               1.64%  mysqld  [.] mtr_t::memcpy_low
               1.49%  mysqld  [.] row_ins_clust_index_entry_low
               1.47%  mysqld  [.] rw_lock_x_lock_func
               1.40%  mysqld  [.] btr_cur_optimistic_insert
               1.30%  mysqld  [.] rec_convert_dtuple_to_rec_comp<false>
               1.17%  mysqld  [.] mtr_t::memo_push
               1.16%  mysqld  [.] rw_lock_x_lock_wait_func
               1.09%  mysqld  [.] ReleaseLatches::operator()
               1.08%  mysqld  [.] ut_crc32_hw
               1.03%  mysqld  [.] row_insert_for_mysql
               0.81%  mysqld  [.] handler::ha_index_next
               0.79%  mysqld  [.] TTASEventMutex<GenericPolicy>::enter
               0.77%  mysqld  [.] ReleaseBlocks::add_dirty_page_to_flush_list
               0.77%  mysqld  [.] btr_node_ptr_max_size
               0.74%  mysqld  [.] rw_lock_x_lock_low
               0.73%  mysqld  [.] lock_rec_insert_check_and_lock
               0.71%  mysqld  [.] mtr_t::prepare_write
               0.69%  mysqld  [.] log_margin_checkpoint_age
               0.68%  mysqld  [.] TTASEventMutex<BlockMutexPolicy>::enter
               0.60%  mysqld  [.] ha_innobase::write_row
               0.57%  mysqld  [.] mtr_t::release_block_at_savepoint
               0.55%  mysqld  [.] row_mysql_store_col_in_innobase_format
               0.55%  mysqld  [.] btr_cur_ins_lock_and_undo
               0.54%  mysqld  [.] buf_LRU_remove_block
               0.54%  mysqld  [.] evaluate_join_record
            

            As expected, rec_get_offsets_func() was removed from the top, because we no longer call it when writing log for an insert.
            The function trx_undo_report_row_operation() is taking slightly more time, and mtr_t::log_write<WRITE>() and mtr_t::memcpy_low() are new entries related to the new redo log format. We also spend some more time on ut_crc32_hw() due to the 67% increase of redo log volume.

            With a single-row INSERT, there is no regression:

            --source include/have_innodb.inc
            CREATE TABLE t2 (a BIGINT) ENGINE=InnoDB;
            show status like 'innodb_lsn_current';
            INSERT INTO t2 VALUES(1);
            show status like 'innodb_lsn_current';
            DROP TABLE t2;
            

            ./mtr --mysqld=--innodb-force-recovery=2 main.ib1
            

            revision LSN1 LSN2-LSN1
            before 61,792 204
            after 52,361 175
            marko Marko Mäkelä added a comment - The following test case, derived from main.sum_distinct-big , is showing a significant regression due to MDEV-12353 . It seems to be mostly explained by the increased redo log volume: --source include/have_innodb.inc --source include/have_sequence.inc show status like 'innodb_lsn_current' ; CREATE TABLE t2 ENGINE=InnoDB SELECT * FROM seq_1_to_2097152; show status like 'innodb_lsn_current' ; DROP TABLE t2; revision LSN1 LSN2-LSN1 time 10.5 ac51bcfd8d3b08b62d88350fd9de8ae43f7b1637 (before) 62,998 128,417,353 15.9s 10.5 f8a9f906679e1d1ab026c245f7d24c652050d8b3 (after MDEV-12353 ) 53,077 214,956,466 25.7s Because the perf report output was ‘polluted’ by PERFORMANCE_SCHEMA functions, I decided to rebuild with cmake -DPLUGIN_PERFSCHEMA_NO . I also disabled the concurrent purge activity ( MDEV-12288 ) to remove one source of nondeterminism: perf record ./mtr --mysqld=--innodb-force-recovery=2 main.ib revision LSN1 LSN2-LSN1 time 10.5 ac51bcfd8d3b08b62d88350fd9de8ae43f7b1637 (before) 60,516 128,417,675 14.9s 10.5 f8a9f906679e1d1ab026c245f7d24c652050d8b3 (after MDEV-12353 ) 51,149 215,006,904 23.9s The execution times include the perf record overhead. Here is the perf report output ( evaluate_join_record() and anything that consumed more time): 10.5 ac51bcfd8d3b08b62d88350fd9de8ae43f7b1637 (before) 12.02% mysqld [.] rec_get_offsets_func 9.36% mysqld [.] buf_page_get_gen 6.32% mysqld [.] cmp_dtuple_rec_with_match_low 6.31% mysqld [.] page_cur_search_with_match 6.30% mysqld [.] btr_cur_search_to_nth_level_func 2.99% mysqld [.] cmp_data 2.04% mysqld [.] row_ins_step 1.92% mysqld [.] mtr_t::commit 1.88% mysqld [.] trx_undo_report_row_operation 1.68% mysqld [.] page_cur_insert_rec_low 1.63% mysqld [.] row_ins_clust_index_entry_low 1.57% mysqld [.] rw_lock_x_lock_func 1.51% mysqld [.] btr_cur_optimistic_insert 1.39% mysqld [.] page_cur_insert_rec_write_log 1.26% mysqld [.] rw_lock_x_lock_wait_func 1.25% mysqld [.] rec_get_converted_size_comp 1.24% mysqld [.] ReleaseLatches::operator() 1.10% mysqld [.] mlog_open_and_write_index 1.00% mysqld [.] row_insert_for_mysql 1.00% mysqld [.] mtr_t::memo_push 1.00% mysqld [.] TTASEventMutex<GenericPolicy>::enter 0.98% mysqld [.] rec_convert_dtuple_to_rec_comp<false> 0.94% mysqld [.] ut_crc32_hw 0.83% mysqld [.] rw_lock_x_lock_low 0.82% mysqld [.] ReleaseBlocks::add_dirty_page_to_flush_list 0.82% mysqld [.] log_margin_checkpoint_age 0.78% mysqld [.] mtr_t::prepare_write 0.72% mysqld [.] TTASEventMutex<BlockMutexPolicy>::enter 0.70% mysqld [.] buf_LRU_remove_block 0.69% mysqld [.] mtr_t::release_block_at_savepoint 0.66% mysqld [.] lock_rec_insert_check_and_lock 0.66% mysqld [.] btr_node_ptr_max_size 0.63% mysqld [.] ha_seq::index_next 0.62% mysqld [.] mtr_t::finish_write 0.60% mysqld [.] handler::ha_index_next 0.59% mysqld [.] btr_cur_ins_lock_and_undo 0.58% mysqld [.] fill_record 0.57% mysqld [.] write_record 0.56% mysqld [.] page_cur_tuple_insert 0.53% mysqld [.] handler::ha_write_row 0.52% mysqld [.] row_mysql_store_col_in_innobase_format 0.52% mysqld [.] ha_innobase::write_row 0.50% mysqld [.] evaluate_join_record 10.5 f8a9f906679e1d1ab026c245f7d24c652050d8b3 (after) 10.03% mysqld [.] rec_get_offsets_func 9.41% mysqld [.] buf_page_get_gen 7.35% mysqld [.] page_cur_search_with_match 5.60% mysqld [.] btr_cur_search_to_nth_level_func 5.46% mysqld [.] cmp_dtuple_rec_with_match_low 3.44% mysqld [.] page_cur_insert_rec_low 3.29% mysqld [.] mtr_t::log_write<(unsigned char)48> 2.64% mysqld [.] cmp_data 2.01% mysqld [.] mtr_t::commit 1.94% mysqld [.] row_ins_step 1.92% mysqld [.] trx_undo_report_row_operation 1.84% mysqld [.] rec_get_converted_size_comp 1.64% mysqld [.] mtr_t::memcpy_low 1.49% mysqld [.] row_ins_clust_index_entry_low 1.47% mysqld [.] rw_lock_x_lock_func 1.40% mysqld [.] btr_cur_optimistic_insert 1.30% mysqld [.] rec_convert_dtuple_to_rec_comp<false> 1.17% mysqld [.] mtr_t::memo_push 1.16% mysqld [.] rw_lock_x_lock_wait_func 1.09% mysqld [.] ReleaseLatches::operator() 1.08% mysqld [.] ut_crc32_hw 1.03% mysqld [.] row_insert_for_mysql 0.81% mysqld [.] handler::ha_index_next 0.79% mysqld [.] TTASEventMutex<GenericPolicy>::enter 0.77% mysqld [.] ReleaseBlocks::add_dirty_page_to_flush_list 0.77% mysqld [.] btr_node_ptr_max_size 0.74% mysqld [.] rw_lock_x_lock_low 0.73% mysqld [.] lock_rec_insert_check_and_lock 0.71% mysqld [.] mtr_t::prepare_write 0.69% mysqld [.] log_margin_checkpoint_age 0.68% mysqld [.] TTASEventMutex<BlockMutexPolicy>::enter 0.60% mysqld [.] ha_innobase::write_row 0.57% mysqld [.] mtr_t::release_block_at_savepoint 0.55% mysqld [.] row_mysql_store_col_in_innobase_format 0.55% mysqld [.] btr_cur_ins_lock_and_undo 0.54% mysqld [.] buf_LRU_remove_block 0.54% mysqld [.] evaluate_join_record As expected, rec_get_offsets_func() was removed from the top, because we no longer call it when writing log for an insert. The function trx_undo_report_row_operation() is taking slightly more time, and mtr_t::log_write<WRITE>() and mtr_t::memcpy_low() are new entries related to the new redo log format. We also spend some more time on ut_crc32_hw() due to the 67% increase of redo log volume. With a single-row INSERT , there is no regression: --source include/have_innodb.inc CREATE TABLE t2 (a BIGINT ) ENGINE=InnoDB; show status like 'innodb_lsn_current' ; INSERT INTO t2 VALUES (1); show status like 'innodb_lsn_current' ; DROP TABLE t2; ./mtr --mysqld=--innodb-force-recovery=2 main.ib1 revision LSN1 LSN2-LSN1 before 61,792 204 after 52,361 175

            I think that to highlight the impact of undo log and B-tree operations, it is better to use a variation of the test, so that updates of the DICT_HDR page will be avoided (they will be done even for TEMPORARY TABLE, until MDEV-19506 is fixed):

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            show status like 'innodb_lsn_current';
            CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=InnoDB;
            show status like 'innodb_lsn_current';
            INSERT INTO t2 SELECT * FROM seq_1_to_2097152;
            show status like 'innodb_lsn_current';
            DROP TABLE t2;
            

            Even if I disable the redo logging in btr_page_reorganize_low(), the log volume will not be reduced much. Similarly, there is very little impact of disabling logging in page_copy_rec_list_end_no_locks(), page_copy_rec_list_end(), page_copy_rec_list_start().

            Once I disabled the logging for page_cur_insert_rec_low(), the redo log volume will be reduced by 72%. The rest is almost entirely attributed to trx_undo_page_set_next_prev_and_add() (writing an undo log record).

            marko Marko Mäkelä added a comment - I think that to highlight the impact of undo log and B-tree operations, it is better to use a variation of the test, so that updates of the DICT_HDR page will be avoided (they will be done even for TEMPORARY TABLE , until MDEV-19506 is fixed): --source include/have_innodb.inc --source include/have_sequence.inc show status like 'innodb_lsn_current' ; CREATE TABLE t2 (a INT PRIMARY KEY ) ENGINE=InnoDB; show status like 'innodb_lsn_current' ; INSERT INTO t2 SELECT * FROM seq_1_to_2097152; show status like 'innodb_lsn_current' ; DROP TABLE t2; Even if I disable the redo logging in btr_page_reorganize_low() , the log volume will not be reduced much. Similarly, there is very little impact of disabling logging in page_copy_rec_list_end_no_locks() , page_copy_rec_list_end() , page_copy_rec_list_start() . Once I disabled the logging for page_cur_insert_rec_low() , the redo log volume will be reduced by 72%. The rest is almost entirely attributed to trx_undo_page_set_next_prev_and_add() (writing an undo log record).

            I disabled all other redo log sources and set a breakpoint on mtr_t::finish_write(). Most inserts are logged as 47‥55 bytes, and every time the page directory is split, the length can be as much as 80 bytes. The record payload is 5+4+6+7=22 bytes. We must typically update 4+2+4=10 page header bytes when the page directory does not grow. On about 20% of the cases, we must write at least 2+2+1 more bytes to grow the page directory, but this should be doable in much less than 80 bytes.

            One more idea, in addition to those mentioned in the Description, is that we could avoid writing leading zero bytes for DB_TRX_ID when the page already contains zeroes in that area. That would save space when the transaction ID fits in 32 bits or less.

            In the old format, MLOG_COMP_REC_INSERT is 30 bytes at the start of this test case (with small page numbers and transaction identifiers). We may have to introduce a special redo log record for inserting an index page record. That should fit in less than 30 bytes for 80% of the cases (not adjusting the page directory). Currently, we seem to be writing 60 bytes in the average, which is an 100% increase over the 30 bytes.

            marko Marko Mäkelä added a comment - I disabled all other redo log sources and set a breakpoint on mtr_t::finish_write() . Most inserts are logged as 47‥55 bytes, and every time the page directory is split, the length can be as much as 80 bytes. The record payload is 5+4+6+7=22 bytes. We must typically update 4+2+4=10 page header bytes when the page directory does not grow. On about 20% of the cases, we must write at least 2+2+1 more bytes to grow the page directory, but this should be doable in much less than 80 bytes. One more idea, in addition to those mentioned in the Description, is that we could avoid writing leading zero bytes for DB_TRX_ID when the page already contains zeroes in that area. That would save space when the transaction ID fits in 32 bits or less. In the old format, MLOG_COMP_REC_INSERT is 30 bytes at the start of this test case (with small page numbers and transaction identifiers). We may have to introduce a special redo log record for inserting an index page record. That should fit in less than 30 bytes for 80% of the cases (not adjusting the page directory). Currently, we seem to be writing 60 bytes in the average, which is an 100% increase over the 30 bytes.

            I posted some detailed log size statistics for a simple 2-row INSERT transaction in a comment in MDEV-12353.
            Here are the statistics for the test case in this ticket again. Only the last line is new:

            revision LSN1 LSN2-LSN1 time
            10.5 ac51bcfd8d3b08b62d88350fd9de8ae43f7b1637 (before MDEV-12353) 62,998 128,417,353 15.9s
            10.5 f8a9f906679e1d1ab026c245f7d24c652050d8b3 (after MDEV-12353) 53,077 214,956,466 25.7s
            10.5 8db623038f7158529e804e9607362939bff37337 (after MDEV-21724) 46,615 91,201,573 7.63s

            In our example, we are writing 29% less log for the 2,097,152-row INSERT and spending 52% less total time than before MDEV-12353. Because log_write_up_to() is calculating log block checksums inside a mutex, even a small decrease in log size can greatly reduce the total execution time. In MDEV-14425 we will hopefully improve the log writing further, by only doing a memcpy() inside the mutex.

            marko Marko Mäkelä added a comment - I posted some detailed log size statistics for a simple 2-row INSERT transaction in a comment in MDEV-12353 . Here are the statistics for the test case in this ticket again. Only the last line is new: revision LSN1 LSN2-LSN1 time 10.5 ac51bcfd8d3b08b62d88350fd9de8ae43f7b1637 (before MDEV-12353 ) 62,998 128,417,353 15.9s 10.5 f8a9f906679e1d1ab026c245f7d24c652050d8b3 (after MDEV-12353 ) 53,077 214,956,466 25.7s 10.5 8db623038f7158529e804e9607362939bff37337 (after MDEV-21724 ) 46,615 91,201,573 7.63s In our example, we are writing 29% less log for the 2,097,152-row INSERT and spending 52% less total time than before MDEV-12353 . Because log_write_up_to() is calculating log block checksums inside a mutex, even a small decrease in log size can greatly reduce the total execution time. In MDEV-14425 we will hopefully improve the log writing further, by only doing a memcpy() inside the mutex.

            For the record, in MDEV-27774 after MDEV-14425, also the memcpy() is done inside a shared log_sys.latch, that is, multiple threads executing mtr_t::commit() may concurrently copy their local log into log_sys.buf. An exclusive log_sys.latch would be acquired during a log checkpoint, to ensure that no writes are in progress.

            marko Marko Mäkelä added a comment - For the record, in MDEV-27774 after MDEV-14425 , also the memcpy() is done inside a shared log_sys.latch , that is, multiple threads executing mtr_t::commit() may concurrently copy their local log into log_sys.buf . An exclusive log_sys.latch would be acquired during a log checkpoint, to ensure that no writes are in progress.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.