Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.5
Description
MDEV-12353 introduces a purely physical redo log format. Some effort was made to optimize page_cur_insert_rec_low(), but it is still writing unnecessarily much log. Here is an example:
--source include/innodb_page_size_small.inc
|
CREATE TABLE t1 (a BIGINT PRIMARY KEY, b LONGBLOB) ENGINE=InnoDB; |
create table t2 (f1 bigint primary key, f2 longblob, f3 longblob, |
index(f2(10), f3(10))) engine=innodb; |
create table t3 (f1 bigint primary key, f2 longblob, f3 longblob, |
index(f2(10), f3(10))) engine=innodb row_format=compressed; |
let $shutdown_timeout=0;
|
--source include/restart_mysqld.inc
|
DROP TABLE t1,t2,t3; |
This test case, which used to fail on the 83rd round when running with the ,4k combination (because it would wrongly apply a mini-transaction whose LSN matches the FIL_PAGE_LSN on the page 0:13, or the only page of SYS_FIELDS) , shows that we are emitting suboptimal redo log records for a particular mini-transaction:
recv_group_scan_log_recs: ib_log: scan 946527: rec 37 len 8 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec b9 len 10 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec b4 len 5 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec 34 len 5 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec d5 len 6 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec b6 len 7 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec d4 len 5 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec b0 len 21 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec 34 len 5 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec d4 len 5 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec b3 len 4 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec 35 len 6 page 0:13
|
recv_group_scan_log_recs: ib_log: scan 946527: rec 34 len 5 page 0:13
|
The decoded form is:
WRITE(0x28, {0x6, 0xaf, 0x0, 0x28})
|
WRITE(0x30, {0x6, 0x94, 0x0, 0x2, 0x0, 0x6, 0x0, 0x1c})
|
WRITE(0x4f0, {0x6, 0x94})
|
WRITE(0x6e, {0x9})
|
MEMMOVE(0x689, {0x6, 0x82, 0xc3}) = from(0x4e7) {0x1b, 0x19, 0x12, 0xc, 0x8, 0x0}
|
WRITE(0x68f, {0x1, 0x38, 0xb, 0x0, 0x74})
|
MEMMOVE(0x69a, {0x3, 0x82, 0xc3}) = from(0x4f8) {0x1, 0xb2, 0x0}
|
WRITE(0x69d, {0x1, 0x0, 0xa, 0x0, 0x0, 0x0, 0x0, 0x6, 0x7e, 0x8d, 0x0, 0x0, 0x3, 0x8a, 0x1, 0xd2, 0x66, 0x33})
|
WRITE(0x27, {0x8})
|
MEMMOVE(0xfe8, {0x2, 0x2}) = from (0xfea) {0x4, 0x80}
|
WRITE(0xfea, {0x4, 0x80})
|
WRITE(0x47a, {0x4})
|
WRITE(0x6e, {0x5})
|
- We update the n_owned of the supremum record at 0x6e multiple times.
- The last MEMMOVE of 2 bytes and WRITE of 2 subsequent bytes could be combined to a single 4-byte WRITE of the page directory, possibly using 4+2 bytes instead of 5+4 bytes.
- The second-last MEMMOVE of 3 bytes and WRITE of subsequent 18 bytes could be combined to a single record of 21+3 bytes, instead of two records of 5+21 bytes.
- The WRITE offsets should be in ascending order, so that the same_page encoding can be used.
- The WRITE to byte 0x27 should be combined with the very first WRITE record to 0x28, using 8+1 bytes instead of 8+5 bytes.
Attachments
Issue Links
- causes
-
MDEV-21899 INSERT into a secondary index with zero-data-length key is not crash-safe
-
- Closed
-
-
MDEV-23304 Insert into TEMPORARY TABLE fails to invoke mtr_t::set_modified()
-
- Closed
-
-
MDEV-25031 Not applying INSERT_REUSE_REDUNDANT due to corruption on page
-
- Closed
-
-
MDEV-25745 InnoDB recovery fails with [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption
-
- Closed
-
-
MDEV-28668 Recovery or backup of INSERT may be incorrect
-
- Closed
-
-
MDEV-29438 Recovery or backup of instant ALTER TABLE is incorrect
-
- Closed
-
-
MDEV-29559 Recovery of INSERT_HEAP_DYNAMIC into secondary index fails
-
- Closed
-
- is caused by
-
MDEV-12353 Efficient InnoDB redo log record format
-
- Closed
-
- relates to
-
MDEV-29153 2022-07-22 18:12:56 0x7f3319fff700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/page/page0cur.cc line 11
-
- Closed
-
-
MDEV-21725 Optimize btr_page_reorganize_low() redo logging
-
- Stalled
-
-
MDEV-24197 Prepare step of backup failing on large backup
-
- Closed
-
The following test case, derived from main.sum_distinct-big, is showing a significant regression due to
MDEV-12353. It seems to be mostly explained by the increased redo log volume:--source include/have_innodb.inc
--source include/have_sequence.inc
MDEV-12353)Because the perf report output was ‘polluted’ by PERFORMANCE_SCHEMA functions, I decided to rebuild with cmake -DPLUGIN_PERFSCHEMA_NO. I also disabled the concurrent purge activity (
MDEV-12288) to remove one source of nondeterminism:perf record ./mtr --mysqld=--innodb-force-recovery=2 main.ib
MDEV-12353)The execution times include the perf record overhead.
Here is the perf report output (evaluate_join_record() and anything that consumed more time):
10.5 ac51bcfd8d3b08b62d88350fd9de8ae43f7b1637 (before)
12.02% mysqld [.] rec_get_offsets_func
9.36% mysqld [.] buf_page_get_gen
6.32% mysqld [.] cmp_dtuple_rec_with_match_low
6.31% mysqld [.] page_cur_search_with_match
6.30% mysqld [.] btr_cur_search_to_nth_level_func
2.99% mysqld [.] cmp_data
2.04% mysqld [.] row_ins_step
1.92% mysqld [.] mtr_t::commit
1.88% mysqld [.] trx_undo_report_row_operation
1.68% mysqld [.] page_cur_insert_rec_low
1.63% mysqld [.] row_ins_clust_index_entry_low
1.57% mysqld [.] rw_lock_x_lock_func
1.51% mysqld [.] btr_cur_optimistic_insert
1.39% mysqld [.] page_cur_insert_rec_write_log
1.26% mysqld [.] rw_lock_x_lock_wait_func
1.25% mysqld [.] rec_get_converted_size_comp
1.24% mysqld [.] ReleaseLatches::operator()
1.10% mysqld [.] mlog_open_and_write_index
1.00% mysqld [.] row_insert_for_mysql
1.00% mysqld [.] mtr_t::memo_push
1.00% mysqld [.] TTASEventMutex<GenericPolicy>::enter
0.98% mysqld [.] rec_convert_dtuple_to_rec_comp<false>
0.94% mysqld [.] ut_crc32_hw
0.83% mysqld [.] rw_lock_x_lock_low
0.82% mysqld [.] ReleaseBlocks::add_dirty_page_to_flush_list
0.82% mysqld [.] log_margin_checkpoint_age
0.78% mysqld [.] mtr_t::prepare_write
0.72% mysqld [.] TTASEventMutex<BlockMutexPolicy>::enter
0.70% mysqld [.] buf_LRU_remove_block
0.69% mysqld [.] mtr_t::release_block_at_savepoint
0.66% mysqld [.] lock_rec_insert_check_and_lock
0.66% mysqld [.] btr_node_ptr_max_size
0.63% mysqld [.] ha_seq::index_next
0.62% mysqld [.] mtr_t::finish_write
0.60% mysqld [.] handler::ha_index_next
0.59% mysqld [.] btr_cur_ins_lock_and_undo
0.58% mysqld [.] fill_record
0.57% mysqld [.] write_record
0.56% mysqld [.] page_cur_tuple_insert
0.53% mysqld [.] handler::ha_write_row
0.52% mysqld [.] row_mysql_store_col_in_innobase_format
0.52% mysqld [.] ha_innobase::write_row
0.50% mysqld [.] evaluate_join_record
10.5 f8a9f906679e1d1ab026c245f7d24c652050d8b3 (after)
10.03% mysqld [.] rec_get_offsets_func
9.41% mysqld [.] buf_page_get_gen
7.35% mysqld [.] page_cur_search_with_match
5.60% mysqld [.] btr_cur_search_to_nth_level_func
5.46% mysqld [.] cmp_dtuple_rec_with_match_low
3.44% mysqld [.] page_cur_insert_rec_low
3.29% mysqld [.] mtr_t::log_write<(unsigned char)48>
2.64% mysqld [.] cmp_data
2.01% mysqld [.] mtr_t::commit
1.94% mysqld [.] row_ins_step
1.92% mysqld [.] trx_undo_report_row_operation
1.84% mysqld [.] rec_get_converted_size_comp
1.64% mysqld [.] mtr_t::memcpy_low
1.49% mysqld [.] row_ins_clust_index_entry_low
1.47% mysqld [.] rw_lock_x_lock_func
1.40% mysqld [.] btr_cur_optimistic_insert
1.30% mysqld [.] rec_convert_dtuple_to_rec_comp<false>
1.17% mysqld [.] mtr_t::memo_push
1.16% mysqld [.] rw_lock_x_lock_wait_func
1.09% mysqld [.] ReleaseLatches::operator()
1.08% mysqld [.] ut_crc32_hw
1.03% mysqld [.] row_insert_for_mysql
0.81% mysqld [.] handler::ha_index_next
0.79% mysqld [.] TTASEventMutex<GenericPolicy>::enter
0.77% mysqld [.] ReleaseBlocks::add_dirty_page_to_flush_list
0.77% mysqld [.] btr_node_ptr_max_size
0.74% mysqld [.] rw_lock_x_lock_low
0.73% mysqld [.] lock_rec_insert_check_and_lock
0.71% mysqld [.] mtr_t::prepare_write
0.69% mysqld [.] log_margin_checkpoint_age
0.68% mysqld [.] TTASEventMutex<BlockMutexPolicy>::enter
0.60% mysqld [.] ha_innobase::write_row
0.57% mysqld [.] mtr_t::release_block_at_savepoint
0.55% mysqld [.] row_mysql_store_col_in_innobase_format
0.55% mysqld [.] btr_cur_ins_lock_and_undo
0.54% mysqld [.] buf_LRU_remove_block
0.54% mysqld [.] evaluate_join_record
As expected, rec_get_offsets_func() was removed from the top, because we no longer call it when writing log for an insert.
The function trx_undo_report_row_operation() is taking slightly more time, and mtr_t::log_write<WRITE>() and mtr_t::memcpy_low() are new entries related to the new redo log format. We also spend some more time on ut_crc32_hw() due to the 67% increase of redo log volume.
With a single-row INSERT, there is no regression:
--source include/have_innodb.inc
./mtr --mysqld=--innodb-force-recovery=2 main.ib1