Here is the start of the transaction with ID 0xfb6 (which last modified the clustered index record):
Thread 17 hit Breakpoint 12, trx_set_rw_mode (trx=0x7f08126d29a8) at /data/Server/10.5/storage/innobase/trx/trx0trx.cc:2320
2320 if (trx->read_view.is_open()) {
4: /x trx->id = 0xfb6
(rr) bt
#0 trx_set_rw_mode (trx=0x7f08126d29a8) at /data/Server/10.5/storage/innobase/trx/trx0trx.cc:2320
#1 0x0000562d87c70ca0 in lock_table (flags=0, table=0x618000063508, mode=LOCK_IX, thr=0x6210002ee988) at /data/Server/10.5/storage/innobase/lock/lock0lock.cc:3766
#2 0x0000562d87ee479e in row_search_mvcc (buf=0x61a0001b12b8 "\377", mode=PAGE_CUR_GE, prebuilt=0x6210002ee188, match_mode=1, direction=0) at /data/Server/10.5/storage/innobase/row/row0sel.cc:4608
#6 0x0000562d871df416 in handler::read_range_first (this=0x61d0007c42b8, start_key=0x61d0007c43b8, end_key=0x61d0007c43d8, eq_range_arg=true, sorted=true) at /data/Server/10.5/sql/handler.cc:6200
#7 0x0000562d86ee5ef1 in handler::multi_range_read_next (this=0x61d0007c42b8, range_info=0x7f07fc233790) at /data/Server/10.5/sql/multi_range_read.cc:516
#8 0x0000562d86ee623e in Mrr_simple_index_reader::get_next (this=0x61d0007c48e0, range_info=0x7f07fc233790) at /data/Server/10.5/sql/multi_range_read.cc:553
#9 0x0000562d86eef20d in DsMrr_impl::dsmrr_next (this=0x61d0007c4790, range_info=0x7f07fc233790) at /data/Server/10.5/sql/multi_range_read.cc:1653
#10 0x0000562d87b1bc3e in ha_innobase::multi_range_read_next (this=0x61d0007c42b8, range_info=0x7f07fc233790) at /data/Server/10.5/storage/innobase/handler/ha_innodb.cc:20401
#11 0x0000562d875a8e84 in QUICK_RANGE_SELECT::get_next (this=0x6130000e50c0) at /data/Server/10.5/sql/opt_range.cc:12647
#12 0x0000562d875da15c in rr_quick (info=0x7f07fc233ba0) at /data/Server/10.5/sql/records.cc:403
#13 0x0000562d868c27c2 in READ_RECORD::read_record (this=0x7f07fc233ba0) at /data/Server/10.5/sql/records.h:80
found_return=0x7f07fc2343c0, updated_return=0x7f07fc2343e0) at /data/Server/10.5/sql/sql_update.cc:996
#15 0x0000562d86a534ff in mysql_execute_command (thd=0x62b0001c0218) at /data/Server/10.5/sql/sql_parse.cc:4468
#16 0x0000562d86a6b5b6 in mysql_parse (thd=0x62b0001c0218, rawbuf=0x62b0001c7238 "UPDATE `tab1` SET e = 'sentence' WHERE pk = 5411 /* E_R Thread17 QNO 53 CON_ID 30 */", length=84, parser_state=0x7f07fc234cb0,
is_com_multi=false, is_next_command=false) at /data/Server/10.5/sql/sql_parse.cc:8096
…
(rr) break ibuf0ibuf.cc:3179 thread 12
Breakpoint 13 at 0x562d87c3feb8: file /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc, line 3179.
(rr) cond 13 index->id==27
(rr) c
Continuing.
Thread 17 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f081214f980, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000315
Current event: 1148257
First, our transaction performed a synchronous read of the clustered index page (5:0x315). Soon thereafter, another thread completed a synchronous read of the secondary index page, because (just like in the previous trace that I analyzed) it decided that some room needs to be allocated in the change buffer, while attempting to buffer another insert (coincidentally, to the same index, and same index page 0x602):
Thread 30 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f0812116460, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
err=0x7f07f82b9b20, allow_ibuf_merge=true) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:3482
#5 0x0000562d87c3b7a4 in ibuf_read_merge_pages (space_ids=0x7f07f82ba590, page_nos=0x7f07f82ba5d0, n_stored=3) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:2325
at /data/Server/10.5/storage/innobase/row/row0ins.cc:2971
#10 0x0000562d87e152da in row_ins_sec_index_entry (index=0x617000035388, entry=0x619000a88508, thr=0x621000801a48, check_foreign=true) at /data/Server/10.5/storage/innobase/row/row0ins.cc:3294
#11 0x0000562d87f0b593 in row_upd_sec_index_entry (node=0x621000801638, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:2236
#12 0x0000562d87f0bbbe in row_upd_sec_step (node=0x621000801638, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:2263
#13 0x0000562d87f112d2 in row_upd (node=0x621000801638, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:3024
#14 0x0000562d87f11a9c in row_upd_step (thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:3139
#15 0x0000562d87e5dedb in row_update_for_mysql (prebuilt=0x621000800d88) at /data/Server/10.5/storage/innobase/row/row0mysql.cc:1844
#16 0x0000562d87aea754 in ha_innobase::update_row (this=0x61d0008246b8, old_row=0x61a000211b20 "\300\trecipient", new_row=0x61a0002118b8 "\300\vcelebration")
at /data/Server/10.5/storage/innobase/handler/ha_innodb.cc:8445
#17 0x0000562d871e7213 in handler::ha_update_row (this=0x61d0008246b8, old_data=0x61a000211b20 "\300\trecipient", new_data=0x61a0002118b8 "\300\vcelebration") at /data/Server/10.5/sql/handler.cc:7207
found_return=0x7f07f82bf3c0, updated_return=0x7f07f82bf3e0) at /data/Server/10.5/sql/sql_update.cc:1077
#19 0x0000562d86a534ff in mysql_execute_command (thd=0x62b000299218) at /data/Server/10.5/sql/sql_parse.cc:4468
#20 0x0000562d86a6b5b6 in mysql_parse (thd=0x62b000299218, rawbuf=0x62b000268238 "UPDATE `tab1` SET b = 'celebration' WHERE pk = 6083 /* E_R Thread33 QNO 59 CON_ID 45 */", length=87,
parser_state=0x7f07f82bfcb0, is_com_multi=false, is_next_command=false) at /data/Server/10.5/sql/sql_parse.cc:8096
At this point, our ‘victim’ transaction is waiting for a latch on the change buffer root page (page 4). It has already decided to write to the change buffer, even though the page actually exists in the buffer pool:
#20 0x0000562d880d7392 in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=1, guess=0x7f0812107aa0, mode=10, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x7f07fc230590, err=0x7f07fc22efd0, allow_ibuf_merge=false) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:3482
#21 0x0000562d88068b9b in btr_cur_search_to_nth_level_func (index=0x617000002e88, level=0, tuple=0x61900165e008, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x7f07fc2303e0, ahi_latch=0x0, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x7f07fc230590, autoinc=0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:1614
#22 0x0000562d87c31daa in btr_pcur_open_low (index=0x617000002e88, level=0, tuple=0x61900165e008, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x7f07fc2303e0, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, autoinc=0, mtr=0x7f07fc230590) at /data/Server/10.5/storage/innobase/include/btr0pcur.ic:441
#23 0x0000562d87c405e1 in ibuf_insert_low (mode=36, op=IBUF_OP_DELETE_MARK, no_counter=0, entry=0x61900165f908, entry_size=43, index=0x617000035388, page_id=..., zip_size=0, thr=0x6210002eee48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3276
#24 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_DELETE_MARK, entry=0x61900165f908, index=0x617000035388, page_id=..., zip_size=0, thr=0x6210002eee48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
That thread would complete the change buffer merge. Our ‘victim’ transaction is still suspended, waiting to open the change buffer leaf page. Meanwhile, the page 5:0x601 is evicted again on write completion:
#1 0x0000562d88127104 in buf_page_t::free_file_page (this=0x7f0812116460) at /data/Server/10.5/storage/innobase/include/buf0buf.h:956
#2 0x0000562d8811be7d in buf_LRU_block_free_hashed_page (block=0x7f0812116460) at /data/Server/10.5/storage/innobase/buf/buf0lru.cc:130
#3 0x0000562d88122bf4 in buf_LRU_free_page (bpage=0x7f0812116460, zip=true) at /data/Server/10.5/storage/innobase/buf/buf0lru.cc:962
#4 0x0000562d881095eb in buf_page_write_complete (request=...) at /data/Server/10.5/storage/innobase/buf/buf0flu.cc:417
#5 0x0000562d881f4f84 in fil_aio_callback (request=...) at /data/Server/10.5/storage/innobase/fil/fil0fil.cc:3372
#6 0x0000562d87d123cb in io_callback (cb=0x7f081268b140) at /data/Server/10.5/storage/innobase/os/os0file.cc:3604
#7 0x0000562d883218c8 in tpool::simulated_aio::simulated_aio_callback (param=0x7f081268b140) at /data/Server/10.5/tpool/aio_simulated.cc:162
…
(rr) when
Current event: 1151722
Finally, soon thereafter, our interesting transaction will resume execution:
at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3282
3282 buffered = ibuf_get_volume_buffered(&pcur,
3: /x index.id = 0x1b
(rr) when
Current event: 1151763
We will insert the record in change buffer page 0x356, right after the page infimum, before the first record that is buffered for the immediately following page 5:0x602. So far, so good. But, what is not so good is the following:
Thread 17 hit Breakpoint 19, mtr_t::commit (this=0x3f) at /data/Server/10.5/storage/innobase/mtr/mtr0mtr.cc:393
393 {
(rr) disa 19
(rr) c
Continuing.
[Switching to Thread 1599113.1608599]
Thread 30 hit Hardware access (read/write) watchpoint 21: $r[-2]@2
Value = "\001\262"
mach_read_from_2 (b=0x7f0812556149 "\001\262") at /data/Server/10.5/storage/innobase/include/mach0data.ic:84
84 return(uint16_t(uint16_t(b[0]) << 8 | b[1]));
(rr) bt
#0 mach_read_from_2 (b=0x7f0812556149 "\001\262") at /data/Server/10.5/storage/innobase/include/mach0data.ic:84
#1 0x0000562d87c58cbb in rec_get_next_offs (rec=0x7f081255614b "", comp=0) at /data/Server/10.5/storage/innobase/include/rem0rec.ic:291
#2 0x0000562d87c6d421 in lock_update_delete (block=0x7f0812151240, rec=0x7f081255614b "") at /data/Server/10.5/storage/innobase/lock/lock0lock.cc:3260
#3 0x0000562d88084cf8 in btr_cur_optimistic_delete_func (cursor=0x7f07f82b9b70, flags=0, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:5595
#4 0x0000562d87c45be3 in ibuf_delete_rec (page_id=..., pcur=0x7f07f82b9b70, search_tuple=0x6160018ff708, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:4078
#5 0x0000562d87c3bcd3 in ibuf_read_merge_pages (space_ids=0x7f07f82ba590, page_nos=0x7f07f82ba5d0, n_stored=3) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:2361
at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3480
#7 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x619000a88508, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
Thread 30 that was in the process of merging the change buffer to that page is deleting our precious record immediately afterwards, even though the page no longer exists in the buffer pool and we cannot know if all those buffered records were actually applied. At the time we inserted the record in Thread 17, that culprit Thread 30 was waiting for a latch on that same change buffer leaf page 0:0x356:
#12 0x0000562d880600eb in btr_cur_latch_leaves (block=0x7f0812151240, latch_mode=2, cursor=0x7f07f82b9b70, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:249
#13 0x0000562d8806a570 in btr_cur_search_to_nth_level_func (index=0x617000002e88, level=0, tuple=0x6160018ff708, mode=PAGE_CUR_GE, latch_mode=2, cursor=0x7f07f82b9b70, ahi_latch=0x0, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2342, mtr=0x7f07f82b9ce0, autoinc=0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:1848
#14 0x0000562d87c31daa in btr_pcur_open_low (index=0x617000002e88, level=0, tuple=0x6160018ff708, mode=PAGE_CUR_GE, latch_mode=2, cursor=0x7f07f82b9b70, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2342, autoinc=0, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/include/btr0pcur.ic:441
#15 0x0000562d87c3b9ae in ibuf_read_merge_pages (space_ids=0x7f07f82ba590, page_nos=0x7f07f82ba5d0, n_stored=3) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:2342
#16 0x0000562d87c412b2 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x619000a88508, entry_size=55, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3480
#17 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x619000a88508, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
The wrongful deletion was added by me in order to fix a hang in change buffer merge (MDEV-20934), whose root cause should have been the recovery bug that we fixed in MDEV-24449.
I think that we must revert the work-around of MDEV-20394.
Marko Mäkelä
added a comment - I checked the contents of the secondary index page each time it was read into the buffer pool. The record never changed.
Thread 4 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f0812127e40,
node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
(rr) when
Current event: 228935
(rr) display/x ((buf_block_t*)bpage)->frame[0x457]@36
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
(rr) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>when
>c
>end
(rr) c
Continuing.
[Switching to Thread 1599113.1605076]
Thread 15 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f08121488c0, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 616159
[Switching to Thread 1599113.1608534]
Thread 28 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f08121198a0, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 818385
[Switching to Thread 1599113.1608599]
Thread 46 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f0812116460, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 1149964
[Switching to Thread 1599113.1608512]
Thread 21 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f08120fa6e0, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 1228265
Thread 47 received signal SIGKILL, Killed.
Here is the start of the transaction with ID 0xfb6 (which last modified the clustered index record):
Thread 17 hit Breakpoint 12, trx_set_rw_mode (trx=0x7f08126d29a8) at /data/Server/10.5/storage/innobase/trx/trx0trx.cc:2320
2320 if (trx->read_view.is_open()) {
4: /x trx->id = 0xfb6
(rr) bt
#0 trx_set_rw_mode (trx=0x7f08126d29a8) at /data/Server/10.5/storage/innobase/trx/trx0trx.cc:2320
#1 0x0000562d87c70ca0 in lock_table (flags=0, table=0x618000063508, mode=LOCK_IX, thr=0x6210002ee988) at /data/Server/10.5/storage/innobase/lock/lock0lock.cc:3766
#2 0x0000562d87ee479e in row_search_mvcc (buf=0x61a0001b12b8 "\377", mode=PAGE_CUR_GE, prebuilt=0x6210002ee188, match_mode=1, direction=0) at /data/Server/10.5/storage/innobase/row/row0sel.cc:4608
#3 0x0000562d87aec741 in ha_innobase::index_read (this=0x61d0007c42b8, buf=0x61a0001b12b8 "\377", key_ptr=0x6210032f0200 "#\025", key_len=4, find_flag=HA_READ_KEY_EXACT)
at /data/Server/10.5/storage/innobase/handler/ha_innodb.cc:8866
#4 0x0000562d871f0c6b in handler::index_read_map (this=0x61d0007c42b8, buf=0x61a0001b12b8 "\377", key=0x6210032f0200 "#\025", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /data/Server/10.5/sql/handler.h:3818
#5 0x0000562d871cad30 in handler::ha_index_read_map (this=0x61d0007c42b8, buf=0x61a0001b12b8 "\377", key=0x6210032f0200 "#\025", keypart_map=1, find_flag=HA_READ_KEY_EXACT)
at /data/Server/10.5/sql/handler.cc:3123
#6 0x0000562d871df416 in handler::read_range_first (this=0x61d0007c42b8, start_key=0x61d0007c43b8, end_key=0x61d0007c43d8, eq_range_arg=true, sorted=true) at /data/Server/10.5/sql/handler.cc:6200
#7 0x0000562d86ee5ef1 in handler::multi_range_read_next (this=0x61d0007c42b8, range_info=0x7f07fc233790) at /data/Server/10.5/sql/multi_range_read.cc:516
#8 0x0000562d86ee623e in Mrr_simple_index_reader::get_next (this=0x61d0007c48e0, range_info=0x7f07fc233790) at /data/Server/10.5/sql/multi_range_read.cc:553
#9 0x0000562d86eef20d in DsMrr_impl::dsmrr_next (this=0x61d0007c4790, range_info=0x7f07fc233790) at /data/Server/10.5/sql/multi_range_read.cc:1653
#10 0x0000562d87b1bc3e in ha_innobase::multi_range_read_next (this=0x61d0007c42b8, range_info=0x7f07fc233790) at /data/Server/10.5/storage/innobase/handler/ha_innodb.cc:20401
#11 0x0000562d875a8e84 in QUICK_RANGE_SELECT::get_next (this=0x6130000e50c0) at /data/Server/10.5/sql/opt_range.cc:12647
#12 0x0000562d875da15c in rr_quick (info=0x7f07fc233ba0) at /data/Server/10.5/sql/records.cc:403
#13 0x0000562d868c27c2 in READ_RECORD::read_record (this=0x7f07fc233ba0) at /data/Server/10.5/sql/records.h:80
#14 0x0000562d86d16f11 in mysql_update (thd=0x62b0001c0218, table_list=0x62b0001c73d0, fields=..., values=..., conds=0x62b0001c7f40, order_num=0, order=0x0, limit=18446744073709551615, ignore=false,
found_return=0x7f07fc2343c0, updated_return=0x7f07fc2343e0) at /data/Server/10.5/sql/sql_update.cc:996
#15 0x0000562d86a534ff in mysql_execute_command (thd=0x62b0001c0218) at /data/Server/10.5/sql/sql_parse.cc:4468
#16 0x0000562d86a6b5b6 in mysql_parse (thd=0x62b0001c0218, rawbuf=0x62b0001c7238 "UPDATE `tab1` SET e = 'sentence' WHERE pk = 5411 /* E_R Thread17 QNO 53 CON_ID 30 */", length=84, parser_state=0x7f07fc234cb0,
is_com_multi=false, is_next_command=false) at /data/Server/10.5/sql/sql_parse.cc:8096
…
(rr) break ibuf0ibuf.cc:3179 thread 12
Breakpoint 13 at 0x562d87c3feb8: file /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc, line 3179.
(rr) cond 13 index->id==27
(rr) c
Continuing.
Thread 17 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f081214f980, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000315
Current event: 1148257
First, our transaction performed a synchronous read of the clustered index page (5:0x315). Soon thereafter, another thread completed a synchronous read of the secondary index page, because (just like in the previous trace that I analyzed) it decided that some room needs to be allocated in the change buffer, while attempting to buffer another insert (coincidentally, to the same index, and same index page 0x602):
Thread 30 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f0812116460, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 1149964
(rr) bt
#0 buf_page_read_complete (bpage=0x7f0812116460, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
#1 0x0000562d8812ad23 in buf_read_page_low (err=0x7f07f82b9410, space=0x615000562f00, sync=true, mode=132, page_id=..., zip_size=0, unzip=false) at /data/Server/10.5/storage/innobase/buf/buf0rea.cc:360
#2 0x0000562d8812bb4f in buf_read_page (page_id=..., zip_size=0) at /data/Server/10.5/storage/innobase/buf/buf0rea.cc:489
#3 0x0000562d880d55e6 in buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=16, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2328, mtr=0x7f07f82b9ce0,
err=0x7f07f82b9b20, allow_ibuf_merge=true) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:3024
#4 0x0000562d880d7392 in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=16, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2328, mtr=0x7f07f82b9ce0,
err=0x7f07f82b9b20, allow_ibuf_merge=true) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:3482
#5 0x0000562d87c3b7a4 in ibuf_read_merge_pages (space_ids=0x7f07f82ba590, page_nos=0x7f07f82ba5d0, n_stored=3) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:2325
#6 0x0000562d87c412b2 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x619000a88508, entry_size=55, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48)
at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3480
#7 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x619000a88508, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
#8 0x0000562d880690be in btr_cur_search_to_nth_level_func (index=0x617000035388, level=0, tuple=0x619000a88508, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x7f07f82bc470, ahi_latch=0x0,
file=0x562d8925ae40 "/data/Server/10.5/storage/innobase/row/row0ins.cc", line=2974, mtr=0x7f07f82bc830, autoinc=0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:1650
#9 0x0000562d87e13938 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x617000035388, offsets_heap=0x619000a87a80, heap=0x619000f39880, entry=0x619000a88508, trx_id=0, thr=0x621000801a48)
at /data/Server/10.5/storage/innobase/row/row0ins.cc:2971
#10 0x0000562d87e152da in row_ins_sec_index_entry (index=0x617000035388, entry=0x619000a88508, thr=0x621000801a48, check_foreign=true) at /data/Server/10.5/storage/innobase/row/row0ins.cc:3294
#11 0x0000562d87f0b593 in row_upd_sec_index_entry (node=0x621000801638, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:2236
#12 0x0000562d87f0bbbe in row_upd_sec_step (node=0x621000801638, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:2263
#13 0x0000562d87f112d2 in row_upd (node=0x621000801638, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:3024
#14 0x0000562d87f11a9c in row_upd_step (thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:3139
#15 0x0000562d87e5dedb in row_update_for_mysql (prebuilt=0x621000800d88) at /data/Server/10.5/storage/innobase/row/row0mysql.cc:1844
#16 0x0000562d87aea754 in ha_innobase::update_row (this=0x61d0008246b8, old_row=0x61a000211b20 "\300\trecipient", new_row=0x61a0002118b8 "\300\vcelebration")
at /data/Server/10.5/storage/innobase/handler/ha_innodb.cc:8445
#17 0x0000562d871e7213 in handler::ha_update_row (this=0x61d0008246b8, old_data=0x61a000211b20 "\300\trecipient", new_data=0x61a0002118b8 "\300\vcelebration") at /data/Server/10.5/sql/handler.cc:7207
#18 0x0000562d86d1753d in mysql_update (thd=0x62b000299218, table_list=0x62b0002683d0, fields=..., values=..., conds=0x62b000268f48, order_num=0, order=0x0, limit=18446744073709551615, ignore=false,
found_return=0x7f07f82bf3c0, updated_return=0x7f07f82bf3e0) at /data/Server/10.5/sql/sql_update.cc:1077
#19 0x0000562d86a534ff in mysql_execute_command (thd=0x62b000299218) at /data/Server/10.5/sql/sql_parse.cc:4468
#20 0x0000562d86a6b5b6 in mysql_parse (thd=0x62b000299218, rawbuf=0x62b000268238 "UPDATE `tab1` SET b = 'celebration' WHERE pk = 6083 /* E_R Thread33 QNO 59 CON_ID 45 */", length=87,
parser_state=0x7f07f82bfcb0, is_com_multi=false, is_next_command=false) at /data/Server/10.5/sql/sql_parse.cc:8096
At this point, our ‘victim’ transaction is waiting for a latch on the change buffer root page (page 4). It has already decided to write to the change buffer, even though the page actually exists in the buffer pool:
#20 0x0000562d880d7392 in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=1, guess=0x7f0812107aa0, mode=10, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x7f07fc230590, err=0x7f07fc22efd0, allow_ibuf_merge=false) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:3482
#21 0x0000562d88068b9b in btr_cur_search_to_nth_level_func (index=0x617000002e88, level=0, tuple=0x61900165e008, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x7f07fc2303e0, ahi_latch=0x0, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x7f07fc230590, autoinc=0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:1614
#22 0x0000562d87c31daa in btr_pcur_open_low (index=0x617000002e88, level=0, tuple=0x61900165e008, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x7f07fc2303e0, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, autoinc=0, mtr=0x7f07fc230590) at /data/Server/10.5/storage/innobase/include/btr0pcur.ic:441
#23 0x0000562d87c405e1 in ibuf_insert_low (mode=36, op=IBUF_OP_DELETE_MARK, no_counter=0, entry=0x61900165f908, entry_size=43, index=0x617000035388, page_id=..., zip_size=0, thr=0x6210002eee48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3276
#24 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_DELETE_MARK, entry=0x61900165f908, index=0x617000035388, page_id=..., zip_size=0, thr=0x6210002eee48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
That thread would complete the change buffer merge. Our ‘victim’ transaction is still suspended, waiting to open the change buffer leaf page. Meanwhile, the page 5:0x601 is evicted again on write completion:
#1 0x0000562d88127104 in buf_page_t::free_file_page (this=0x7f0812116460) at /data/Server/10.5/storage/innobase/include/buf0buf.h:956
#2 0x0000562d8811be7d in buf_LRU_block_free_hashed_page (block=0x7f0812116460) at /data/Server/10.5/storage/innobase/buf/buf0lru.cc:130
#3 0x0000562d88122bf4 in buf_LRU_free_page (bpage=0x7f0812116460, zip=true) at /data/Server/10.5/storage/innobase/buf/buf0lru.cc:962
#4 0x0000562d881095eb in buf_page_write_complete (request=...) at /data/Server/10.5/storage/innobase/buf/buf0flu.cc:417
#5 0x0000562d881f4f84 in fil_aio_callback (request=...) at /data/Server/10.5/storage/innobase/fil/fil0fil.cc:3372
#6 0x0000562d87d123cb in io_callback (cb=0x7f081268b140) at /data/Server/10.5/storage/innobase/os/os0file.cc:3604
#7 0x0000562d883218c8 in tpool::simulated_aio::simulated_aio_callback (param=0x7f081268b140) at /data/Server/10.5/tpool/aio_simulated.cc:162
…
(rr) when
Current event: 1151722
Finally, soon thereafter, our interesting transaction will resume execution:
(rr) c
Continuing.
[Switching to Thread 1599113.1608542]
Thread 17 hit Breakpoint 16, ibuf_insert_low (mode=36, op=IBUF_OP_DELETE_MARK, no_counter=0, entry=0x61900165f908, entry_size=43, index=0x617000035388, page_id=..., zip_size=0, thr=0x6210002eee48)
at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3282
3282 buffered = ibuf_get_volume_buffered(&pcur,
3: /x index.id = 0x1b
(rr) when
Current event: 1151763
We will insert the record in change buffer page 0x356, right after the page infimum, before the first record that is buffered for the immediately following page 5:0x602. So far, so good. But, what is not so good is the following:
(rr) p page_cursor.rec
$35 = (rec_t *) 0x7f0812556065 "infimum"
(rr) p/x page_cursor.block.page.id_
$36 = {m_id = 0x356}
(rr) p *rec
$37 = (rec_t *) 0x7f081255614b ""
(rr) dump binary memory /dev/shm/ibuf356.bin page_cursor.block.frame page_cursor.block.frame+srv_page_size
(rr) p/x **rec@90
$38 = {0x0, 0x0, 0x0, 0x5, 0x0, 0x0, 0x0, 0x6, 0x1, 0x0, 0x0, 0x1, 0x0, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x1, 0xf,
0x0, 0x64, 0x0, 0x8, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x86, 0x3, 0x0, 0x4, 0x80, 0x0, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66,
0x72, 0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23, 0x0, 0x61, 0x5d, 0x56, 0x50}
(rr) set $r=*rec
(rr) awatch $r[-2]@2
Hardware access (read/write) watchpoint 21: $r[-2]@2
(rr) when
Current event: 1152412
(rr) c
Continuing.
Thread 17 hit Breakpoint 19, mtr_t::commit (this=0x3f) at /data/Server/10.5/storage/innobase/mtr/mtr0mtr.cc:393
393 {
(rr) disa 19
(rr) c
Continuing.
[Switching to Thread 1599113.1608599]
Thread 30 hit Hardware access (read/write) watchpoint 21: $r[-2]@2
Value = "\001\262"
mach_read_from_2 (b=0x7f0812556149 "\001\262") at /data/Server/10.5/storage/innobase/include/mach0data.ic:84
84 return(uint16_t(uint16_t(b[0]) << 8 | b[1]));
(rr) bt
#0 mach_read_from_2 (b=0x7f0812556149 "\001\262") at /data/Server/10.5/storage/innobase/include/mach0data.ic:84
#1 0x0000562d87c58cbb in rec_get_next_offs (rec=0x7f081255614b "", comp=0) at /data/Server/10.5/storage/innobase/include/rem0rec.ic:291
#2 0x0000562d87c6d421 in lock_update_delete (block=0x7f0812151240, rec=0x7f081255614b "") at /data/Server/10.5/storage/innobase/lock/lock0lock.cc:3260
#3 0x0000562d88084cf8 in btr_cur_optimistic_delete_func (cursor=0x7f07f82b9b70, flags=0, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:5595
#4 0x0000562d87c45be3 in ibuf_delete_rec (page_id=..., pcur=0x7f07f82b9b70, search_tuple=0x6160018ff708, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:4078
#5 0x0000562d87c3bcd3 in ibuf_read_merge_pages (space_ids=0x7f07f82ba590, page_nos=0x7f07f82ba5d0, n_stored=3) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:2361
#6 0x0000562d87c412b2 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x619000a88508, entry_size=55, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48)
at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3480
#7 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x619000a88508, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
Thread 30 that was in the process of merging the change buffer to that page is deleting our precious record immediately afterwards, even though the page no longer exists in the buffer pool and we cannot know if all those buffered records were actually applied. At the time we inserted the record in Thread 17, that culprit Thread 30 was waiting for a latch on that same change buffer leaf page 0:0x356:
#12 0x0000562d880600eb in btr_cur_latch_leaves (block=0x7f0812151240, latch_mode=2, cursor=0x7f07f82b9b70, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:249
#13 0x0000562d8806a570 in btr_cur_search_to_nth_level_func (index=0x617000002e88, level=0, tuple=0x6160018ff708, mode=PAGE_CUR_GE, latch_mode=2, cursor=0x7f07f82b9b70, ahi_latch=0x0, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2342, mtr=0x7f07f82b9ce0, autoinc=0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:1848
#14 0x0000562d87c31daa in btr_pcur_open_low (index=0x617000002e88, level=0, tuple=0x6160018ff708, mode=PAGE_CUR_GE, latch_mode=2, cursor=0x7f07f82b9b70, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2342, autoinc=0, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/include/btr0pcur.ic:441
#15 0x0000562d87c3b9ae in ibuf_read_merge_pages (space_ids=0x7f07f82ba590, page_nos=0x7f07f82ba5d0, n_stored=3) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:2342
#16 0x0000562d87c412b2 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x619000a88508, entry_size=55, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3480
#17 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x619000a88508, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
The wrongful deletion was added by me in order to fix a hang in change buffer merge ( MDEV-20934 ), whose root cause should have been the recovery bug that we fixed in MDEV-24449 .
I think that we must revert the work-around of MDEV-20394 .
Due to MDEV-19514 (which entered 10.5.0), the MDEV-20934 logic would kick in also during normal server operation and not only during a slow shutdown (innodb_fast_shutdown=0).
innodb_fast_shutdown=0 never produces a post-restart count mismatch corruption, while innodb_fast_shutdown=1 very reliably does
Thus, that corruption cannot be caused by MDEV-20934, because before 10.5 the work-around would only be enabled during a shutdown with innodb_fast_shutdown=0.
Marko Mäkelä
added a comment - Due to MDEV-19514 (which entered 10.5.0), the MDEV-20934 logic would kick in also during normal server operation and not only during a slow shutdown ( innodb_fast_shutdown=0 ).
In MDEV-22373 , spachev claimed the following about 10.4:
innodb_fast_shutdown=0 never produces a post-restart count mismatch corruption, while innodb_fast_shutdown=1 very reliably does
Thus, that corruption cannot be caused by MDEV-20934 , because before 10.5 the work-around would only be enabled during a shutdown with innodb_fast_shutdown=0 .
I am a little undecided regarding whether to fix this in older versions than MariaDB 10.5. Normally, nobody should be using innodb_fast_shutdown=0. It is not really required even before upgrades (see MDEV-23755).
If we removed the corruption work-around that was added in MDEV-20934 from release builds of 10.2, 10.3, 10.4, then users who suffer from corrupted change buffer data structures (presumably due to having used old versions where MDEV-24449 was not yet fixed) would be out of luck. Using a debug build and a special debug option ought to be impractically slow when there are terabytes of data. For the record, I was considering to push the following:
On 10.5, I would carry on and disable it, because 10.5 never was affected by MDEV-24449, and a potential other cause of change buffer corruption (MDEV-24709) was fixed already in 10.5.9.
Maybe in 10.5 we should also remove the do_merge logic of ibuf_insert_low() and simply refuse to buffer more operations if the change buffer is full. In that way, in 10.5 outside a slow shutdown, change buffer merges should only occur when the secondary index leaf page is actively being read.
Marko Mäkelä
added a comment - I am a little undecided regarding whether to fix this in older versions than MariaDB 10.5. Normally, nobody should be using innodb_fast_shutdown=0 . It is not really required even before upgrades (see MDEV-23755 ).
If we removed the corruption work-around that was added in MDEV-20934 from release builds of 10.2, 10.3, 10.4, then users who suffer from corrupted change buffer data structures (presumably due to having used old versions where MDEV-24449 was not yet fixed) would be out of luck. Using a debug build and a special debug option ought to be impractically slow when there are terabytes of data. For the record, I was considering to push the following:
diff --git a/storage/innobase/ibuf/ibuf0ibuf.cc b/storage/innobase/ibuf/ibuf0ibuf.cc
index 46649187ca0..b32b52c2a46 100644
--- a/storage/innobase/ibuf/ibuf0ibuf.cc
+++ b/storage/innobase/ibuf/ibuf0ibuf.cc
@@ -4444,6 +4444,23 @@ ibuf_merge_or_delete_for_page(
/* No changes are buffered for this page. */
fil_space_release(space);
+#ifndef DBUG_OFF
+ /* The following code works around a hang when
+ the change buffer is corrupted, likely due to
+ the race condition in crash recovery that was
+ fixed in MDEV-24449. But, it also introduces
+ corruption by itself in the following
+ scenario:
+
+ (1) We released the bitmap page latch.
+ (2) Another thread buffers changes to the page.
+ (3) We will wrongly discard those newly
+ buffered changes below.
+
+ This code will be available in debug builds, so that
+ users may try to fix a shutdown hang that occurs due
+ to a corrupted change buffer. */
+
if (UNIV_UNLIKELY(srv_shutdown_state)
&& !srv_fast_shutdown
&& (!block
@@ -4453,8 +4470,10 @@ ibuf_merge_or_delete_for_page(
shutdown, in case the bitmap bits are
wrongly clear even though buffered
changes exist. */
- ibuf_delete_recs(page_id);
+ DBUG_EXECUTE_IF("ibuf_merge_corruption",
+ ibuf_delete_recs(page_id););
}
+#endif
return;
}
}
On 10.5, I would carry on and disable it, because 10.5 never was affected by MDEV-24449 , and a potential other cause of change buffer corruption ( MDEV-24709 ) was fixed already in 10.5.9.
Maybe in 10.5 we should also remove the do_merge logic of ibuf_insert_low() and simply refuse to buffer more operations if the change buffer is full. In that way, in 10.5 outside a slow shutdown, change buffer merges should only occur when the secondary index leaf page is actively being read.
This looks like MDEV-25004 but versioned tables are not involved in the current case.
without crash injection and recovery.
Given the results above I am not able to tell if the current bug is a plain crash recovery problem or caused by "misbehavior" before SIGKILL
followed by restart invoking crash recovery.
bb-10.6-MDEV-25506 GIT_SHOW: HEAD 8850f7d854465bee24639ceb2148baf7b0ec1b39 2021-06-04T19:54:45+03:00
which contains a fix for the current bug but also other fixes.
~ 600 RQG tests
One time
[ 'TBR-977', 'mysqld: .{1,200}buf0buf.h:.{1,20}: buf_page_t\* buf_pool_t::page_hash_get_low\(page_id_t, ulint\): Assertion \`bpage->in_page_hash\' failed.+The RQG run ended with status STATUS_SERVER_CRASHED' ],
without crash injection followed by recovery.
This failure seems to be a sibling of MDEV-25776 (here it happens after crash recovery).
Matthias Leich
added a comment -
Attempts to replay the problem with RQG
==================================
GIT_SHOW: HEAD, origin/bb-10.6-wlad, origin/HEAD, origin/10.6, 10.6 3d6eb7afcfb47cb432f790d0b25c8c1f4ec5bcf3 2021-06-06T13:21:03+02:00
~ 200 RQG tests
Two replays of the effect described above (Corruption detected after SIGKILL + restart with success).
12 runs show
[ 'TBR-857', 'mysqld: .{1,150}ibuf0ibuf.cc:.{1,30} void ibuf_set_del_mark.{1,200}: Assertion .0. failed.' ],
without crash injection and recovery.
8 runs show
[ 'TBR-835', '\[ERROR\] InnoDB: Record in index `(?!FTS_DOC_ID_INDEX).{0,200}` of table .{1,100} was not found on update: ' ],
Tests using FULLTEXT indexes might miss records in FTS_DOC_ID_INDEX (-> whatever other bug).
But in the current case FULLTEXT indexes are not used at all. And a record within a index created by the user is missing.
[ 'TBR-915-MDEV-25004', 'mysqld: .{1,200}row0upd.cc:.{1,20} dberr_t row_upd_sec_index_entry\(upd_node_t\*, que_thr_t\*\): Assertion .0. failed' ],
This looks like MDEV-25004 but versioned tables are not involved in the current case.
without crash injection and recovery.
Given the results above I am not able to tell if the current bug is a plain crash recovery problem or caused by "misbehavior" before SIGKILL
followed by restart invoking crash recovery.
bb-10.6-MDEV-25506 GIT_SHOW: HEAD 8850f7d854465bee24639ceb2148baf7b0ec1b39 2021-06-04T19:54:45+03:00
which contains a fix for the current bug but also other fixes.
~ 600 RQG tests
One time
[ 'TBR-977', 'mysqld: .{1,200}buf0buf.h:.{1,20}: buf_page_t\* buf_pool_t::page_hash_get_low\(page_id_t, ulint\): Assertion \`bpage->in_page_hash\' failed.+The RQG run ended with status STATUS_SERVER_CRASHED' ],
without crash injection followed by recovery.
This failure seems to be a sibling of MDEV-25776 (here it happens after crash recovery).
My fix did not optimize the ibuf_insert_low() logic, because we do not have time for the extensive testing right now.
The problematic code will be omitted from 10.5.11 onwards. It can be enabled in debug builds by SET GLOBAL debug_dbug='+d,ibuf_merge_corruption';. In 10.5, this bug has a much larger impact than in older major versions, because in 10.5 the incorrect code would kick in during normal operation, without involving any startup or shutdown. For older releases, the MDEV-20934 logic is only enabled during shutdown with innodb_fast_shutdown=0. Normally, there should be no reason to use innodb_fast_shutdown=0, and besides, one could SET GLOBAL innodb_change_buffering=none; before the slow shutdown to work around this problem.
Marko Mäkelä
added a comment - My fix did not optimize the ibuf_insert_low() logic, because we do not have time for the extensive testing right now.
The problematic code will be omitted from 10.5.11 onwards. It can be enabled in debug builds by SET GLOBAL debug_dbug='+d,ibuf_merge_corruption'; . In 10.5, this bug has a much larger impact than in older major versions, because in 10.5 the incorrect code would kick in during normal operation, without involving any startup or shutdown. For older releases, the MDEV-20934 logic is only enabled during shutdown with innodb_fast_shutdown=0 . Normally, there should be no reason to use innodb_fast_shutdown=0 , and besides, one could SET GLOBAL innodb_change_buffering=none; before the slow shutdown to work around this problem.
People
Marko Mäkelä
Matthias Leich
Votes:
1Vote for this issue
Watchers:
8Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
I checked the contents of the secondary index page each time it was read into the buffer pool. The record never changed.
Thread 4 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f0812127e40,
node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
(rr) when
Current event: 228935
(rr) display/x ((buf_block_t*)bpage)->frame[0x457]@36
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
(rr) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>when
>c
>end
(rr) c
Continuing.
[Switching to Thread 1599113.1605076]
Thread 15 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f08121488c0, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 616159
[Switching to Thread 1599113.1608534]
Thread 28 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f08121198a0, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 818385
[Switching to Thread 1599113.1608599]
Thread 46 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f0812116460, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 1149964
[Switching to Thread 1599113.1608512]
Thread 21 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f08120fa6e0, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 1228265
Thread 47 received signal SIGKILL, Killed.
Here is the start of the transaction with ID 0xfb6 (which last modified the clustered index record):
Thread 17 hit Breakpoint 12, trx_set_rw_mode (trx=0x7f08126d29a8) at /data/Server/10.5/storage/innobase/trx/trx0trx.cc:2320
2320 if (trx->read_view.is_open()) {
4: /x trx->id = 0xfb6
(rr) bt
#0 trx_set_rw_mode (trx=0x7f08126d29a8) at /data/Server/10.5/storage/innobase/trx/trx0trx.cc:2320
#1 0x0000562d87c70ca0 in lock_table (flags=0, table=0x618000063508, mode=LOCK_IX, thr=0x6210002ee988) at /data/Server/10.5/storage/innobase/lock/lock0lock.cc:3766
#2 0x0000562d87ee479e in row_search_mvcc (buf=0x61a0001b12b8 "\377", mode=PAGE_CUR_GE, prebuilt=0x6210002ee188, match_mode=1, direction=0) at /data/Server/10.5/storage/innobase/row/row0sel.cc:4608
#3 0x0000562d87aec741 in ha_innobase::index_read (this=0x61d0007c42b8, buf=0x61a0001b12b8 "\377", key_ptr=0x6210032f0200 "#\025", key_len=4, find_flag=HA_READ_KEY_EXACT)
at /data/Server/10.5/storage/innobase/handler/ha_innodb.cc:8866
#4 0x0000562d871f0c6b in handler::index_read_map (this=0x61d0007c42b8, buf=0x61a0001b12b8 "\377", key=0x6210032f0200 "#\025", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /data/Server/10.5/sql/handler.h:3818
#5 0x0000562d871cad30 in handler::ha_index_read_map (this=0x61d0007c42b8, buf=0x61a0001b12b8 "\377", key=0x6210032f0200 "#\025", keypart_map=1, find_flag=HA_READ_KEY_EXACT)
at /data/Server/10.5/sql/handler.cc:3123
#6 0x0000562d871df416 in handler::read_range_first (this=0x61d0007c42b8, start_key=0x61d0007c43b8, end_key=0x61d0007c43d8, eq_range_arg=true, sorted=true) at /data/Server/10.5/sql/handler.cc:6200
#7 0x0000562d86ee5ef1 in handler::multi_range_read_next (this=0x61d0007c42b8, range_info=0x7f07fc233790) at /data/Server/10.5/sql/multi_range_read.cc:516
#8 0x0000562d86ee623e in Mrr_simple_index_reader::get_next (this=0x61d0007c48e0, range_info=0x7f07fc233790) at /data/Server/10.5/sql/multi_range_read.cc:553
#9 0x0000562d86eef20d in DsMrr_impl::dsmrr_next (this=0x61d0007c4790, range_info=0x7f07fc233790) at /data/Server/10.5/sql/multi_range_read.cc:1653
#10 0x0000562d87b1bc3e in ha_innobase::multi_range_read_next (this=0x61d0007c42b8, range_info=0x7f07fc233790) at /data/Server/10.5/storage/innobase/handler/ha_innodb.cc:20401
#11 0x0000562d875a8e84 in QUICK_RANGE_SELECT::get_next (this=0x6130000e50c0) at /data/Server/10.5/sql/opt_range.cc:12647
#12 0x0000562d875da15c in rr_quick (info=0x7f07fc233ba0) at /data/Server/10.5/sql/records.cc:403
#13 0x0000562d868c27c2 in READ_RECORD::read_record (this=0x7f07fc233ba0) at /data/Server/10.5/sql/records.h:80
#14 0x0000562d86d16f11 in mysql_update (thd=0x62b0001c0218, table_list=0x62b0001c73d0, fields=..., values=..., conds=0x62b0001c7f40, order_num=0, order=0x0, limit=18446744073709551615, ignore=false,
found_return=0x7f07fc2343c0, updated_return=0x7f07fc2343e0) at /data/Server/10.5/sql/sql_update.cc:996
#15 0x0000562d86a534ff in mysql_execute_command (thd=0x62b0001c0218) at /data/Server/10.5/sql/sql_parse.cc:4468
#16 0x0000562d86a6b5b6 in mysql_parse (thd=0x62b0001c0218, rawbuf=0x62b0001c7238 "UPDATE `tab1` SET e = 'sentence' WHERE pk = 5411 /* E_R Thread17 QNO 53 CON_ID 30 */", length=84, parser_state=0x7f07fc234cb0,
is_com_multi=false, is_next_command=false) at /data/Server/10.5/sql/sql_parse.cc:8096
…
(rr) break ibuf0ibuf.cc:3179 thread 12
Breakpoint 13 at 0x562d87c3feb8: file /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc, line 3179.
(rr) cond 13 index->id==27
(rr) c
Continuing.
Thread 17 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f081214f980, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000315
Current event: 1148257
First, our transaction performed a synchronous read of the clustered index page (5:0x315). Soon thereafter, another thread completed a synchronous read of the secondary index page, because (just like in the previous trace that I analyzed) it decided that some room needs to be allocated in the change buffer, while attempting to buffer another insert (coincidentally, to the same index, and same index page 0x602):
Thread 30 hit Breakpoint 1, buf_page_read_complete (bpage=0x7f0812116460, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
4101 const page_id_t id(bpage->id());
1: /x bpage.id_.m_id = 0x500000601
2: /x ((buf_block_t*)bpage)->frame[0x457]@36 = {0x0, 0x0, 0xd0, 0xf, 0x4, 0x88, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66, 0x72,
0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23}
Current event: 1149964
(rr) bt
#0 buf_page_read_complete (bpage=0x7f0812116460, node=...) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:4101
#1 0x0000562d8812ad23 in buf_read_page_low (err=0x7f07f82b9410, space=0x615000562f00, sync=true, mode=132, page_id=..., zip_size=0, unzip=false) at /data/Server/10.5/storage/innobase/buf/buf0rea.cc:360
#2 0x0000562d8812bb4f in buf_read_page (page_id=..., zip_size=0) at /data/Server/10.5/storage/innobase/buf/buf0rea.cc:489
#3 0x0000562d880d55e6 in buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=16, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2328, mtr=0x7f07f82b9ce0,
err=0x7f07f82b9b20, allow_ibuf_merge=true) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:3024
#4 0x0000562d880d7392 in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=16, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2328, mtr=0x7f07f82b9ce0,
err=0x7f07f82b9b20, allow_ibuf_merge=true) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:3482
#5 0x0000562d87c3b7a4 in ibuf_read_merge_pages (space_ids=0x7f07f82ba590, page_nos=0x7f07f82ba5d0, n_stored=3) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:2325
#6 0x0000562d87c412b2 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x619000a88508, entry_size=55, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48)
at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3480
#7 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x619000a88508, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
#8 0x0000562d880690be in btr_cur_search_to_nth_level_func (index=0x617000035388, level=0, tuple=0x619000a88508, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x7f07f82bc470, ahi_latch=0x0,
file=0x562d8925ae40 "/data/Server/10.5/storage/innobase/row/row0ins.cc", line=2974, mtr=0x7f07f82bc830, autoinc=0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:1650
#9 0x0000562d87e13938 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x617000035388, offsets_heap=0x619000a87a80, heap=0x619000f39880, entry=0x619000a88508, trx_id=0, thr=0x621000801a48)
at /data/Server/10.5/storage/innobase/row/row0ins.cc:2971
#10 0x0000562d87e152da in row_ins_sec_index_entry (index=0x617000035388, entry=0x619000a88508, thr=0x621000801a48, check_foreign=true) at /data/Server/10.5/storage/innobase/row/row0ins.cc:3294
#11 0x0000562d87f0b593 in row_upd_sec_index_entry (node=0x621000801638, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:2236
#12 0x0000562d87f0bbbe in row_upd_sec_step (node=0x621000801638, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:2263
#13 0x0000562d87f112d2 in row_upd (node=0x621000801638, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:3024
#14 0x0000562d87f11a9c in row_upd_step (thr=0x621000801a48) at /data/Server/10.5/storage/innobase/row/row0upd.cc:3139
#15 0x0000562d87e5dedb in row_update_for_mysql (prebuilt=0x621000800d88) at /data/Server/10.5/storage/innobase/row/row0mysql.cc:1844
#16 0x0000562d87aea754 in ha_innobase::update_row (this=0x61d0008246b8, old_row=0x61a000211b20 "\300\trecipient", new_row=0x61a0002118b8 "\300\vcelebration")
at /data/Server/10.5/storage/innobase/handler/ha_innodb.cc:8445
#17 0x0000562d871e7213 in handler::ha_update_row (this=0x61d0008246b8, old_data=0x61a000211b20 "\300\trecipient", new_data=0x61a0002118b8 "\300\vcelebration") at /data/Server/10.5/sql/handler.cc:7207
#18 0x0000562d86d1753d in mysql_update (thd=0x62b000299218, table_list=0x62b0002683d0, fields=..., values=..., conds=0x62b000268f48, order_num=0, order=0x0, limit=18446744073709551615, ignore=false,
found_return=0x7f07f82bf3c0, updated_return=0x7f07f82bf3e0) at /data/Server/10.5/sql/sql_update.cc:1077
#19 0x0000562d86a534ff in mysql_execute_command (thd=0x62b000299218) at /data/Server/10.5/sql/sql_parse.cc:4468
#20 0x0000562d86a6b5b6 in mysql_parse (thd=0x62b000299218, rawbuf=0x62b000268238 "UPDATE `tab1` SET b = 'celebration' WHERE pk = 6083 /* E_R Thread33 QNO 59 CON_ID 45 */", length=87,
parser_state=0x7f07f82bfcb0, is_com_multi=false, is_next_command=false) at /data/Server/10.5/sql/sql_parse.cc:8096
At this point, our ‘victim’ transaction is waiting for a latch on the change buffer root page (page 4). It has already decided to write to the change buffer, even though the page actually exists in the buffer pool:
#20 0x0000562d880d7392 in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=1, guess=0x7f0812107aa0, mode=10, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x7f07fc230590, err=0x7f07fc22efd0, allow_ibuf_merge=false) at /data/Server/10.5/storage/innobase/buf/buf0buf.cc:3482
#21 0x0000562d88068b9b in btr_cur_search_to_nth_level_func (index=0x617000002e88, level=0, tuple=0x61900165e008, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x7f07fc2303e0, ahi_latch=0x0, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x7f07fc230590, autoinc=0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:1614
#22 0x0000562d87c31daa in btr_pcur_open_low (index=0x617000002e88, level=0, tuple=0x61900165e008, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x7f07fc2303e0, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, autoinc=0, mtr=0x7f07fc230590) at /data/Server/10.5/storage/innobase/include/btr0pcur.ic:441
#23 0x0000562d87c405e1 in ibuf_insert_low (mode=36, op=IBUF_OP_DELETE_MARK, no_counter=0, entry=0x61900165f908, entry_size=43, index=0x617000035388, page_id=..., zip_size=0, thr=0x6210002eee48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3276
#24 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_DELETE_MARK, entry=0x61900165f908, index=0x617000035388, page_id=..., zip_size=0, thr=0x6210002eee48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
That thread would complete the change buffer merge. Our ‘victim’ transaction is still suspended, waiting to open the change buffer leaf page. Meanwhile, the page 5:0x601 is evicted again on write completion:
#1 0x0000562d88127104 in buf_page_t::free_file_page (this=0x7f0812116460) at /data/Server/10.5/storage/innobase/include/buf0buf.h:956
#2 0x0000562d8811be7d in buf_LRU_block_free_hashed_page (block=0x7f0812116460) at /data/Server/10.5/storage/innobase/buf/buf0lru.cc:130
#3 0x0000562d88122bf4 in buf_LRU_free_page (bpage=0x7f0812116460, zip=true) at /data/Server/10.5/storage/innobase/buf/buf0lru.cc:962
#4 0x0000562d881095eb in buf_page_write_complete (request=...) at /data/Server/10.5/storage/innobase/buf/buf0flu.cc:417
#5 0x0000562d881f4f84 in fil_aio_callback (request=...) at /data/Server/10.5/storage/innobase/fil/fil0fil.cc:3372
#6 0x0000562d87d123cb in io_callback (cb=0x7f081268b140) at /data/Server/10.5/storage/innobase/os/os0file.cc:3604
#7 0x0000562d883218c8 in tpool::simulated_aio::simulated_aio_callback (param=0x7f081268b140) at /data/Server/10.5/tpool/aio_simulated.cc:162
…
(rr) when
Current event: 1151722
Finally, soon thereafter, our interesting transaction will resume execution:
(rr) c
Continuing.
[Switching to Thread 1599113.1608542]
Thread 17 hit Breakpoint 16, ibuf_insert_low (mode=36, op=IBUF_OP_DELETE_MARK, no_counter=0, entry=0x61900165f908, entry_size=43, index=0x617000035388, page_id=..., zip_size=0, thr=0x6210002eee48)
at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3282
3282 buffered = ibuf_get_volume_buffered(&pcur,
3: /x index.id = 0x1b
(rr) when
Current event: 1151763
We will insert the record in change buffer page 0x356, right after the page infimum, before the first record that is buffered for the immediately following page 5:0x602. So far, so good. But, what is not so good is the following:
(rr) p page_cursor.rec
$35 = (rec_t *) 0x7f0812556065 "infimum"
(rr) p/x page_cursor.block.page.id_
$36 = {m_id = 0x356}
(rr) p *rec
$37 = (rec_t *) 0x7f081255614b ""
(rr) dump binary memory /dev/shm/ibuf356.bin page_cursor.block.frame page_cursor.block.frame+srv_page_size
(rr) p/x **rec@90
$38 = {0x0, 0x0, 0x0, 0x5, 0x0, 0x0, 0x0, 0x6, 0x1, 0x0, 0x0, 0x1, 0x0, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x1, 0xf,
0x0, 0x64, 0x0, 0x8, 0x1, 0xf, 0x0, 0x64, 0x0, 0x8, 0x86, 0x3, 0x0, 0x4, 0x80, 0x0, 0x65, 0x63, 0x74, 0x66, 0x72, 0x78, 0x74, 0x66, 0x72, 0x78, 0x70, 0x76, 0x70, 0x70, 0x6b, 0x6f, 0x77, 0x70, 0x61, 0x66,
0x72, 0x78, 0x70, 0x76, 0x70, 0x72, 0x0, 0x0, 0x15, 0x23, 0x0, 0x61, 0x5d, 0x56, 0x50}
(rr) set $r=*rec
(rr) awatch $r[-2]@2
Hardware access (read/write) watchpoint 21: $r[-2]@2
(rr) when
Current event: 1152412
(rr) c
Continuing.
Thread 17 hit Breakpoint 19, mtr_t::commit (this=0x3f) at /data/Server/10.5/storage/innobase/mtr/mtr0mtr.cc:393
393 {
(rr) disa 19
(rr) c
Continuing.
[Switching to Thread 1599113.1608599]
Thread 30 hit Hardware access (read/write) watchpoint 21: $r[-2]@2
Value = "\001\262"
mach_read_from_2 (b=0x7f0812556149 "\001\262") at /data/Server/10.5/storage/innobase/include/mach0data.ic:84
84 return(uint16_t(uint16_t(b[0]) << 8 | b[1]));
(rr) bt
#0 mach_read_from_2 (b=0x7f0812556149 "\001\262") at /data/Server/10.5/storage/innobase/include/mach0data.ic:84
#1 0x0000562d87c58cbb in rec_get_next_offs (rec=0x7f081255614b "", comp=0) at /data/Server/10.5/storage/innobase/include/rem0rec.ic:291
#2 0x0000562d87c6d421 in lock_update_delete (block=0x7f0812151240, rec=0x7f081255614b "") at /data/Server/10.5/storage/innobase/lock/lock0lock.cc:3260
#3 0x0000562d88084cf8 in btr_cur_optimistic_delete_func (cursor=0x7f07f82b9b70, flags=0, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:5595
#4 0x0000562d87c45be3 in ibuf_delete_rec (page_id=..., pcur=0x7f07f82b9b70, search_tuple=0x6160018ff708, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:4078
#5 0x0000562d87c3bcd3 in ibuf_read_merge_pages (space_ids=0x7f07f82ba590, page_nos=0x7f07f82ba5d0, n_stored=3) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:2361
#6 0x0000562d87c412b2 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x619000a88508, entry_size=55, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48)
at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3480
#7 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x619000a88508, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
Thread 30 that was in the process of merging the change buffer to that page is deleting our precious record immediately afterwards, even though the page no longer exists in the buffer pool and we cannot know if all those buffered records were actually applied. At the time we inserted the record in Thread 17, that culprit Thread 30 was waiting for a latch on that same change buffer leaf page 0:0x356:
#12 0x0000562d880600eb in btr_cur_latch_leaves (block=0x7f0812151240, latch_mode=2, cursor=0x7f07f82b9b70, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:249
#13 0x0000562d8806a570 in btr_cur_search_to_nth_level_func (index=0x617000002e88, level=0, tuple=0x6160018ff708, mode=PAGE_CUR_GE, latch_mode=2, cursor=0x7f07f82b9b70, ahi_latch=0x0, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2342, mtr=0x7f07f82b9ce0, autoinc=0) at /data/Server/10.5/storage/innobase/btr/btr0cur.cc:1848
#14 0x0000562d87c31daa in btr_pcur_open_low (index=0x617000002e88, level=0, tuple=0x6160018ff708, mode=PAGE_CUR_GE, latch_mode=2, cursor=0x7f07f82b9b70, file=0x562d891851a0 "/data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc", line=2342, autoinc=0, mtr=0x7f07f82b9ce0) at /data/Server/10.5/storage/innobase/include/btr0pcur.ic:441
#15 0x0000562d87c3b9ae in ibuf_read_merge_pages (space_ids=0x7f07f82ba590, page_nos=0x7f07f82ba5d0, n_stored=3) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:2342
#16 0x0000562d87c412b2 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x619000a88508, entry_size=55, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3480
#17 0x0000562d87c41a50 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x619000a88508, index=0x617000035388, page_id=..., zip_size=0, thr=0x621000801a48) at /data/Server/10.5/storage/innobase/ibuf/ibuf0ibuf.cc:3603
The wrongful deletion was added by me in order to fix a hang in change buffer merge (
MDEV-20934), whose root cause should have been the recovery bug that we fixed inMDEV-24449.I think that we must revert the work-around of MDEV-20394.