|
The recovery of the byte 0x1 (instead of 0x2) happens here:
#2 0x00005563d4fc9c81 in log_phys_t::apply (this=this@entry=0x640000446df0,
|
block=..., last_offset=@0x6060000719ec: 8)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/log/log0recv.cc:498
|
498 memcpy(frame + last_offset, l, llen);
|
(rr) p *this
|
$5 = {<log_rec_t> = {next = 0x640000447960, lsn = 74021668},
|
start_lsn = 74017249}
|
The mini-transaction is a WRITE record with 2888 (0xb48) bytes of data payload, to be written to page offset 8 (right after the FIL_PAGE_OFFSET):
0x30, 0x8a, 0xbf, 0x2e, 0x82, 0xc7, 0x8,
|
0x0, 0x0, 0x3, 0x46, 0x0, 0x0, 0x3, 0x49, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x45, 0xbf, …
|
After this write, the PAGE_MAX_TRX_ID became 0x129. The write of 0x2e0 happened much later, around LSN=96928241. That is the next write to the location on recovery:
#2 0x00005563d4fc9c81 in log_phys_t::apply (this=this@entry=0x6400020717d8, block=..., last_offset=@0x6060000719ec: 63) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/log/log0recv.cc:498
|
498 memcpy(frame + last_offset, l, llen);
|
(rr) p *this
|
$15 = {<log_rec_t> = {next = 0x640002071800, lsn = 96928281}, start_lsn = 96928241}
|
The records of the mini-transaction are as follows:
0x35, 0x2e, 0x82, 0xc7, 0x3f, 0xe0: WRITE the byte 0xe0 to page 0x2e:0x347, at offset 0x3f (FIL_PAGE_DATA+PAGE_MAX_TRX_ID+7)
0xb3, 0x9e, 0x28, 0xc7: WRITE the byte 0xc7 to the same page, at byte offset 0x40+0x1ea8
The first record appears to have been written incorrectly. It turns out that the PAGE_MAX_TRX_ID had been previously updated to 0x2be, but no log had been written for it. The update to
0x2e0 was logged as 1 byte, because the 7 most significant bytes would not change at that time. We need to check when the PAGE_MAX_TRX_ID was changed from 0x1e0 and why no log was written for it. That happened during the execution of
ALTER TABLE `t8` ROW_FORMAT=COMPRESSED ENCRYPTED=YES ENCRYPTION_KEY_ID=5 /* E_R Thread1 QNO 147 CON_ID 19 */;
|
with the following stack trace:
#0 page_set_max_trx_id (block=block@entry=0x507405322e10,
|
page_zip=page_zip@entry=0x507405322e58, trx_id=trx_id@entry=297,
|
mtr=mtr@entry=0x619000d4c690)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/page/page0page.cc:182
|
#1 0x00005646107d6024 in page_update_max_trx_id (mtr=0x619000d4c690,
|
trx_id=297, page_zip=0x507405322e58, block=0x507405322e10)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/page0page.inl:52
|
#2 PageBulk::init (this=this@entry=0x619000d4c680)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/btr/btr0bulk.cc:126
|
#3 0x00005646107d902a in BtrBulk::insert (this=this@entry=0x44212d0aef70,
|
tuple=tuple@entry=0x6310031b08b0, level=level@entry=0)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/btr/btr0bulk.cc:1077
|
#4 0x000056461057fd53 in BtrBulk::insert (tuple=0x6310031b08b0,
|
this=0x44212d0aef70)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/btr0bulk.h:301
|
#5 row_merge_insert_index_tuples (index=0x61600603a608,
|
This mini-transaction was committed at around log_sys.lsn=74017249, with the following log:
(rr) p/x *m_log.m_first_block.m_data@134
|
$16 = {0x13, 0x2e, 0x82, 0xc7, 0x13, 0x2e, 0x82, 0xc7, 0x30, 0x5c, 0x2e, 0x82,
|
0xc7, 0x8, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0x0,
|
0x0, 0x0, 0x0, 0x0, 0x0, 0x45, 0xbf, 0x0 <repeats 11 times>, 0x2e, 0x0, 0x2,
|
0x0, 0x78, 0x80, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x5,
|
0x0 <repeats 21 times>, 0x5f, 0x0 <repeats 20 times>, 0x68, 0x81, 0x12,
|
0x64, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0x3, 0x0, 0x0, 0x24, 0x0, 0x12, 0x36,
|
0x2e, 0x82, 0xc7, 0x3e, 0x1, 0x29, 0x13, 0x2e, 0x82, 0xc7, 0x30, 0x8a, 0xbf,
|
0x2e, 0x82, 0xc7, 0x8}
|
The first record is an INIT_PAGE of the page 0x2e:0x347. At this point of time, log_sys.last_checkpoint_lsn=log_sys.next_checkpoint_lsn=72135592. This looks like a page flushing bug:
2023-01-11 15:48:41 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=72135592,72136847
|
Finally, I found the race condition:
#0 buf_page_t::clear_oldest_modification (this=0x507405322e10) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/buf0buf.h:2017
|
#1 buf_pool_t::delete_from_flush_list (this=this@entry=0x5646121ca700 <buf_pool>, bpage=bpage@entry=0x507405322e10, clear=clear@entry=true) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/buf/buf0flu.cc:207
|
#2 0x00005646108cc2c5 in buf_pool_t::delete_from_flush_list (bpage=0x507405322e10, this=<optimized out>) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/buf0buf.h:1873
|
#3 buf_flush_discard_page (bpage=bpage@entry=0x507405322e10) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/buf/buf0flu.cc:1206
|
#4 0x00005646108d4c1b in buf_do_flush_list_batch (max_n=max_n@entry=2000, lsn=lsn@entry=70815685) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/buf/buf0flu.cc:1447
|
#5 0x00005646108d56eb in buf_flush_list (max_n=2000, lsn=lsn@entry=70815685) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/buf/buf0flu.cc:1532
|
This thread is marking the page clean, even though the ALTER TABLE thread at the same time is holding an exclusive page latch:
#3 0x00005646108c9a85 in inline_mysql_mutex_lock (src_line=179, src_file=0x564611673de0 "/data/Server/bb-10.6-MDEV-29835F/storage/innobase/buf/buf0flu.cc", that=0x5646121cea80 <buf_pool+17280>) at /data/Server/bb-10.6-MDEV-29835F/include/mysql/psi/mysql_thread.h:750
|
#4 buf_pool_t::insert_into_flush_list (this=0x5646121ca700 <buf_pool>, block=block@entry=0x5074053c46a0, lsn=lsn@entry=71662004) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/buf/buf0flu.cc:179
|
#5 0x00005646103dd8bf in buf_flush_note_modification (b=0x5074053c46a0, start=71662004, end=<optimized out>) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/buf0flu.h:129
|
#6 0x000056461041d8b3 in mtr_t::commit (this=this@entry=0x619001dd2690) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/mtr/mtr0mtr.cc:194
|
#7 0x00005646107d75ea in PageBulk::commit (this=this@entry=0x619001dd2680, succ--Type <RET> for more, q to quit, c to continue without paging--
|
ess=success@entry=true) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/btr/btr0bulk.cc:569
|
#8 0x00005646107db710 in BtrBulk::pageCommit (this=this@entry=0x44212d0aef70, page_bulk=page_bulk@entry=0x619001dd2680, next_page_bulk=next_page_bulk@entry=0x619000d62480, insert_father=insert_father@entry=true) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/btr/btr0bulk.cc:964
|
#9 0x00005646107d9101 in BtrBulk::insert (this=this@entry=0x44212d0aef70, tuple=tuple@entry=0x6310002f88b0, level=level@entry=0) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/btr/btr0bulk.cc:1084
|
#10 0x000056461057fd53 in BtrBulk::insert (…) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/row/row0merge.cc:3672
|
#12 0x0000564610592457 in row_merge_read_clustered_index…
|
This recovery bug ought to affect 10.5 already, due to MDEV-15528. I think that this could be reproduced with a DML-only workload (say, INSERT and ROLLBACK of lots of BLOB data).
|
|
This failure is specific to ROW_FORMAT=COMPRESSED. Here is some more context of the non-logged update of PAGE_MAX_TRX_ID:
#0 mtr_t::set_modified (block=..., this=0x36330b335af0) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/mtr0mtr.h:338
|
#1 mtr_t::memcpy_low (this=0x36330b335af0, block=..., offset=<optimized out>, data=<optimized out>, len=<optimized out>) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/mtr0log.h:321
|
#2 0x00005646104855ce in mtr_t::write<8u, (mtr_t::write_type)0, unsigned long> (this=this@entry=0x36330b335af0, block=..., ptr=ptr@entry=0x507406d04038, val=val@entry=702)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/mtr0log.h:202
|
#3 0x000056461047013c in page_set_max_trx_id (block=block@entry=0x507405322e10, page_zip=page_zip@entry=0x507405322e58, trx_id=trx_id@entry=702, mtr=mtr@entry=0x36330b335af0)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/page/page0page.cc:189
|
#4 0x00005646104705b9 in page_update_max_trx_id (block=block@entry=0x507405322e10, page_zip=page_zip@entry=0x507405322e58, trx_id=702, mtr=mtr@entry=0x36330b335af0)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/page0page.inl:52
|
#5 0x0000564610483754 in page_copy_rec_list_start (new_block=0x507405322e10, block=block@entry=0x507405322ba0, rec=<optimized out>, index=index@entry=0x61600603a608, mtr=mtr@entry=0x36330b335af0,
|
err=err@entry=0x36330b334b20) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/page/page0page.cc:791
|
#6 0x00005646107bd920 in btr_compress (cursor=cursor@entry=0x36330b335810, adjust=adjust@entry=false, mtr=mtr@entry=0x36330b335af0) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/btr/btr0btr.cc:3756
|
#7 0x00005646107fee69 in btr_cur_compress_if_useful (cursor=cursor@entry=0x36330b335810, adjust=adjust@entry=false, mtr=mtr@entry=0x36330b335af0)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/include/btr0cur.h:781
|
#8 0x000056461081f699 in btr_cur_pessimistic_delete (err=err@entry=0x36330b335760, has_reserved_extents=has_reserved_extents@entry=0, cursor=cursor@entry=0x36330b335810, flags=flags@entry=0,
|
rollback=rollback@entry=false, mtr=mtr@entry=0x36330b335af0) at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/btr/btr0cur.cc:5507
|
#9 0x00005646105fc682 in row_purge_remove_sec_if_poss_tree (node=node@entry=0x61a00000a908, index=index@entry=0x61600603a608, entry=entry@entry=0x61900012c608)
|
at /data/Server/bb-10.6-MDEV-29835F/storage/innobase/row/row0purge.cc:410
|
In mtr_t::set_modified() we will not mark the page as modified, because m_log_mode==MTR_LOG_NONE. The purpose of that mode is to suppress log writes for ROW_FORMAT=COMPRESSED pages, in case we would run out of space and the change would have to be rolled back. Apparently, exactly that happens, because there will be a failing call to page_zip_reorganize() during page_copy_rec_list_start(), which was called by btr_compress(). In the end, at the time of mtr_t::commit(), the mini-transaction will have flagged only the page 0x2e:0x349 as modified, and only written log records for it. The intention was to revert any changes to our page 0x2e:0x347. But, the change to PAGE_MAX_TRX_ID was neither reverted nor logged!
As far as I can tell, this recovery and backup bug can break locking and MVCC on secondary indexes of ROW_FORMAT=COMPRESSED tables, and it should affect all MySQL and MariaDB versions that support ROW_FORMAT=COMPRESSED for InnoDB tables. It may take some effort to fix this and to test the fix.
|