Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
- 10.6.20, 10.11.10, 11.2.6, 11.4.4, 11.6.2, 11.7.1
Description
mleich produced an rr replay trace of an MDEV-35049 development branch that shows this same error, but without the involvement of OPTIMIZE TABLE. Initially, I suspected that the failure was specific to that branch, because the branch includes a rewrite of how keys are searched for in B-tree pages.
ssh sdp
rr replay /data/results/1732217540/Marko-3/1/rr/latest-trace
In the end, this seems to be a race condition. It took me a while to figure out how to debug it. Eventually, I set a hardware data watchpoint on the clustered index record (DB_ROW_ID=0x4b7) in the buffer pool, specifically on the last 4 bytes of the DB_ROLL_PTR field and the 4 bytes of the problematic col_int_key field, to catch what was going on. I also set a watchpoint on the delete-mark flag of the secondary index record (col_int_key,DB_ROW_ID)=(5,0x4b7).
The hardware data watchpoint on the clustered index record was being hit by the following:
INSERT /*! IGNORE */ INTO table10000_innodb VALUES (5, 5), (5,5), …;
UPDATE table10000_innodb SET `col_int_key` = 4 /* E_R Thread6 QNO 8 CON_ID 23 */
UPDATE table10000_innodb SET `col_int_key` = 5 /* E_R Thread7 QNO 13 CON_ID 24 */
During the execution of the second UPDATE (transaction 0x1e, just a little too new to be included in purge_sys.view), the purge of the first UPDATE was blocked in the following:
10.6-MDEV-35049 36a8b44ebd96ec9a8d449c83248109d5e893f534
#17 log_free_check () at /data/Server/10.6-MDEV-35049E/storage/innobase/log/log0log.cc:956
#18 0x000063a8330d1a61 in row_purge_remove_sec_if_poss_tree (node=node@entry=0x63a8354a62b8, index=index@entry=0x7d1998069e78, entry=entry@entry=0x7d19a402df88, page_max_trx_id=page_max_trx_id@entry=0x1e)
    at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:767
#19 0x000063a8330d2925 in row_purge_remove_sec_if_poss (node=node@entry=0x63a8354a62b8, index=0x7d1998069e78, entry=0x7d19a402df88) at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:991
#20 0x000063a8330d31c6 in row_purge_upd_exist_or_extern_func (thr=thr@entry=0x63a8354a6218, node=node@entry=0x63a8354a62b8, undo_rec=undo_rec@entry=0x7d19c51abdda ">\b\f\202\267\022")
    at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:1157
#21 0x000063a8330d36a6 in row_purge_record_func (node=node@entry=0x63a8354a62b8, undo_rec=undo_rec@entry=0x7d19c51abdda ">\b\f\202\267\022", thr=thr@entry=0x63a8354a6218, updated_extern=0x0)
    at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:1548
#22 0x000063a8330d3baa in row_purge (node=node@entry=0x63a8354a62b8, undo_rec=undo_rec@entry=0x7d19c51abdda ">\b\f\202\267\022", thr=thr@entry=0x63a8354a6218)
    at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:1587
#23 0x000063a8330d3c09 in row_purge_step (thr=thr@entry=0x63a8354a6218) at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:1650
During this blockage, the second UPDATE had both updated the clustered index record and removed the delete-mark from the secondary index record (5,0x4b7), which had been delete-marked by the first UPDATE. Then, purge would report an error and hit ut_ad(0), crashing the debug-instrumented build:
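The interleaving described above can be replayed in a fixed order with a toy model. This is only a sketch and not InnoDB code: ToyLeafPage, ToyPurgeSnapshot and replay_interleaving() are hypothetical names. It also assumes that PAGE_MAX_TRX_ID was already 0x1e when purge made its leaf-level attempt, which is an inference from page_max_trx_id=0x1e in frame #18 rather than something verified in the trace.

```cpp
#include <cstdint>

// Hypothetical toy model, not InnoDB code: one secondary index leaf page
// holding the record (col_int_key,DB_ROW_ID)=(5,0x4b7).
struct ToyLeafPage {
  uint64_t page_max_trx_id;  // stands in for PAGE_MAX_TRX_ID
  bool delete_marked;        // delete-mark flag of the record
};

// What the purge thread remembers from its leaf-level attempt.
struct ToyPurgeSnapshot {
  uint64_t page_max_trx_id;
};

// Replays the interleaving in a fixed order and reports whether purge would
// end up trying to remove a record that is no longer delete-marked.
bool replay_interleaving() {
  const uint64_t second_update_trx = 0x1e;

  // Assumed starting point: the first UPDATE has delete-marked the record,
  // and the still-active second UPDATE has already stamped the page.
  ToyLeafPage page{second_update_trx, true};

  // Purge, leaf-level attempt: remember PAGE_MAX_TRX_ID.
  const ToyPurgeSnapshot snap{page.page_max_trx_id};

  // Purge is blocked in log_free_check(); meanwhile the second UPDATE
  // removes the delete-mark. PAGE_MAX_TRX_ID keeps the same value.
  page.delete_marked = false;
  page.page_max_trx_id = second_update_trx;

  // Purge resumes, tree-level attempt: the shortcut compares only the ID.
  const bool recheck_skipped = snap.page_max_trx_id == page.page_max_trx_id;

  return recheck_skipped && !page.delete_marked;
}
```

In this model, replay_interleaving() returns true: the comparison of PAGE_MAX_TRX_ID values cannot see the change made by a transaction whose ID was already on the page.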
2024-11-21 13:14:38 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `col_int_key` of table `test`.`table10000_innodb`: tuple: TUPLE (info_bits=0, 2 fields): {[4] (0x80000005),[6] (0x0000000004B7)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4] (0x80000005),[6] (0x0000000004B7)}
The problem turns out to be that MDEV-34515 introduced an unsafe optimization: If the PAGE_MAX_TRX_ID did not change between row_purge_remove_sec_if_poss_leaf() and row_purge_remove_sec_if_poss_tree(), a call to row_purge_poss_sec() would be skipped.
A more correct condition would be the following: If the PAGE_MAX_TRX_ID was not changed and it did not belong to an active transaction when row_purge_remove_sec_if_poss_leaf() was holding the secondary index leaf page latch, the check would be redundant.
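The difference between the two conditions can be sketched as a pair of predicates. This is a minimal illustration and not the actual patch: LeafObservation, unsafe_can_skip_recheck() and safer_can_skip_recheck() are hypothetical names, and how the "was active" flag would be determined under the leaf page latch is left out.

```cpp
#include <cstdint>

// Hypothetical: what purge would record while holding the secondary index
// leaf page latch in the leaf-level attempt.
struct LeafObservation {
  uint64_t page_max_trx_id;  // PAGE_MAX_TRX_ID seen at that time
  bool max_trx_was_active;   // did that ID belong to an active transaction?
};

// Condition as described for MDEV-34515: skip the row_purge_poss_sec()-style
// re-check whenever PAGE_MAX_TRX_ID is unchanged.
bool unsafe_can_skip_recheck(const LeafObservation& leaf,
                             uint64_t page_max_trx_id_now) {
  return leaf.page_max_trx_id == page_max_trx_id_now;
}

// Stricter condition: additionally require that the ID did not belong to an
// active transaction, which could still modify the page without changing
// PAGE_MAX_TRX_ID.
bool safer_can_skip_recheck(const LeafObservation& leaf,
                            uint64_t page_max_trx_id_now) {
  return leaf.page_max_trx_id == page_max_trx_id_now &&
         !leaf.max_trx_was_active;
}
```

With an observation of {0x1e, true} and a current PAGE_MAX_TRX_ID of 0x1e, the first predicate allows the skip and the second does not, so the second one would force the re-check in the scenario of this report.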
As far as I can tell, the impact of this bug is limited to some error log "spam" and the debug assertion failure. This should not cause any actual corruption.
Issue Links
- causes: MDEV-35619 Assertion failure in row_purge_del_mark_error (Closed)
- is caused by: MDEV-34515 Contention between secondary index UPDATE and purge due to large innodb_purge_batch_size (Closed)