[MDEV-35508] Race condition between purge and secondary index INSERT or UPDATE

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.6.20, 10.11.10, 11.2.6, 11.4.4, 11.6.2, 11.7.1
Fix Version/s: 10.6.21, 11.4.5, 11.7.2
Component/s: Storage Engine - InnoDB
Labels:

Description

mleich produced an rr replay trace of an MDEV-35049 development branch that features this same error, but without the involvement of OPTIMIZE TABLE. Initially, I suspected that this failure was specific to that branch, because the branch includes a rewrite of how keys are searched for in B-tree pages.

ssh sdp

rr replay /data/results/1732217540/Marko-3/1/rr/latest-trace

This turned out to be a race condition. It took me a while to figure out how to debug it. Eventually, I set a hardware data watchpoint on the clustered index record (DB_ROW_ID=0x4b7) in the buffer pool, specifically on the last 4 bytes of the DB_ROLL_PTR field and on the 4 bytes of the problematic col_int_key field, to catch what was going on. I also set a watchpoint on the delete-mark flag of the secondary index record (col_int_key,DB_ROW_ID)=(5,0x4b7).

The hardware data watchpoint on the clustered index record was hit by the following statements:

INSERT /*! IGNORE */ INTO table10000_innodb VALUES (5, 5), (5,5), …;

UPDATE table10000_innodb SET `col_int_key` = 4 /* E_R Thread6 QNO 8 CON_ID 23 */

UPDATE table10000_innodb SET `col_int_key` = 5 /* E_R Thread7 QNO 13 CON_ID 24 */

During the execution of the second UPDATE (transaction 0x1e, just a little too new to be included in purge_sys.view), the purge of the first UPDATE was blocked at the following stack frames:

10.6-MDEV-35049 36a8b44ebd96ec9a8d449c83248109d5e893f534
#17 log_free_check () at /data/Server/10.6-MDEV-35049E/storage/innobase/log/log0log.cc:956
#18 0x000063a8330d1a61 in row_purge_remove_sec_if_poss_tree (node=node@entry=0x63a8354a62b8, index=index@entry=0x7d1998069e78, entry=entry@entry=0x7d19a402df88, page_max_trx_id=page_max_trx_id@entry=0x1e)
at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:767
#19 0x000063a8330d2925 in row_purge_remove_sec_if_poss (node=node@entry=0x63a8354a62b8, index=0x7d1998069e78, entry=0x7d19a402df88) at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:991
#20 0x000063a8330d31c6 in row_purge_upd_exist_or_extern_func (thr=thr@entry=0x63a8354a6218, node=node@entry=0x63a8354a62b8, undo_rec=undo_rec@entry=0x7d19c51abdda ">\b\f\202\267\022")
at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:1157
#21 0x000063a8330d36a6 in row_purge_record_func (node=node@entry=0x63a8354a62b8, undo_rec=undo_rec@entry=0x7d19c51abdda ">\b\f\202\267\022", thr=thr@entry=0x63a8354a6218, updated_extern=0x0)
at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:1548
#22 0x000063a8330d3baa in row_purge (node=node@entry=0x63a8354a62b8, undo_rec=undo_rec@entry=0x7d19c51abdda ">\b\f\202\267\022", thr=thr@entry=0x63a8354a6218)
at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:1587
#23 0x000063a8330d3c09 in row_purge_step (thr=thr@entry=0x63a8354a6218) at /data/Server/10.6-MDEV-35049E/storage/innobase/row/row0purge.cc:1650

During this blockage, the second UPDATE had both updated the clustered index record and removed the delete-mark from the secondary index record (5,0x4b7), which had been delete-marked by the first UPDATE. Then, purge would report an error and hit ut_ad(0), crashing the debug instrumented build:

2024-11-21 13:14:38 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `col_int_key` of table `test`.`table10000_innodb`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x80000005),[6]      (0x0000000004B7)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x80000005),[6]      (0x0000000004B7)}

The problem turns out to be that MDEV-34515 introduced an unsafe optimization: if the PAGE_MAX_TRX_ID did not change between row_purge_remove_sec_if_poss_leaf() and row_purge_remove_sec_if_poss_tree(), the call to row_purge_poss_sec() would be skipped.

A more correct condition would be the following: the check is redundant only if the PAGE_MAX_TRX_ID was not changed and it did not belong to an active transaction while row_purge_remove_sec_if_poss_leaf() was holding the secondary index leaf page latch.

As far as I can tell, the impact of this bug is limited to some error log "spam" and the debug assertion failure. This should not cause any actual corruption.

Attachments

10.6_fix.patch (1 kB, 2024-11-28 06:05)
10.6_test_repeat.patch (4 kB, 2024-11-28 06:05)

Issue Links

causes: MDEV-35619 Assertion failure in row_purge_del_mark_error (Closed)

duplicates: MDEV-35829 galera node crash with race condition (Open)

is caused by: MDEV-34515 Contention between secondary index UPDATE and purge due to large innodb_purge_batch_size (Closed)

Activity

Marko Mäkelä added a comment - 2024-11-27 09:26

The following test reproduces this rather easily:

--source include/have_innodb.inc
--source include/have_sequence.inc
CREATE TABLE t(a INT PRIMARY KEY, b INT NOT NULL, INDEX(b)) ENGINE=InnoDB;
SET STATEMENT unique_checks=0,foreign_key_checks=0 FOR
INSERT INTO t SELECT seq, seq from seq_1_to_10000;
--disable_query_log
let $N=30;
while ($N)
{
  UPDATE t SET b=a+1; UPDATE t SET b=a;
  dec $N;
}
--enable_query_log
SET GLOBAL innodb_max_purge_lag_wait=0;
DROP TABLE t;

./mtr --parallel=5 main.m-MDEV-35508{,,,,}

10.6 9ba18d1aa049603b8e865e6616e26a4da8b1ecce
main.m-MDEV-35508 w3 [ pass ] 13148
main.m-MDEV-35508 w4 [ fail ] Found warnings/errors in server log file!
Test ended at 2024-11-27 11:24:14
line
2024-11-27 11:24:06 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `b` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[4] "?(0x8000223F),[4] ">(0x8000223E)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4] "?(0x8000223F),[4] ">(0x8000223E)}

Debarun Banerjee added a comment - 2024-11-28 06:03 - edited

marko, mleich It is great that we could catch the issue. I completely agree that the issue is caused by the MDEV-34515 PAGE_MAX_TRX_ID related optimization. I think the issue is more serious and could also affect release builds, possibly corrupting a secondary index. To understand the issue better, I was able to create a debug_sync test that repeats the issue consistently every time. Please find attached 10.6_test_repeat.patch.

Here is what I find from analyzing the issue.

1. The assertion could still be hit with the current fix, so the fix needs to be improved. Essentially, the transaction ID marks the start point of a transaction, and there could be many transactions still running with IDs smaller than a PAGE_MAX_TRX_ID that was set by some transaction that has already committed. A transaction with an ID smaller than PAGE_MAX_TRX_ID does not advance PAGE_MAX_TRX_ID when it modifies the page. So, the condition needs to be stricter, following the same logic that we apply when checking for an implicit lock on a secondary index page, i.e. checking for any active transaction with an ID equal to or smaller than PAGE_MAX_TRX_ID. I have attached a patch that fixes the issue: 10.6_fix.patch

2. Since it is also possible for another transaction to delete-mark the key again (after it was unmarked), purge could actually succeed in deleting the secondary index record, which it should not. This could affect several cases, including MVCC, where a transaction may miss reading a record, and rollback, which may fail to find the older record when removing the delete mark, eventually leading to missing entries in the secondary index. I will test some more, trying to simulate other scenarios.

Marko Mäkelä added a comment - 2024-11-28 08:10

debarun, thank you for 10.6_test_repeat.patch. If I disable the failing debug assertion, the test would trigger a write of the message to the server error log, but neither CHECK TABLE nor CHECK TABLE…EXTENDED would report any error.

Nevertheless, I can imagine a scenario where a secondary index record is being constantly delete-marked and delete-unmarked by a flow of DML operations, and a purge operation would prematurely delete the record, without showing any warning message. That is, a more complex test case could demonstrate actual corruption of a secondary index.

Debarun Banerjee added a comment - 2024-11-28 12:46

I checked the further impact of the two cases I mentioned in my last comment.

1. MVCC: By changing the attached test case so that another transaction deletes the record again, and by creating a read-only transaction between the UPDATE and the DELETE, a missing read could be simulated. After purge incorrectly purges the delete-marked secondary index record, a SELECT using the secondary index in the read-only transaction returns no rows. The same SELECT correctly shows the record when the secondary index is not used in the scan.

# Correct - One record fetched
SELECT * FROM t1 IGNORE INDEX (k1);
col1	col2
1	100

# Incorrect - No records fetched
SELECT * FROM t1 FORCE INDEX (k1);
col1	col2

2. Rollback: After the delete-marked record was purged, rollback produced the following warning in the server log. Fortunately, the rollback operation appears to be able to re-insert the correct record/key after running into the issue. So, things are in a correct state after the rollback, and no corruption is observed.

2024-11-28 13:08:14 4 [Warning] InnoDB: Record in index `k1` of table `test`.`t1` was not found on rollback, trying to insert: TUPLE (info_bits=0, 2 fields): {[4]   d(0x80000064),[4]    (0x80000001)} at: COMPACT RECORD(info_bits=0, 1 fields): {[8]infimum (0x696E66696D756D00)}

Marko Mäkelä added a comment - 2024-11-29 11:38

Note that the fix for this moved some duplicated error reporting into a new function row_purge_del_mark_error(), which also includes a debug assertion ut_ad(0). Previously, such a debug assertion was only present in row_purge_remove_sec_if_poss_tree(), not in row_purge_remove_sec_if_poss_leaf(). Therefore, after this fix, we will get more corruption-related crashes in debug instrumented builds.

MariaDB Server
