[MDEV-27992] DELETE fails to delete record after blocking is released Created: 2022-03-03  Updated: 2023-03-10  Resolved: 2022-03-07

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Delete, Locking, Storage Engine - InnoDB
Affects Version/s: 10.5.14, 10.6.6, 10.7.3
Fix Version/s: 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3

Type: Bug Priority: Critical
Reporter: John Jove Assignee: Vladislav Lesin
Resolution: Fixed Votes: 0
Labels: transactions
Environment:

OS: Ubuntu-20.04 LTS


Issue Links:
Blocks
Problem/Incident
is caused by MDEV-27025 insert-intention lock conflicts with ... Closed
Relates
relates to MDEV-29433 innodb.lock_delete_updated is unstable Closed
relates to MDEV-24813 Locking full table scan fails to use ... Confirmed

 Description   

Isolation Level: Read Committed & Read Uncommitted
One transaction modifies a row, the other transaction concurrently deletes that row and is blocked. After blocking is released, the second transaction DELETE fails.

/* init */ DROP TABLE IF EXISTS t;
/* init */ CREATE TABLE t(c1 INT PRIMARY KEY, c2 INT);
/* init */ INSERT INTO t(c1) VALUES (8);
 
/* t1 */ BEGIN;
/* t2 */ BEGIN;
/* t1 */ UPDATE t SET c1 = 5, c2 = 5;
/* t2 */ DELETE FROM t; -- blocked
/* t1 */ UPDATE t SET c1 = 3;
/* t1 */ COMMIT; -- t2 unblocked
/* t2 */ SELECT * FROM t FOR UPDATE; -- [(3, 5)]
/* t2 */ ROLLBACK;



 Comments   
Comment by Marko Mäkelä [ 2022-03-03 ]

Here is a simpler, executable version of the test, again (like in MDEV-27922) using COMMIT instead of ROLLBACK because COMMIT is a simpler operation for InnoDB:

--source include/have_innodb.inc
 
CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT) ENGINE=InnoDB;
INSERT INTO t1(c1) VALUES (8);
 
BEGIN;
 
connection default;
UPDATE t1 SET c1 = 5, c2 = 5;
 
connect con1,localhost,root;
send DELETE FROM t1;
 
connection default;
let $wait_condition=
    select count(*) = 1 from information_schema.processlist
    where state = "Updating" and info = "DELETE FROM t1";
--source include/wait_condition.inc
 
UPDATE t1 SET c1 = 3;
COMMIT;
 
connection con1;
reap;
disconnect con1;
 
connection default;
SELECT * FROM t1;
DROP TABLE t1;

I tested some versions of MariaDB where no wrong result was returned. In those cases, DELETE reported an error:

mariadb-10.6.5

mysqltest: At line 24: query 'reap' failed: 1213: Deadlock found when trying to get lock; try restarting transaction

I verified the wrong result (no error reported for DELETE) also for 10.5.14 and 10.6.6. If I revert the MDEV-27025 fix, the deadlock error will be reported.

I debugged this with a version of the server that includes the MDEV-27025 (and returns the reported result). At the time the lock_wait() returns to the DELETE statement, the m_prebuilt->pcur was already positioned on the meanwhile delete-marked record (5,5).

At the time lock_wait() was invoked, the table contained the record (5,5) and the delete-marked record (8,NULL). So, the problem appears to be that after lock_wait() is resumed and the current record is committed as delete-marked, we fail to notice that the record had been replaced with one with a smaller PRIMARY KEY value (3,5). That is why the record (3,5) will survive the execution of the DELETE statement.

In this special case where DELETE is operating on the entire table, escalating to table-level locking (MDEV-24813) would prevent this wrong result.

vlad.lesin, please check if we could do anything else to prevent this at a lower level.

Comment by Marko Mäkelä [ 2022-03-03 ]

The DELETE did acquire a gap lock on the gap before the record (5,5), but that did not conflict with the preceding UPDATE.

Even if I add SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE before the DELETE, the record (5,3) will survive the DELETE.

Comment by Marko Mäkelä [ 2022-03-04 ]

Even if both transactions report SELECT @@tx_isolation as SERIALIZABLE, the DELETE will still fail to remove the record. Note: If there was a WHERE condition on the DELETE that is supposed to match all rows, then a work-around provided MDEV-24813 would not work.

I was thinking a bit more, and I currently hope that this problem is only related to a corner case: When the PRIMARY KEY of a record is being updated to a smaller value, a concurrent DELETE or UPDATE may have trouble.

I found two source code comments that refers to a related problem, the Halloween problem "allowing the row to be visited more than once during the operation". Here, a matching row is not being visited at all.

	if (row_upd_changes_ord_field_binary(index, node->update, thr,
					     node->row, node->ext)) {
 
		/* Update causes an ordering field (ordering fields within
		the B-tree) of the clustered index record to change: perform
		the update by delete marking and inserting.
 
		TODO! What to do to the 'Halloween problem', where an update
		moves the record forward in index so that it is again
		updated when the cursor arrives there? Solution: the
		read operation must check the undo record undo number when
		choosing records to update. MySQL solves now the problem
		externally! */
 
		err = row_upd_clust_rec_by_insert(

I understand that the comment is a description of a solution that has not been implemented. At a higher level, there is another comment about this:

/**
Updates a row given as a parameter to a new value. Note that we are given
whole rows, not just the fields which are updated: this incurs some
overhead for CPU when we check which fields are actually updated.
TODO: currently InnoDB does not prevent the 'Halloween problem':
in a searched update a single row can get updated several times
if its index columns are updated!
@param[in] old_row	Old row contents in MySQL format
@param[out] new_row	Updated row contents in MySQL format
@return error number or 0 */
 
int
ha_innobase::update_row(
	const uchar*	old_row,
	const uchar*	new_row)

For fixing this bug, I can offer a wild idea that I do not think can be implemented easily: Once the lock has been granted and the record is committed as delete-marked, then we could try to look up the record that replaced it. If the new key is smaller, we should somehow rewind to that record (without running into the Halloween problem). Theoretically we could look up the replacement record if our transaction holds a read view (this is not necessarily the case for locking reads!) that allows us to access the undo log of the initially conflicting and now committed transaction. I do not think that the current undo log format allows us to distinguish the UPDATE of a PRIMARY KEY from a DELETE and INSERT.

Comment by Vladislav Lesin [ 2022-03-04 ]

I simplified the test a bit:

--source include/have_innodb.inc                                                
                                                                                
CREATE TABLE t1(c1 INT PRIMARY KEY) ENGINE=InnoDB;                              
INSERT INTO t1 VALUES (3);                                                     
                                                                                
BEGIN;                                                                          
                                                                                
connection default;                                                             
UPDATE t1 SET c1 = 2;                                                          
                                                                                
connect con1,localhost,root;                                                    
send DELETE FROM t1;                                                            
                                                                                
connection default;                                                             
let $wait_condition=                                                            
    select count(*) = 1 from information_schema.processlist                     
    where state = "Updating" and info = "DELETE FROM t1";                       
--source include/wait_condition.inc                                             
                                                                                
UPDATE t1 SET c1 = 1;                                                          
COMMIT;                                                                         
                                                                                
connection con1;                                                                
reap;                                                                           
disconnect con1;                                                                
                                                                                
connection default;                                                             
SELECT * FROM t1;                                                               
DROP TABLE t1;

Here is the sequence of actions which cause the error:
1) The first "UPDATE" sets ORDINARY X-lock on record "3", delete-marks it, inserts record "2" and inherits gap X-lock from ORDINARY X-lock of record "3" to record "2".
2) "DELETE" converts implicit lock of record "2" to explicit not-gap X-lock, creates it's own waiting ordinary X-lock for record "2" which conflicts with previously converted explicit lock of "UPDATE" transaction.
3) "UPDATE transaction acquires locks on "2", but they don't conflict with "DELETE" locks due to MDEV-27025 fix. Then record "2" is delete-marked, record "1" is inserted, the transaction is committed, and its locks are released.
4) "DELETE" transaction restores cursor position, it points to record "2" as it was delete-marked and not purged, and then it continues table scanning missing record "1".

Comment by Vladislav Lesin [ 2022-03-07 ]

As there is no better solution at the moment, MDEV-27025 fix is reverted.

Generated at Thu Feb 08 09:57:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.