Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27992

DELETE fails to delete record after blocking is released

Details

    Description

      Isolation Level: Read Committed & Read Uncommitted
      One transaction modifies a row, the other transaction concurrently deletes that row and is blocked. After blocking is released, the second transaction DELETE fails.

      /* init */ DROP TABLE IF EXISTS t;
      /* init */ CREATE TABLE t(c1 INT PRIMARY KEY, c2 INT);
      /* init */ INSERT INTO t(c1) VALUES (8);
       
      /* t1 */ BEGIN;
      /* t2 */ BEGIN;
      /* t1 */ UPDATE t SET c1 = 5, c2 = 5;
      /* t2 */ DELETE FROM t; -- blocked
      /* t1 */ UPDATE t SET c1 = 3;
      /* t1 */ COMMIT; -- t2 unblocked
      /* t2 */ SELECT * FROM t FOR UPDATE; -- [(3, 5)]
      /* t2 */ ROLLBACK;
      

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment - - edited

            Here is a simpler, executable version of the test, again (like in MDEV-27922) using COMMIT instead of ROLLBACK because COMMIT is a simpler operation for InnoDB:

            --source include/have_innodb.inc
             
            CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT) ENGINE=InnoDB;
            INSERT INTO t1(c1) VALUES (8);
             
            BEGIN;
             
            connection default;
            UPDATE t1 SET c1 = 5, c2 = 5;
             
            connect con1,localhost,root;
            send DELETE FROM t1;
             
            connection default;
            let $wait_condition=
                select count(*) = 1 from information_schema.processlist
                where state = "Updating" and info = "DELETE FROM t1";
            --source include/wait_condition.inc
             
            UPDATE t1 SET c1 = 3;
            COMMIT;
             
            connection con1;
            reap;
            disconnect con1;
             
            connection default;
            SELECT * FROM t1;
            DROP TABLE t1;
            

            I tested some versions of MariaDB where no wrong result was returned. In those cases, DELETE reported an error:

            mariadb-10.6.5

            mysqltest: At line 24: query 'reap' failed: 1213: Deadlock found when trying to get lock; try restarting transaction
            

            I verified the wrong result (no error reported for DELETE) also for 10.5.14 and 10.6.6. If I revert the MDEV-27025 fix, the deadlock error will be reported.

            I debugged this with a version of the server that includes the MDEV-27025 (and returns the reported result). At the time the lock_wait() returns to the DELETE statement, the m_prebuilt->pcur was already positioned on the meanwhile delete-marked record (5,5).

            At the time lock_wait() was invoked, the table contained the record (5,5) and the delete-marked record (8,NULL). So, the problem appears to be that after lock_wait() is resumed and the current record is committed as delete-marked, we fail to notice that the record had been replaced with one with a smaller PRIMARY KEY value (3,5). That is why the record (3,5) will survive the execution of the DELETE statement.

            In this special case where DELETE is operating on the entire table, escalating to table-level locking (MDEV-24813) would prevent this wrong result.

            vlad.lesin, please check if we could do anything else to prevent this at a lower level.

            marko Marko Mäkelä added a comment - - edited Here is a simpler, executable version of the test, again (like in MDEV-27922 ) using COMMIT instead of ROLLBACK because COMMIT is a simpler operation for InnoDB: --source include/have_innodb.inc   CREATE TABLE t1(c1 INT PRIMARY KEY , c2 INT ) ENGINE=InnoDB; INSERT INTO t1(c1) VALUES (8);   BEGIN ;   connection default ; UPDATE t1 SET c1 = 5, c2 = 5;   connect con1,localhost,root; send DELETE FROM t1;   connection default ; let $wait_condition= select count (*) = 1 from information_schema.processlist where state = "Updating" and info = "DELETE FROM t1" ; --source include/wait_condition.inc   UPDATE t1 SET c1 = 3; COMMIT ;   connection con1; reap; disconnect con1;   connection default ; SELECT * FROM t1; DROP TABLE t1; I tested some versions of MariaDB where no wrong result was returned. In those cases, DELETE reported an error: mariadb-10.6.5 mysqltest: At line 24: query 'reap' failed: 1213: Deadlock found when trying to get lock; try restarting transaction I verified the wrong result (no error reported for DELETE ) also for 10.5.14 and 10.6.6. If I revert the MDEV-27025 fix, the deadlock error will be reported. I debugged this with a version of the server that includes the MDEV-27025 (and returns the reported result). At the time the lock_wait() returns to the DELETE statement, the m_prebuilt->pcur was already positioned on the meanwhile delete-marked record (5,5). At the time lock_wait() was invoked, the table contained the record (5,5) and the delete-marked record (8,NULL). So, the problem appears to be that after lock_wait() is resumed and the current record is committed as delete-marked, we fail to notice that the record had been replaced with one with a smaller PRIMARY KEY value (3,5). That is why the record (3,5) will survive the execution of the DELETE statement. In this special case where DELETE is operating on the entire table, escalating to table-level locking ( MDEV-24813 ) would prevent this wrong result. vlad.lesin , please check if we could do anything else to prevent this at a lower level.

            The DELETE did acquire a gap lock on the gap before the record (5,5), but that did not conflict with the preceding UPDATE.

            Even if I add SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE before the DELETE, the record (5,3) will survive the DELETE.

            marko Marko Mäkelä added a comment - The DELETE did acquire a gap lock on the gap before the record (5,5), but that did not conflict with the preceding UPDATE . Even if I add SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE before the DELETE , the record (5,3) will survive the DELETE .

            Even if both transactions report SELECT @@tx_isolation as SERIALIZABLE, the DELETE will still fail to remove the record. Note: If there was a WHERE condition on the DELETE that is supposed to match all rows, then a work-around provided MDEV-24813 would not work.

            I was thinking a bit more, and I currently hope that this problem is only related to a corner case: When the PRIMARY KEY of a record is being updated to a smaller value, a concurrent DELETE or UPDATE may have trouble.

            I found two source code comments that refers to a related problem, the Halloween problem "allowing the row to be visited more than once during the operation". Here, a matching row is not being visited at all.

            	if (row_upd_changes_ord_field_binary(index, node->update, thr,
            					     node->row, node->ext)) {
             
            		/* Update causes an ordering field (ordering fields within
            		the B-tree) of the clustered index record to change: perform
            		the update by delete marking and inserting.
             
            		TODO! What to do to the 'Halloween problem', where an update
            		moves the record forward in index so that it is again
            		updated when the cursor arrives there? Solution: the
            		read operation must check the undo record undo number when
            		choosing records to update. MySQL solves now the problem
            		externally! */
             
            		err = row_upd_clust_rec_by_insert(
            

            I understand that the comment is a description of a solution that has not been implemented. At a higher level, there is another comment about this:

            /**
            Updates a row given as a parameter to a new value. Note that we are given
            whole rows, not just the fields which are updated: this incurs some
            overhead for CPU when we check which fields are actually updated.
            TODO: currently InnoDB does not prevent the 'Halloween problem':
            in a searched update a single row can get updated several times
            if its index columns are updated!
            @param[in] old_row	Old row contents in MySQL format
            @param[out] new_row	Updated row contents in MySQL format
            @return error number or 0 */
             
            int
            ha_innobase::update_row(
            	const uchar*	old_row,
            	const uchar*	new_row)
            

            For fixing this bug, I can offer a wild idea that I do not think can be implemented easily: Once the lock has been granted and the record is committed as delete-marked, then we could try to look up the record that replaced it. If the new key is smaller, we should somehow rewind to that record (without running into the Halloween problem). Theoretically we could look up the replacement record if our transaction holds a read view (this is not necessarily the case for locking reads!) that allows us to access the undo log of the initially conflicting and now committed transaction. I do not think that the current undo log format allows us to distinguish the UPDATE of a PRIMARY KEY from a DELETE and INSERT.

            marko Marko Mäkelä added a comment - Even if both transactions report SELECT @@tx_isolation as SERIALIZABLE , the DELETE will still fail to remove the record. Note: If there was a WHERE condition on the DELETE that is supposed to match all rows, then a work-around provided MDEV-24813 would not work. I was thinking a bit more, and I currently hope that this problem is only related to a corner case: When the PRIMARY KEY of a record is being updated to a smaller value, a concurrent DELETE or UPDATE may have trouble. I found two source code comments that refers to a related problem, the Halloween problem "allowing the row to be visited more than once during the operation". Here, a matching row is not being visited at all. if (row_upd_changes_ord_field_binary(index, node->update, thr, node->row, node->ext)) {   /* Update causes an ordering field (ordering fields within the B-tree) of the clustered index record to change: perform the update by delete marking and inserting.   TODO! What to do to the 'Halloween problem', where an update moves the record forward in index so that it is again updated when the cursor arrives there? Solution: the read operation must check the undo record undo number when choosing records to update. MySQL solves now the problem externally! */   err = row_upd_clust_rec_by_insert( I understand that the comment is a description of a solution that has not been implemented. At a higher level, there is another comment about this: /** Updates a row given as a parameter to a new value. Note that we are given whole rows, not just the fields which are updated: this incurs some overhead for CPU when we check which fields are actually updated. TODO: currently InnoDB does not prevent the 'Halloween problem': in a searched update a single row can get updated several times if its index columns are updated! @param[in] old_row Old row contents in MySQL format @param[out] new_row Updated row contents in MySQL format @return error number or 0 */   int ha_innobase::update_row( const uchar* old_row, const uchar* new_row) For fixing this bug, I can offer a wild idea that I do not think can be implemented easily: Once the lock has been granted and the record is committed as delete-marked, then we could try to look up the record that replaced it. If the new key is smaller, we should somehow rewind to that record (without running into the Halloween problem). Theoretically we could look up the replacement record if our transaction holds a read view (this is not necessarily the case for locking reads!) that allows us to access the undo log of the initially conflicting and now committed transaction. I do not think that the current undo log format allows us to distinguish the UPDATE of a PRIMARY KEY from a DELETE and INSERT .

            I simplified the test a bit:

            --source include/have_innodb.inc                                                
                                                                                            
            CREATE TABLE t1(c1 INT PRIMARY KEY) ENGINE=InnoDB;                              
            INSERT INTO t1 VALUES (3);                                                     
                                                                                            
            BEGIN;                                                                          
                                                                                            
            connection default;                                                             
            UPDATE t1 SET c1 = 2;                                                          
                                                                                            
            connect con1,localhost,root;                                                    
            send DELETE FROM t1;                                                            
                                                                                            
            connection default;                                                             
            let $wait_condition=                                                            
                select count(*) = 1 from information_schema.processlist                     
                where state = "Updating" and info = "DELETE FROM t1";                       
            --source include/wait_condition.inc                                             
                                                                                            
            UPDATE t1 SET c1 = 1;                                                          
            COMMIT;                                                                         
                                                                                            
            connection con1;                                                                
            reap;                                                                           
            disconnect con1;                                                                
                                                                                            
            connection default;                                                             
            SELECT * FROM t1;                                                               
            DROP TABLE t1;
            

            Here is the sequence of actions which cause the error:
            1) The first "UPDATE" sets ORDINARY X-lock on record "3", delete-marks it, inserts record "2" and inherits gap X-lock from ORDINARY X-lock of record "3" to record "2".
            2) "DELETE" converts implicit lock of record "2" to explicit not-gap X-lock, creates it's own waiting ordinary X-lock for record "2" which conflicts with previously converted explicit lock of "UPDATE" transaction.
            3) "UPDATE transaction acquires locks on "2", but they don't conflict with "DELETE" locks due to MDEV-27025 fix. Then record "2" is delete-marked, record "1" is inserted, the transaction is committed, and its locks are released.
            4) "DELETE" transaction restores cursor position, it points to record "2" as it was delete-marked and not purged, and then it continues table scanning missing record "1".

            vlad.lesin Vladislav Lesin added a comment - I simplified the test a bit: --source include/have_innodb.inc CREATE TABLE t1(c1 INT PRIMARY KEY) ENGINE=InnoDB; INSERT INTO t1 VALUES ( 3 ); BEGIN; connection default ; UPDATE t1 SET c1 = 2 ; connect con1,localhost,root; send DELETE FROM t1; connection default ; let $wait_condition= select count(*) = 1 from information_schema.processlist where state = "Updating" and info = "DELETE FROM t1" ; --source include/wait_condition.inc UPDATE t1 SET c1 = 1 ; COMMIT; connection con1; reap; disconnect con1; connection default ; SELECT * FROM t1; DROP TABLE t1; Here is the sequence of actions which cause the error: 1) The first "UPDATE" sets ORDINARY X-lock on record "3", delete-marks it, inserts record "2" and inherits gap X-lock from ORDINARY X-lock of record "3" to record "2". 2) "DELETE" converts implicit lock of record "2" to explicit not-gap X-lock, creates it's own waiting ordinary X-lock for record "2" which conflicts with previously converted explicit lock of "UPDATE" transaction. 3) "UPDATE transaction acquires locks on "2", but they don't conflict with "DELETE" locks due to MDEV-27025 fix. Then record "2" is delete-marked, record "1" is inserted, the transaction is committed, and its locks are released. 4) "DELETE" transaction restores cursor position, it points to record "2" as it was delete-marked and not purged, and then it continues table scanning missing record "1".

            As there is no better solution at the moment, MDEV-27025 fix is reverted.

            vlad.lesin Vladislav Lesin added a comment - As there is no better solution at the moment, MDEV-27025 fix is reverted.

            People

              vlad.lesin Vladislav Lesin
              John Jove John Jove
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.