[MDEV-24581] MDL race condition in trans_rollback_to_savepoint() Created: 2021-01-13  Updated: 2023-11-06  Resolved: 2023-11-06

Status: Closed
Project: MariaDB Server
Component/s: Locking, Storage Engine - InnoDB
Affects Version/s: 10.0.11, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.40, 10.3.31, 10.4.21, 10.5.12, 10.6.3

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: race, upstream

Issue Links:
Duplicate
duplicates MDEV-25902 Unexpected ER_LOCK_WAIT_TIMEOUT and r... Closed
duplicates MDEV-26077 Assertion failure err != DB_DUPLICATE... Closed
Relates
relates to MDEV-25296 Failing assertion: UT_LIST_GET_LEN(tr... Closed
relates to MDEV-24322 Assertion failure in file /home/build... Closed

 Description   

In MariaDB Server 10.0.11, a change from MySQL 5.6.16 was applied.

As far as I understand, the intention of the ha_rollback_to_savepoint_can_release_mdl() check is to ensure that no other transaction inside InnoDB is waiting for a lock on the table that the ROLLBACK TO SAVEPOINT would release. But, the check seems to be misplaced. We could have the following happen:

  1. innobase_rollback_to_savepoint_can_release_mdl() returns true, because the transaction only had placed implicit record locks. (Until MDEV-16232 is implemented, this is only possible if the transaction performed nothing else than INSERT operations).
  2. Another transaction starts to wait for a record lock that our transaction had implicitly locked. The implicit lock will be converted to an explicit one.
  3. ha_rollback_to_savepoint() will be executed. No explicit locks will be released, because we did not roll back the entire transaction.
  4. The MDL will be released in thd->mdl_context.rollback_to_savepoint(), even though the other transaction will be blocked inside InnoDB until our transaction has been committed or rolled back.

Note: In innobase_rollback_to_savepoint_can_release_mdl() there is another race that would be fixed by the first hunk of the following untested patch:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 7cc4401d0ff..bca47f011a6 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -4999,15 +4999,14 @@ innobase_rollback_to_savepoint_can_release_mdl(
 	DBUG_ASSERT(hton == innodb_hton_ptr);
 
 	trx_t*	trx = check_trx_exists(thd);
+	trx_mutex_enter(trx);
+	ulint n_locks = UT_LIST_GET_LEN(trx->lock.trx_locks) == 0;
+	trx_mutex_exit(trx);
+
 
 	/* If transaction has not acquired any locks then it is safe
 	to release MDL after rollback to savepoint */
-	if (UT_LIST_GET_LEN(trx->lock.trx_locks) == 0) {
-
-		DBUG_RETURN(true);
-	}
-
-	DBUG_RETURN(false);
+	DBUG_RETURN(!n_locks);
 }
 
 /*****************************************************************//**
diff --git a/sql/transaction.cc b/sql/transaction.cc
index 543e0b7ad38..66b40d23e80 100644
--- a/sql/transaction.cc
+++ b/sql/transaction.cc
@@ -718,10 +718,6 @@ bool trans_rollback_to_savepoint(THD *thd, LEX_STRING name)
     For backward-compatibility reasons we always release MDL if binary
     logging is off.
   */
-  bool mdl_can_safely_rollback_to_savepoint=
-                (!(mysql_bin_log.is_open() && thd->variables.sql_log_bin) ||
-                 ha_rollback_to_savepoint_can_release_mdl(thd));
-
   if (ha_rollback_to_savepoint(thd, sv))
     res= TRUE;
   else if (((thd->variables.option_bits & OPTION_KEEP_LOG) ||
@@ -733,7 +729,8 @@ bool trans_rollback_to_savepoint(THD *thd, LEX_STRING name)
 
   thd->transaction.savepoints= sv;
 
-  if (!res && mdl_can_safely_rollback_to_savepoint)
+  if (!res && (!(mysql_bin_log.is_open() && thd->variables.sql_log_bin) ||
+               ha_rollback_to_savepoint_can_release_mdl(thd)))
     thd->mdl_context.rollback_to_savepoint(sv->mdl_savepoint);
 
   DBUG_RETURN(MY_TEST(res));

Note: I did not review whether this could break something in other implementations of handlerton::savepoint_rollback_can_release_mdl.



 Comments   
Comment by Sergei Golubchik [ 2023-11-06 ]

server part of the change is done in MDEV-26077, commit https://github.com/MariaDB/server/commit/2bf6f2c054c2

Comment by Marko Mäkelä [ 2023-11-06 ]

I believe that this was fixed by MDEV-25902 and its MDEV-26077 backport.

The function innobase_rollback_to_savepoint_can_release_mdl() checks if trx->lock.trx_locks is 0. That read is not protected by trx->mutex, but a dirty read should be fine, because if the value is 0, then there can’t be any table locks, and therefore also no implicit or explicit record locks may be created.

As I had pointed out, the optimization is not optimal: perhaps we could release some MDL, but the API does not let InnoDB tell which MDL can be released: it is an "all or nothing" deal.

Generated at Thu Feb 08 09:31:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.