MariaDB Server / MDEV-22733

XA PREPARE breaks MDL in pseudo_slave_mode=1

Details

    Description

      USE test;
      CREATE TABLE t(a INT);
      XA START '0';
      SET pseudo_slave_mode=1;
      INSERT INTO t VALUES(7050+0.75);
      XA PREPARE '0';
      XA END '0';
      XA PREPARE '0';
      TRUNCATE TABLE t;
      # Shutdown to observe hang (mysqladmin shutdown will hang)
      

      This causes 10.5.4 8569dac1ec9f6853a0b2f3ea9bcbda67644ead24 (debug and optimized builds) to hang on shutdown. Earlier releases are not affected.

      Note the difference on the last command:

      10.5.4 8569dac1ec9f6853a0b2f3ea9bcbda67644ead24

      10.5.4>TRUNCATE TABLE t;
      Query OK, 0 rows affected (0.013 sec)
      

      10.4.14 ea7830eef48333e28f98a9b91f05a95735b465a3

      10.4.14>TRUNCATE TABLE t;
      ERROR 1399 (XAE07): XAER_RMFAIL: The command cannot be executed when global transaction is in the  PREPARED state
      

          Activity

            After MDEV-21602 is fixed and the fix is merged up to 10.5, the original test case from this issue needs to be re-run on the presumably fixed branch to verify that the reported issue is gone. Or not.

            elenst (Elena Stepanova) added the comment above.

            Tested 10.4, 10.5, and 10.6: all results are the same, except that there is no longer a hang. It may indeed be good to re-check after MDEV-21602, as elenst suggested. Thank you.

            Roel (Roel Van de Paar) added the comment above.

            The MDEV-21602 fix cannot be trivially ported to 10.5. The following test seems to work correctly in 10.6:

            --source include/have_innodb.inc
            CREATE TABLE t(a INT) ENGINE=InnoDB;
            XA START '0';
            SET pseudo_slave_mode=1;
            INSERT INTO t VALUES(7050+0.75);
            XA END '0';
            XA PREPARE '0';
            TRUNCATE TABLE t;
            --source include/restart_mysqld.inc
            XA COMMIT '0';
            DROP TABLE t;
            

            10.6 ee39757f3c91e04a0ccbb5424fba7dd56167ad93

            mysqltest: At line 8: query 'TRUNCATE TABLE t' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction
            

            The lock wait looked like this:

            10.6 ee39757f3c91e04a0ccbb5424fba7dd56167ad93

            #3  0x000055ec3a54af50 in lock_wait (thr=thr@entry=0x7f478c0eab30) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:1804
            #4  0x000055ec3a5a33b8 in row_mysql_handle_errors (new_err=new_err@entry=0x7f47c8eb49ec, trx=trx@entry=0x7f47ca16b330, thr=thr@entry=0x7f478c0eab30, savept=savept@entry=0x0) at /mariadb/10.6/storage/innobase/row/row0mysql.cc:686
            #5  0x000055ec3a54b950 in lock_table_for_trx (table=table@entry=0x7f478c0f2ea0, trx=trx@entry=0x7f47ca16b330, mode=mode@entry=LOCK_X) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:3616
            #6  0x000055ec3a5132ac in ha_innobase::truncate (this=0x7f478c0ea300) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:13789
            

            The TRUNCATE cannot acquire an exclusive table lock, because the INSERT in the XA PREPARE transaction is holding a conflicting lock on the table (in IX mode).

            marko (Marko Mäkelä) added the comment above.
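
            As an illustration of the conflict described in the preceding comment, here is a minimal Python sketch of table-level lock compatibility (IS, IX, S, X) as given in the InnoDB locking documentation. It is not server code: the names `COMPATIBLE_WITH` and `can_grant` are hypothetical, and AUTO-INC locks are left out. It only shows why an X request from TRUNCATE must wait while another transaction holds IX on the same table.

```python
# Table-level lock modes that may coexist with each requested mode.
# This follows the documented InnoDB compatibility table; AUTO-INC is omitted.
COMPATIBLE_WITH = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),  # an exclusive lock is compatible with nothing
}


def can_grant(requested, held_by_others):
    """Return True if `requested` is compatible with every lock held by others.

    Hypothetical helper for illustration only.
    """
    return all(held in COMPATIBLE_WITH[requested] for held in held_by_others)


# The prepared XA transaction's INSERT holds IX on the table,
# so the TRUNCATE's X request cannot be granted.
print(can_grant("X", ["IX"]))   # False
# Another DML transaction asking for IX would not be blocked.
print(can_grant("IX", ["IX"]))  # True
```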

            The MDL breakage (as claimed in the title of this ticket) still affects 10.6. The test case that I posted on 2020-06-04 triggers an InnoDB table lock wait timeout instead of an MDL conflict:

            --source include/have_innodb.inc
            CREATE TABLE t(a INT) ENGINE=InnoDB;
            XA START '0';
            SET pseudo_slave_mode=1;
            INSERT INTO t VALUES(1);
            XA END '0';
            XA PREPARE '0';
            --echo # FIXME: This should fail with MDL timeout, not InnoDB table lock wait timeout!
            TRUNCATE TABLE t;
            

            It should be possible to flag the conflict instantly without any timeout, because the current thread should be holding MDL for the XA PREPARE transaction.

            For some reason, that MDL is apparently being released prematurely and the TRUNCATE is wrongly allowed to acquire MDL_EXCLUSIVE.

            A simple fix could be to return an error if XA PREPARE is followed by any other statement than XA ROLLBACK, XA COMMIT or some ‘harmless’ statement such as SHOW, which should not access any transactions. I do not see why we would ever have to allow DDL statements while an XA transaction is open.

            marko (Marko Mäkelä) added the comment above.
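
            The "simple fix" suggested in the preceding comment can be sketched roughly as follows. This is a toy Python model under stated assumptions, not a patch: `check_statement` and `ALLOWED_AFTER_PREPARE` are hypothetical names, the XA state is passed in as a plain string, and the allow-list contains only the statements named in the comment (XA COMMIT, XA ROLLBACK, SHOW). The error text is the one shown for 10.4.14 in the description.

```python
import re

# Statements assumed to be allowed once the XA transaction is PREPARED.
# Illustrative only: the comment names XA COMMIT, XA ROLLBACK and SHOW.
ALLOWED_AFTER_PREPARE = re.compile(
    r"^\s*(XA\s+COMMIT|XA\s+ROLLBACK|SHOW)\b", re.IGNORECASE
)

# Error text copied from the 10.4.14 output in the description.
RMFAIL = (
    "ERROR 1399 (XAE07): XAER_RMFAIL: The command cannot be executed "
    "when global transaction is in the  PREPARED state"
)


def check_statement(xa_state, sql):
    """Return None if the statement may run, otherwise an error string.

    Hypothetical gate: rejects everything outside the allow-list while
    the connection's XA transaction is in the PREPARED state.
    """
    if xa_state == "PREPARED" and not ALLOWED_AFTER_PREPARE.match(sql):
        return RMFAIL
    return None


# DDL after XA PREPARE is rejected immediately, without any lock wait.
print(check_statement("PREPARED", "TRUNCATE TABLE t"))
# Completing the transaction is still allowed (returns None).
print(check_statement("PREPARED", "XA COMMIT '0'"))
```

            With a gate like this, the TRUNCATE in the test case would fail at once instead of waiting for a lock timeout.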
            When everything in the test case is executed from one client thread and TRUNCATE TABLE t; is then executed from a second thread, we get a lock wait timeout, and no UBSAN/ASAN error is observed:

            ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

            This was observed in connection with the testing requested in MDEV-30941.

            Roel (Roel Van de Paar) added the comment above (edited).

            People

              Elkin Andrei Elkin
              Roel Roel Van de Paar