Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25902

Unexpected ER_LOCK_WAIT_TIMEOUT and result after timeout, assertion failure err != DB_DUPLICATE_KEY

Details

    Description

      --source include/have_partition.inc
      --source include/have_innodb.inc
       
      CREATE TABLE t1 (pk int primary key) ENGINE=InnoDB;
      CREATE TABLE t2 (pk int primary key) ENGINE=InnoDB;
       
      --connect (con1,localhost,root,,test)
       
      START TRANSACTION;
      INSERT INTO t2 (pk) VALUES (1);
      SAVEPOINT sp;
      INSERT INTO t1 (pk) VALUES (1);
      ROLLBACK TO SAVEPOINT sp;
       
      --connection default
      SET innodb_lock_wait_timeout= 1;
      --error ER_LOCK_WAIT_TIMEOUT
      ALTER TABLE t1 PARTITION BY HASH(pk);
       
      SHOW CREATE TABLE t1;
       
      --connection con1
      COMMIT;
       
      --connection default
      ALTER TABLE t2 PARTITION BY HASH(pk);
       
      # Cleanup
      --disconnect con1
      --connection default
      DROP TABLE t1, t2;
      

      In the test case above, the problem starts at

      10.6 6c39eaeb

      --error ER_LOCK_WAIT_TIMEOUT
      ALTER TABLE t1 PARTITION BY HASH(pk);
      ERROR HY000: Lock wait timeout exceeded; try restarting transaction
      

      It looks new, 10.5 doesn't time out here.

      Further, even though the query fails, the partitioning is still performed:

      ALTER TABLE t1 PARTITION BY HASH(pk);
      ERROR HY000: Lock wait timeout exceeded; try restarting transaction
      SHOW CREATE TABLE t1;
      Table	Create Table
      t1	CREATE TABLE `t1` (
        `pk` int(11) NOT NULL,
        PRIMARY KEY (`pk`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1
       PARTITION BY HASH (`pk`)
      

      And then, further partitioning on a different table returns an unexpected error (on non-debug) or a debug assertion failure:

      mysqltest: At line 26: query 'ALTER TABLE t2 PARTITION BY HASH(pk)' failed: ER_TABLE_EXISTS_ERROR (1050): Table './test/#sql-backup-3ea389-4' already exists
      

      Please note that the assertion failure is just an aftermath, the problems start long before it.

      10.6 6c39eaeb

      mariadbd: /data/src/10.6/storage/innobase/row/row0mysql.cc:2962: dberr_t row_rename_table_for_mysql(const char*, const char*, trx_t*, bool): Assertion `err != DB_DUPLICATE_KEY' failed.
      210612 20:24:56 [ERROR] mysqld got signal 6 ;
       
      #7  0x00007f81ceefef36 in __GI___assert_fail (assertion=0x559b433158a5 "err != DB_DUPLICATE_KEY", file=0x559b43313c08 "/data/src/10.6/storage/innobase/row/row0mysql.cc", line=2962, function=0x559b43315778 "dberr_t row_rename_table_for_mysql(const char*, const char*, trx_t*, bool)") at assert.c:101
      #8  0x0000559b42b901f5 in row_rename_table_for_mysql (old_name=0x7f81c8141ca0 "test/t2", new_name=0x7f81c8141aa0 "test/#sql-backup-3ea506-4", trx=0x7f81c93f33c0, use_fk=true) at /data/src/10.6/storage/innobase/row/row0mysql.cc:2962
      #9  0x0000559b429f6814 in innobase_rename_table (trx=0x7f81c93f33c0, from=0x7f81c81427e0 "./test/t2", to=0x7f81c81429e0 "./test/#sql-backup-3ea506-4", use_fk=true) at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:13423
      #10 0x0000559b429de12f in ha_innobase::rename_table (this=0x7f819c018288, from=0x7f81c81427e0 "./test/t2", to=0x7f81c81429e0 "./test/#sql-backup-3ea506-4") at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:13671
      #11 0x0000559b425542af in handler::ha_rename_table (this=0x7f819c018288, from=0x7f81c81427e0 "./test/t2", to=0x7f81c81429e0 "./test/#sql-backup-3ea506-4") at /data/src/10.6/sql/handler.cc:5270
      #12 0x0000559b422bcfb1 in mysql_rename_table (base=0x559b46095668, old_db=0x7f81c81445e0, old_name=0x7f81c81445f0, new_db=0x7f81c81445e0, new_name=0x7f81c8143140, id=0x7f81c8144680, flags=2) at /data/src/10.6/sql/sql_table.cc:4972
      #13 0x0000559b422cd6a1 in mysql_alter_table (thd=0x7f819c000db8, new_db=0x7f819c0059c8, new_name=0x7f819c005de0, create_info=0x7f81c81453e0, table_list=0x7f819c0156b0, alter_info=0x7f81c81452f0, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/src/10.6/sql/sql_table.cc:10521
      #14 0x0000559b42381fad in Sql_cmd_alter_table::execute (this=0x7f819c0160f8, thd=0x7f819c000db8) at /data/src/10.6/sql/sql_alter.cc:550
      #15 0x0000559b421c9562 in mysql_execute_command (thd=0x7f819c000db8) at /data/src/10.6/sql/sql_parse.cc:5983
      #16 0x0000559b421cf708 in mysql_parse (thd=0x7f819c000db8, rawbuf=0x7f819c0155c0 "ALTER TABLE t2 PARTITION BY HASH(pk)", length=36, parser_state=0x7f81c8146480) at /data/src/10.6/sql/sql_parse.cc:8016
      #17 0x0000559b421bbb92 in dispatch_command (command=COM_QUERY, thd=0x7f819c000db8, packet=0x7f819c00b879 "", packet_length=36, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1897
      #18 0x0000559b421ba537 in do_command (thd=0x7f819c000db8, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1406
      #19 0x0000559b42376d71 in do_handle_one_connection (connect=0x559b465ec328, put_in_cache=true) at /data/src/10.6/sql/sql_connect.cc:1410
      #20 0x0000559b42376acd in handle_one_connection (arg=0x559b466152a8) at /data/src/10.6/sql/sql_connect.cc:1312
      #21 0x0000559b428e5ed9 in pfs_spawn_thread (arg=0x559b465ebe78) at /data/src/10.6/storage/perfschema/pfs.cc:2201
      #22 0x00007f81cf416609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #23 0x00007f81cefea293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Attachments

        Issue Links

          Activity

            This is probably due to MDEV-25506 part 3, which refactored DDL operations that could involve deleting .ibd files or internally dropping tables (such as an old copy of a table in ALTER TABLE). We will first lock all tables, then lock the data dictionary, and finally attempt to modify the dictionary tables and commit. Should a further lock conflict occur during the final phase of modifying the dictionary tables, we will immediately fail with a lock wait timeout error. It looks like the assertion needs to be relaxed due to this.

            marko Marko Mäkelä added a comment - This is probably due to MDEV-25506 part 3, which refactored DDL operations that could involve deleting .ibd files or internally dropping tables (such as an old copy of a table in ALTER TABLE ). We will first lock all tables, then lock the data dictionary, and finally attempt to modify the dictionary tables and commit. Should a further lock conflict occur during the final phase of modifying the dictionary tables, we will immediately fail with a lock wait timeout error. It looks like the assertion needs to be relaxed due to this.

            The timeout is reported while ALTER TABLE t1 PARTITION BY HASH(pk) is dropping the original table (which carries a #sql-backup- name). The subsequent ALTER TABLE t2 would attempt to use the same #sql-backup- name, even though the previous ALTER TABLE t1 failed to drop that table.

            The timeout occurs due to a lock conflict on table t1. The conflicting lock had been created by the transaction in con1. The ROLLBACK TO SAVEPOINT will not release the lock on table t1 inside InnoDB. InnoDB was not even asked about that: there was no call to innobase_rollback_to_savepoint_can_release_mdl() before ALTER TABLE t1 PARTITION BY HASH(pk) was executed.

            This is a metadata locking (MDL) bug that likely was worked around by the background DROP TABLE queue that MDEV-25506 part 3 removed. The MDL for t1 was wrongly released here, without even consulting InnoDB:

            10.6 82c07b178abfa9395f1d67747148a431d6eb6ae7

            (rr) bt
            #0  MDL_context::release_lock (this=0x7f02c4001c28, duration=MDL_TRANSACTION, ticket=0x7f02c4922b40)
                at /mariadb/10.6/sql/mdl.cc:2815
            #1  0x0000555f98026220 in MDL_context::release_locks_stored_before (this=0x7f02c4001c28, 
                duration=MDL_TRANSACTION, sentinel=0x7f02c4013ce0) at /mariadb/10.6/sql/mdl.cc:2873
            #2  MDL_context::rollback_to_savepoint (this=0x7f02c4001c28, mdl_savepoint=
                  @0x7f02c401a0d0: {m_stmt_ticket = 0x0, m_trans_ticket = 0x7f02c4013ce0})
                at /mariadb/10.6/sql/mdl.cc:3037
            #3  0x0000555f9802e318 in trans_rollback_to_savepoint (thd=thd@entry=0x7f02c4001af8, 
                name=<optimized out>) at /mariadb/10.6/sql/transaction.cc:709
            #4  0x0000555f97eeb65c in mysql_execute_command (thd=thd@entry=0x7f02c4001af8)
                at /mariadb/10.6/sql/sql_parse.cc:5680
            #5  0x0000555f97ee5e5e in mysql_parse (thd=thd@entry=0x7f02c4001af8, rawbuf=<optimized out>, 
                length=<optimized out>, parser_state=<optimized out>, parser_state@entry=0x7f02e406c580)
                at /mariadb/10.6/sql/sql_parse.cc:8016
            #6  0x0000555f97ee375a in dispatch_command (command=<optimized out>, command@entry=COM_QUERY, 
                thd=thd@entry=0x7f02c4001af8, packet=packet@entry=0x7f02c400bc99 "ROLLBACK TO SAVEPOINT sp", 
                packet_length=3288416368, packet_length@entry=24, blocking=true)
                at /mariadb/10.6/sql/sql_parse.cc:1897
            

            marko Marko Mäkelä added a comment - The timeout is reported while ALTER TABLE t1 PARTITION BY HASH(pk) is dropping the original table (which carries a #sql-backup- name). The subsequent ALTER TABLE t2 would attempt to use the same #sql-backup- name, even though the previous ALTER TABLE t1 failed to drop that table. The timeout occurs due to a lock conflict on table t1 . The conflicting lock had been created by the transaction in con1 . The ROLLBACK TO SAVEPOINT will not release the lock on table t1 inside InnoDB. InnoDB was not even asked about that: there was no call to innobase_rollback_to_savepoint_can_release_mdl() before ALTER TABLE t1 PARTITION BY HASH(pk) was executed. This is a metadata locking (MDL) bug that likely was worked around by the background DROP TABLE queue that MDEV-25506 part 3 removed. The MDL for t1 was wrongly released here, without even consulting InnoDB: 10.6 82c07b178abfa9395f1d67747148a431d6eb6ae7 (rr) bt #0 MDL_context::release_lock (this=0x7f02c4001c28, duration=MDL_TRANSACTION, ticket=0x7f02c4922b40) at /mariadb/10.6/sql/mdl.cc:2815 #1 0x0000555f98026220 in MDL_context::release_locks_stored_before (this=0x7f02c4001c28, duration=MDL_TRANSACTION, sentinel=0x7f02c4013ce0) at /mariadb/10.6/sql/mdl.cc:2873 #2 MDL_context::rollback_to_savepoint (this=0x7f02c4001c28, mdl_savepoint= @0x7f02c401a0d0: {m_stmt_ticket = 0x0, m_trans_ticket = 0x7f02c4013ce0}) at /mariadb/10.6/sql/mdl.cc:3037 #3 0x0000555f9802e318 in trans_rollback_to_savepoint (thd=thd@entry=0x7f02c4001af8, name=<optimized out>) at /mariadb/10.6/sql/transaction.cc:709 #4 0x0000555f97eeb65c in mysql_execute_command (thd=thd@entry=0x7f02c4001af8) at /mariadb/10.6/sql/sql_parse.cc:5680 #5 0x0000555f97ee5e5e in mysql_parse (thd=thd@entry=0x7f02c4001af8, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, parser_state@entry=0x7f02e406c580) at /mariadb/10.6/sql/sql_parse.cc:8016 #6 0x0000555f97ee375a in dispatch_command (command=<optimized out>, command@entry=COM_QUERY, thd=thd@entry=0x7f02c4001af8, packet=packet@entry=0x7f02c400bc99 "ROLLBACK TO SAVEPOINT sp", packet_length=3288416368, packet_length@entry=24, blocking=true) at /mariadb/10.6/sql/sql_parse.cc:1897

            Raising to a blocker not as much due to the assertion failure but because of ALTER being performed even though it returns an error.

            elenst Elena Stepanova added a comment - Raising to a blocker not as much due to the assertion failure but because of ALTER being performed even though it returns an error.

            Would you agree with my proposed fix? With that, we would never release any MDL after ROLLBACK TO SAVEPOINT if the transaction had acquired any locks in InnoDB. Only in that way can the MDL perform its intended task of detecting conflicts between DML and DDL.

            I think that a follow-up performance improvement could be filed, to enhance the handlerton::savepoint_rollback_can_release_mdl interface, so that the storage engines can indicate which MDL are OK to release.

            marko Marko Mäkelä added a comment - Would you agree with my proposed fix ? With that, we would never release any MDL after ROLLBACK TO SAVEPOINT if the transaction had acquired any locks in InnoDB. Only in that way can the MDL perform its intended task of detecting conflicts between DML and DDL. I think that a follow-up performance improvement could be filed, to enhance the handlerton::savepoint_rollback_can_release_mdl interface, so that the storage engines can indicate which MDL are OK to release.

            commit 161e4bfafd2 is ok to push

            serg Sergei Golubchik added a comment - commit 161e4bfafd2 is ok to push

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.