Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25919

InnoDB reports misleading lock wait timeout on DDL operations

Details

    Description

      In the third and last part of MDEV-25506, operations that delete .ibd files were refactored so that the deletion would take place after transaction commit.

      As part of that refactoring, the lock conflict handling of internal transactions was revised so that if the data dictionary is being locked, any conflict would result in an immediate lock wait timeout error. Such lock conflicts should only be able to exist on the InnoDB persistent statistics tables or some internal tables that are related to the InnoDB FULLTEXT INDEX implementation.

      We should try to abandon the concept of 'dictionary transaction' and refactor the InnoDB internal SQL parser so that all table handles will be looked up before the parser is invoked. In this way, no data dictionary latch will be required during the execution, and lock waits can be handled in the normal fashion, without fearing any server hang (such as MDEV-15020).

      Making all transactions equal will remove the need to use separate internal transactions for InnoDB DDL operations.

      Note: Before MDEV-24589 and MDEV-25506, some InnoDB DDL operations were neither atomic nor rollback-safe, and mixing DDL and DML in one transaction would have been out of the question.

      Attachments

        Issue Links

          Activity

            After all, it can be useful to keep the concept of ‘dictionary transaction’, so that such transactions can be rolled back during an early phase of startup. Other transactions that do not hold locks on dictionary tables can be rolled back in the background while the server is already accepting connections.

            What really needs to change is that the current thread must not hold any dict_sys latch when attempting to acquire an InnoDB table or record lock. We must remove any table lookup from pars_sql() and always pass the table names via pars_bound_id_t.

            marko Marko Mäkelä added a comment - After all, it can be useful to keep the concept of ‘dictionary transaction’, so that such transactions can be rolled back during an early phase of startup. Other transactions that do not hold locks on dictionary tables can be rolled back in the background while the server is already accepting connections. What really needs to change is that the current thread must not hold any dict_sys latch when attempting to acquire an InnoDB table or record lock. We must remove any table lookup from pars_sql() and always pass the table names via pars_bound_id_t .

            As part of this fix, we may reduce the use of trx_t::dict_operation_lock_mode. Also MDEV-23484 would reduce the use of the field. Maybe the field can be removed altogether.

            marko Marko Mäkelä added a comment - As part of this fix, we may reduce the use of trx_t::dict_operation_lock_mode . Also MDEV-23484 would reduce the use of the field. Maybe the field can be removed altogether.

            Random failures of the test parts.partition_special_innodb were causing some headache until I realized that

            --error ER_LOCK_WAIT_TIMEOUT
            ALTER TABLE t1 ADD PARTITION PARTITIONS 2;
            

            will invoke ha_innobase::delete_table() without holding MDL_EXCLUSIVE. Therefore, background tasks that might access the newly created partitions must explicitly be disabled in this case; we cannot rely on MDL.

            marko Marko Mäkelä added a comment - Random failures of the test parts.partition_special_innodb were causing some headache until I realized that --error ER_LOCK_WAIT_TIMEOUT ALTER TABLE t1 ADD PARTITION PARTITIONS 2; will invoke ha_innobase::delete_table() without holding MDL_EXCLUSIVE . Therefore, background tasks that might access the newly created partitions must explicitly be disabled in this case; we cannot rely on MDL.

            We also removed dict_table_t::stats_bg_flag with proper use of MDL. The background tasks that would update persistent statistics for InnoDB tables or defragment them will acquire MDL on the table name, which will ensure that no DDL may be executed concurrently on the tables.

            The refactored waiting logic should essentially fix MDEV-15020.

            marko Marko Mäkelä added a comment - We also removed dict_table_t::stats_bg_flag with proper use of MDL. The background tasks that would update persistent statistics for InnoDB tables or defragment them will acquire MDL on the table name, which will ensure that no DDL may be executed concurrently on the tables. The refactored waiting logic should essentially fix MDEV-15020 .

            For the record, I just got this test to fail once on a 10.5-based branch. I had added the test into my local development tree months ago, and only now it failed for the first time:

            --source include/have_innodb.inc
             
            CREATE TABLE t (a INT) ENGINE=InnoDB STATS_PERSISTENT=1;
            --source include/restart_mysqld.inc
            connect (con,localhost,root);
            send ALTER TABLE mysql.innodb_index_stats FORCE, ALGORITHM=INPLACE;
            connection default;
            SELECT * FROM t;
            connection con;
            reap;
            connection default;
            DROP TABLE t;
            

            The failure looked like this:

            10.5 7d360060cb6ea2558820b343ef77d59584b0b805

            mysqltest: At line 8: query 'SELECT * FROM t' failed: 2013: Lost connection to MySQL server during query
            mariadbd: /mariadb/10.5m/storage/innobase/handler/handler0alter.cc:10008: bool commit_try_rebuild(Alter_inplace_info *, ha_innobase_inplace_ctx *, TABLE *, const TABLE *, trx_t *, const char *): Assertion `user_table->get_ref_count() == 1' failed.
            

            marko Marko Mäkelä added a comment - For the record, I just got this test to fail once on a 10.5-based branch. I had added the test into my local development tree months ago, and only now it failed for the first time: --source include/have_innodb.inc   CREATE TABLE t (a INT ) ENGINE=InnoDB STATS_PERSISTENT=1; --source include/restart_mysqld.inc connect (con,localhost,root); send ALTER TABLE mysql.innodb_index_stats FORCE , ALGORITHM=INPLACE; connection default ; SELECT * FROM t; connection con; reap; connection default ; DROP TABLE t; The failure looked like this: 10.5 7d360060cb6ea2558820b343ef77d59584b0b805 mysqltest: At line 8: query 'SELECT * FROM t' failed: 2013: Lost connection to MySQL server during query … mariadbd: /mariadb/10.5m/storage/innobase/handler/handler0alter.cc:10008: bool commit_try_rebuild(Alter_inplace_info *, ha_innobase_inplace_ctx *, TABLE *, const TABLE *, trx_t *, const char *): Assertion `user_table->get_ref_count() == 1' failed.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.