Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34253

ALTER .. FOREIGN KEY stuck in "Committing alter table to storage engine" does not obey innodb_lock_wait_timeout

Details

    Description

      The test case below is non-deterministic, run with --repeat=N. It usually fails for me in a few attempts.

      --source include/have_innodb.inc
       
      CREATE TABLE A (f INT, KEY (f)) ENGINE=InnoDB;
      CREATE TABLE B (f INT, FOREIGN KEY fk1(f) REFERENCES A(f)) ENGINE=InnoDB;
       
      ALTER TABLE A ADD FOREIGN KEY fk2(f) REFERENCES B(f);
      ALTER TABLE B ADD FOREIGN KEY fk(f) REFERENCES B(f);
       
      SET FOREIGN_KEY_CHECKS = OFF, LOCK_WAIT_TIMEOUT= 2, INNODB_LOCK_WAIT_TIMEOUT= 1;
       
      --connect (con1,localhost,root,,)
      SET FOREIGN_KEY_CHECKS = OFF, LOCK_WAIT_TIMEOUT= 2, INNODB_LOCK_WAIT_TIMEOUT= 1;
       
      --connection con1
      --send
        ALTER TABLE B ADD FOREIGN KEY fk3 (f) REFERENCES A (f);
      --connection default
      --send
        ALTER TABLE A ADD FOREIGN KEY fk (f) REFERENCES A (f);
      --connection con1
      --reap
      --connection default
      --error ER_FK_FAIL_ADD_SYSTEM,ER_LOCK_WAIT_TIMEOUT
      --reap
       
      DROP TABLE A, B;
      

      Two last concurrent ALTERs sometimes mutually lock up. It happened in earlier releases, too. However, in 10.6.17 release the threads were getting stalled and the deadlock was temporary, resulting in LOCK_WAIT_TIMEOUT error after innodb_lock_wait_timeout was exhausted. In 10.6.18+ instead the threads keep looping at high CPU usage, switching back and forth from "Committing alter table to storage engine" and "Waiting for table metadata lock", they no longer obey the timeout and stay in this stuck state seemingly forever. Killing the queries doesn't help either, they just switch to "Killed" in "Committing alter table to storage engine" state, but keep looping.

      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      | Id | User | Host            | db   | Command | Time | State                                    | Info                                                   | Progress |
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      |  9 | root | localhost       | test | Query   |    3 | Waiting for table metadata lock          | ALTER TABLE A ADD FOREIGN KEY fk (f) REFERENCES A (f)  |    0.000 |
      | 10 | root | localhost       | test | Query   |    3 | Committing alter table to storage engine | ALTER TABLE B ADD FOREIGN KEY fk3 (f) REFERENCES A (f) |    0.000 |
      | 11 | root | localhost:44368 | test | Query   |    0 | starting                                 | show processlist                                       |    0.000 |
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      3 rows in set (0.001 sec)
      MariaDB [test]> show processlist;
      +----+------+-----------------+------+---------+------+---------------------------------+--------------------------------------------------------+----------+
      | Id | User | Host            | db   | Command | Time | State                           | Info                                                   | Progress |
      +----+------+-----------------+------+---------+------+---------------------------------+--------------------------------------------------------+----------+
      |  9 | root | localhost       | test | Query   |    4 | Waiting for table metadata lock | ALTER TABLE A ADD FOREIGN KEY fk (f) REFERENCES A (f)  |    0.000 |
      | 10 | root | localhost       | test | Query   |    4 | Waiting for table metadata lock | ALTER TABLE B ADD FOREIGN KEY fk3 (f) REFERENCES A (f) |    0.000 |
      | 11 | root | localhost:44368 | test | Query   |    0 | starting                        | show processlist                                       |    0.000 |
      +----+------+-----------------+------+---------+------+---------------------------------+--------------------------------------------------------+----------+
      3 rows in set (0.000 sec)
       
      MariaDB [test]> show processlist;
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      | Id | User | Host            | db   | Command | Time | State                                    | Info                                                   | Progress |
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      |  9 | root | localhost       | test | Query   |    8 | Committing alter table to storage engine | ALTER TABLE A ADD FOREIGN KEY fk (f) REFERENCES A (f)  |    0.000 |
      | 10 | root | localhost       | test | Query   |    8 | Waiting for table metadata lock          | ALTER TABLE B ADD FOREIGN KEY fk3 (f) REFERENCES A (f) |    0.000 |
      | 11 | root | localhost:44368 | test | Query   |    0 | starting                                 | show processlist                                       |    0.000 |
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
       
      Query OK, 0 rows affected (0.000 sec)
       
      MariaDB [test]> kill query 9;
      Query OK, 0 rows affected (0.000 sec)
       
      MariaDB [test]> show processlist;
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      | Id | User | Host            | db   | Command | Time | State                                    | Info                                                   | Progress |
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      |  9 | root | localhost       | test | Killed  |   57 | Committing alter table to storage engine | ALTER TABLE A ADD FOREIGN KEY fk (f) REFERENCES A (f)  |    0.000 |
      | 10 | root | localhost       | test | Query   |   57 | Committing alter table to storage engine | ALTER TABLE B ADD FOREIGN KEY fk3 (f) REFERENCES A (f) |    0.000 |
      | 11 | root | localhost:44368 | test | Query   |    0 | starting                                 | show processlist                                       |    0.000 |
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      3 rows in set (0.000 sec)
       
      MariaDB [test]> kill query 10;
      Query OK, 0 rows affected (0.000 sec)
       
      MariaDB [test]> show processlist;
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      | Id | User | Host            | db   | Command | Time | State                                    | Info                                                   | Progress |
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      |  9 | root | localhost       | test | Killed  |   86 | Committing alter table to storage engine | ALTER TABLE A ADD FOREIGN KEY fk (f) REFERENCES A (f)  |    0.000 |
      | 10 | root | localhost       | test | Killed  |   86 | Committing alter table to storage engine | ALTER TABLE B ADD FOREIGN KEY fk3 (f) REFERENCES A (f) |    0.000 |
      | 11 | root | localhost:44368 | test | Query   |    0 | starting                                 | show processlist                                       |    0.000 |
      +----+------+-----------------+------+---------+------+------------------------------------------+--------------------------------------------------------+----------+
      3 rows in set (0.000 sec)
      

      The change was apparently introduced by this commit:

      commit b2654ba82651630dba0dd2012f45b77299a43548
      Author: Marko Mäkelä
      Date:   Thu Feb 1 15:48:46 2024 +0200
       
          MDEV-32899 InnoDB is holding shared dict_sys.latch while waiting for FOREIGN KEY child table lock on DDL
      

      Attachments

        Issue Links

          Activity

            I don’t think that this can be fixed in a storage engine. The locking related to FOREIGN KEY constraints was discussed several months ago related to MDEV-33104, which forced me to rewrite the fix of MDEV-32899. The current implementation is yet another work-around of a deficiency that was originally worked around by InnoDB maintaining counters dict_table_t::n_foreign_key_checks_running and deferring some DROP TABLE operations until they are safe to execute. These work-arounds were removed in MDEV-21175 and MDEV-25506 so that the DDL operations can follow the same ACID principles as DML.

            I think that the only sustainable solution is to implement the necessary and sufficient metadata locking around FOREIGN KEY in the SQL layer. The current work-arounds in InnoDB should be revised or removed accordingly.

            marko Marko Mäkelä added a comment - I don’t think that this can be fixed in a storage engine. The locking related to FOREIGN KEY constraints was discussed several months ago related to MDEV-33104 , which forced me to rewrite the fix of MDEV-32899 . The current implementation is yet another work-around of a deficiency that was originally worked around by InnoDB maintaining counters dict_table_t::n_foreign_key_checks_running and deferring some DROP TABLE operations until they are safe to execute. These work-arounds were removed in MDEV-21175 and MDEV-25506 so that the DDL operations can follow the same ACID principles as DML. I think that the only sustainable solution is to implement the necessary and sufficient metadata locking around FOREIGN KEY in the SQL layer. The current work-arounds in InnoDB should be revised or removed accordingly.
            elenst Elena Stepanova added a comment - - edited

            If it could be broken in the storage engine, it should be fixable in the storage engine, at least to the level of how it was before.

            I won't make claims regarding the effect it could have on real users (although it doesn't appear to be limited to foreign_key_checks=off, even though it was needed for the provided test case), but for tests it is disastrous. Unless it is fixed very soon to at least obey a timeout as it did before, I will have to disable all foreign key operations completely in all concurrent tests. Given how many new problems are introduced in this area all the time and used to be found by the concurrent tests, the results of such test shortage is easy to predict, but I have no choice.

            elenst Elena Stepanova added a comment - - edited If it could be broken in the storage engine, it should be fixable in the storage engine, at least to the level of how it was before. I won't make claims regarding the effect it could have on real users (although it doesn't appear to be limited to foreign_key_checks=off, even though it was needed for the provided test case), but for tests it is disastrous. Unless it is fixed very soon to at least obey a timeout as it did before, I will have to disable all foreign key operations completely in all concurrent tests. Given how many new problems are introduced in this area all the time and used to be found by the concurrent tests, the results of such test shortage is easy to predict, but I have no choice.

            It is totally the bug in the engine. dict_acquire_mdl_shared() is trying to get an MDL lock, the MDL subsystem returns MDL_wait::VICTIM, but instead of aborting dict_acquire_mdl_shared() keeps retrying indefinitely. It should either abort immediately, as it was told, or at least respect the timeout.

            serg Sergei Golubchik added a comment - It is totally the bug in the engine. dict_acquire_mdl_shared() is trying to get an MDL lock, the MDL subsystem returns MDL_wait::VICTIM, but instead of aborting dict_acquire_mdl_shared() keeps retrying indefinitely . It should either abort immediately, as it was told, or at least respect the timeout.

            MDL_wait looks like an internal implementation detail of the subsystem, which is not exposed to the callers of member functions of MDL_context, such as storage engines.

            The functions MDL_context::acquire_lock() and MDL_context::try_acquire_lock(), which are accessible to a storage engine, do not distinguish between reasons for failing to acquire a lock.

            It seems to me that fixing this bug requires a change to both MDL_context::try_acquire_lock() and dict_acquire_mdl_shared() so that the internal states MDL_wait::VICTIM and MDL_wait::KILLED will result in aborting the operation.

            marko Marko Mäkelä added a comment - MDL_wait looks like an internal implementation detail of the subsystem, which is not exposed to the callers of member functions of MDL_context , such as storage engines. The functions MDL_context::acquire_lock() and MDL_context::try_acquire_lock() , which are accessible to a storage engine, do not distinguish between reasons for failing to acquire a lock. It seems to me that fixing this bug requires a change to both MDL_context::try_acquire_lock() and dict_acquire_mdl_shared() so that the internal states MDL_wait::VICTIM and MDL_wait::KILLED will result in aborting the operation.

            People

              sanja Oleksandr Byelkin
              elenst Elena Stepanova
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.