MariaDB Server / MDEV-32899

InnoDB is holding shared dict_sys.latch while waiting for FOREIGN KEY child table lock on DDL

Details

    Description

      To fix the race conditions MDEV-26217 and MDEV-26554, code was added that lets InnoDB hold a shared dict_sys.latch while waiting for an exclusive lock on tables that are connected by FOREIGN KEY constraints. This is not acceptable, because a lock wait can last a long time (in the worst case indefinitely, if innodb_lock_wait_timeout=100000000). If another thread then requests an exclusive dict_sys.latch, that pending request will block all other threads from acquiring a shared dict_sys.latch until the table lock wait has been resolved.

      This bug can be fixed by changing lock_table_for_trx() so that whenever the caller is holding a shared dict_sys.latch, the latch is released before the call to lock_wait() and reacquired after it. In this way, the lock object is still created or released while the table is protected by the shared dict_sys.latch; only the wait itself happens without the latch. It is safe to temporarily release dict_sys.latch, because tables on which lock objects exist cannot be evicted or dropped. In the callers, we have to take special care to ensure that dict_table_t::referenced_set is safe to traverse if dict_sys.latch was temporarily released.
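The release-and-reacquire pattern described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not InnoDB code: `ToyTable`, `toy_dict_latch`, `toy_lock_table_x()`, `toy_unlock_table_x()` and `toy_demo()` are hypothetical names, a `std::shared_mutex` stands in for dict_sys.latch, and a condition variable stands in for lock_wait().

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-ins; none of these are InnoDB types.
struct ToyTable {
  std::mutex m;                 // protects the fields below
  std::condition_variable cv;
  bool x_locked = false;        // toy exclusive table lock
  unsigned n_lock_objects = 0;  // granted or waiting requests; nonzero "pins" the table
};

std::shared_mutex toy_dict_latch;  // stand-in for dict_sys.latch

// The caller holds toy_dict_latch in shared mode on entry and on return.
void toy_lock_table_x(ToyTable& t) {
  std::unique_lock<std::mutex> g(t.m);
  ++t.n_lock_objects;  // the "lock object" is created under the shared latch
  if (t.x_locked) {
    toy_dict_latch.unlock_shared();  // never hold the latch across the wait
    t.cv.wait(g, [&] { return !t.x_locked; });
    t.x_locked = true;               // lock granted
    g.unlock();                      // drop the table mutex before taking the latch
    toy_dict_latch.lock_shared();    // reacquire for the caller
    return;
  }
  t.x_locked = true;
}

void toy_unlock_table_x(ToyTable& t) {
  {
    std::lock_guard<std::mutex> g(t.m);
    assert(t.x_locked && t.n_lock_objects > 0);
    t.x_locked = false;
    --t.n_lock_objects;
  }
  t.cv.notify_all();
}

// Scenario: an exclusive latch request succeeds while another thread is
// still waiting for the toy table lock.
bool toy_demo() {
  ToyTable t;
  toy_dict_latch.lock_shared();
  toy_lock_table_x(t);  // granted immediately
  toy_dict_latch.unlock_shared();

  std::thread waiter([&] {
    toy_dict_latch.lock_shared();
    toy_lock_table_x(t);  // blocks until the main thread unlocks the table
    toy_unlock_table_x(t);
    toy_dict_latch.unlock_shared();
  });

  // Wait until the waiter has queued its request; it releases the latch
  // before it lets go of t.m, so the latch is free once we observe 2.
  for (;;) {
    {
      std::lock_guard<std::mutex> g(t.m);
      if (t.n_lock_objects == 2) break;
    }
    std::this_thread::yield();
  }
  toy_dict_latch.lock();  // would hang if the waiter still held the latch
  toy_dict_latch.unlock();

  toy_unlock_table_x(t);
  waiter.join();
  std::lock_guard<std::mutex> g(t.m);
  return !t.x_locked && t.n_lock_objects == 0;
}
```

In this model the waiting request keeps `n_lock_objects` nonzero for the whole wait, which mirrors the argument that a table with existing lock objects cannot be evicted or dropped while the latch is not held.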

          Activity

            marko Marko Mäkelä added a comment - I reverted this due to the regression MDEV-33104.

            marko Marko Mäkelä added a comment - To avoid reintroducing a bug like MDEV-33104, we must revise lock_table_children() so that it will successfully acquire MDL on each child table before waiting for an InnoDB table lock. The initial (reverted) version held only a table reference while waiting for an InnoDB table lock. Concurrently, a DDL operation might want to drop or rebuild the table while holding an MDL_EXCLUSIVE as well as an InnoDB table lock.

            marko Marko Mäkelä added a comment - A metadata lock can be acquired by invoking dict_acquire_mdl_shared<false>() in lock_table_children() while holding shared dict_sys.latch. Because that function will temporarily release dict_sys.latch while waiting for MDL, we had better rescan table->referenced_set after each call, in case a constraint or a child table was dropped in the meantime. We will have to keep track of the tables on which dict_acquire_mdl_shared<false>() was already invoked.
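The rescan-and-track idea from the comment above could look roughly like the following toy loop. It is a sketch under stated assumptions and not the actual lock_table_children(): `ToyForeign`, `ToyParent` and `toy_mdl_lock_children()` are hypothetical names, and the `acquire_mdl` callback is a stand-in for dict_acquire_mdl_shared<false>(), which may let the set of constraints change before it returns.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are InnoDB types.
struct ToyForeign {
  std::string child_table;  // name of the table that owns the constraint
};

struct ToyParent {
  std::vector<ToyForeign> referenced_set;  // constraints referencing this table
};

// Calls acquire_mdl once per distinct child table and returns the names in
// call order. The callback may modify parent.referenced_set (modelling a
// temporarily released latch), so the scan restarts after every call.
std::vector<std::string> toy_mdl_lock_children(
    ToyParent& parent,
    const std::function<void(const std::string&)>& acquire_mdl) {
  std::set<std::string> seen;      // tables on which acquire_mdl was invoked
  std::vector<std::string> order;
  bool restart;
  do {
    restart = false;
    for (const ToyForeign& f : parent.referenced_set) {
      if (!seen.insert(f.child_table).second) continue;  // already handled
      const std::string name = f.child_table;  // copy: f may dangle after the call
      acquire_mdl(name);
      order.push_back(name);
      restart = true;  // the set may have changed; do not keep iterating
      break;
    }
  } while (restart);
  return order;
}
```

The loop ends once a full pass over the set finds no table that has not been handled yet. A real implementation would additionally have to deal with tables that were dropped after their MDL was acquired, which this sketch leaves out.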

            mleich Matthias Leich added a comment - origin/10.6-MDEV-32899 c851e172ea043985fc8d3cec46368004a174892d 2024-01-23T17:10:37+02:00 performed well in RQG testing. No new bad effects.

            mleich Matthias Leich added a comment - origin/10.6-MDEV-32899 f50940ee0b81b9c963bd114c54788e515220bc7e 2024-02-01T15:48:46+02:00 performed well in RQG testing. No new problems.

            debarun Debarun Banerjee added a comment - https://github.com/MariaDB/server/pull/3021 looks good to me.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä