Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30103

FK constraint fails upon referenced table eviction

Details

    Description

      With the following sync point added:

      diff --git a/storage/innobase/row/row0ins.cc b/storage/innobase/row/row0ins.cc
      --- a/storage/innobase/row/row0ins.cc	(revision 05cd10b74cffe882d04c6dd3333ef0beec2b41c2)
      +++ b/storage/innobase/row/row0ins.cc	(date 1669546784733)
      @@ -1944,6 +1944,7 @@
       			dict_table_t*	referenced_table
       						= foreign->referenced_table;
       
      +			DEBUG_SYNC_C("row_ins_check_fk_ref_table_loaded");
       			if (referenced_table == NULL) {
       
       				ref_table = dict_table_open_on_name(
      

      The following test fails to insert a row in a child table, however the opposite is expected:

      --source include/have_innodb.inc
       
      create table t1 (a int, key(a), b int) engine=innodb;
      create table t2 (a int key references t1(a), b int) engine=innodb;
      insert t1 values (1, 1);
      set autocommit =0;
      set global innodb_evict_tables_on_commit_debug=1;
       
       
      begin;
      insert t1 values (20000, 1);
       
      --connect bug, localhost, root
      set debug_sync="row_ins_check_fk_ref_table_loaded signal open wait_for go";
      send
      insert t2 values(1,1);
       
      --connect purge, localhost, root
      FLUSH TABLES;
       
      --connection default
      set debug_sync="now wait_for open";
      commit;
      set debug_sync="now signal go";
      --connection bug
      reap;
      --connection default
       
      # Cleanup
      set debug_sync= reset;
      drop table t2, t1;
      

      actual result, rev. 05cd10b74c

      mysqltest: At line 26: query 'reap' failed: ER_NO_REFERENCED_ROW_2 (1452): 
      Cannot add or update a child row: a foreign key constraint fails (`test`.`t2`, CONSTRAINT 
      `t2_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t1` (`a`))
      

      The effect may be much more severe with real eviction, but I didn't succeed to simulate it with just innodb_evict_tables_on_commit_debug

      Attachments

        Issue Links

          Activity

            The test is constructed similarly to MDEV-30021, however the effect and the supposed fix are different.

            The difference between real eviction and innodb_evict_tables_on_commit_debug is that the latter also acquires lock_sys in addition to dict_sys.
            If lock_sys is not acquired, then any consequence is possible, like crash or data corruption.

            This problem is similar to the one fixed in MDEV-26217, where the after-commit debug eviction was also augmented with lock_sys acquisition, but dict_sys_t::evict_table_LRU was probably forgotten.

            There are a few ways of fixing it:

            1. Add lock_sys acquisition to dict_sys_t::evict_table_LRU and adjust FK checks

            this will follow MDEV-26217 which actually works around the missing related tables prelocking.
            dict_sys::delete() call should be protected with lock_sys, together with table locks check (lock_table_has_locks).
            The good news is that it's enough to lock when n_ref_count is 0, and if the table has foreign/referential relations.

            n_ref_count is already protected by dict_sys lock. There can be other cases when lock_sys is required, but I'm not aware of them.

            Then, the referential constraint algorithm should be modified. Now lock_table (as of MDEV-26217) checks referential table pointer inside
            dict_foreign_t under lock_sys and atomically adds a table lock. If a table was NULL though, it reports a deadlock, and this effect can also be seen.
            We hardly can call a table eviction a deadlock. though.

            row_ins_check_foreign_constraint locking should be modified as following:
            1. Try calling lock_table. It'll acquire lock_sys, so either acquire the lock, or fk_table will be NULL
            2. If fk_table is NULL, the error returned was DEADLOCK. Try to call dict_table_open_on_name.
            3. If succeeds, we'll have n_ref_count > 0. Call lock_table again.

            Then the race with table eviction will be eliminated

            2. Make correct MDL protection.

            What's worth to mention, the MDL prelocking itself is not suffitient in this regard. handler::open should also be
            called to acquire table data dictionary (dict_table_t). This does not happen, for example, when referenced constraint
            is read-only (RESTRICT/NO ACTION), so it's currently also vulnerable.

            And this is how this bug differs from MDEV-30021, by the way: even the correct prelocking could break there, as FLUSH
            just deleted the relations from datadict.

            Opening a stub for read-only references was an optimization in scope of user-reported MDEV-22180. The reported analysis
            doesn't look correct, as only first table opening leads to file reads and parsing. I guess that only cold opens were counted there,
            and the fix should be reverted.

            Additionally, all referenced tables should be prelocked, in order to call ha_open and increase tables' reference counts.

            I think it's true that only query-affected tables could be prelocked, but it shouldn't affect the performance in a long run,
            except for the common benefit of eliminating a lock.

            Anyway, event if it could be in question, an optimization could be done: don't load frm and don't parse it for referential tables, but call ha_open.

            nikitamalyavin Nikita Malyavin added a comment - The test is constructed similarly to MDEV-30021 , however the effect and the supposed fix are different. The difference between real eviction and innodb_evict_tables_on_commit_debug is that the latter also acquires lock_sys in addition to dict_sys. If lock_sys is not acquired, then any consequence is possible, like crash or data corruption. This problem is similar to the one fixed in MDEV-26217 , where the after-commit debug eviction was also augmented with lock_sys acquisition, but dict_sys_t::evict_table_LRU was probably forgotten. There are a few ways of fixing it: 1. Add lock_sys acquisition to dict_sys_t::evict_table_LRU and adjust FK checks this will follow MDEV-26217 which actually works around the missing related tables prelocking. dict_sys::delete() call should be protected with lock_sys , together with table locks check ( lock_table_has_locks ). The good news is that it's enough to lock when n_ref_count is 0 , and if the table has foreign/referential relations. n_ref_count is already protected by dict_sys lock. There can be other cases when lock_sys is required, but I'm not aware of them. Then, the referential constraint algorithm should be modified. Now lock_table (as of MDEV-26217 ) checks referential table pointer inside dict_foreign_t under lock_sys and atomically adds a table lock. If a table was NULL though, it reports a deadlock, and this effect can also be seen. We hardly can call a table eviction a deadlock. though. row_ins_check_foreign_constraint locking should be modified as following: 1. Try calling lock_table. It'll acquire lock_sys, so either acquire the lock, or fk_table will be NULL 2. If fk_table is NULL, the error returned was DEADLOCK. Try to call dict_table_open_on_name. 3. If succeeds, we'll have n_ref_count > 0. Call lock_table again. Then the race with table eviction will be eliminated 2. Make correct MDL protection. What's worth to mention, the MDL prelocking itself is not suffitient in this regard. handler::open should also be called to acquire table data dictionary ( dict_table_t ). This does not happen, for example, when referenced constraint is read-only (RESTRICT/NO ACTION), so it's currently also vulnerable. And this is how this bug differs from MDEV-30021 , by the way: even the correct prelocking could break there, as FLUSH just deleted the relations from datadict. Opening a stub for read-only references was an optimization in scope of user-reported MDEV-22180 . The reported analysis doesn't look correct, as only first table opening leads to file reads and parsing. I guess that only cold opens were counted there, and the fix should be reverted. Additionally, all referenced tables should be prelocked, in order to call ha_open and increase tables' reference counts. I think it's true that only query-affected tables could be prelocked, but it shouldn't affect the performance in a long run, except for the common benefit of eliminating a lock. Anyway, event if it could be in question, an optimization could be done: don't load frm and don't parse it for referential tables, but call ha_open .

            So far, I went approach 1 and I should describe why here.

            MDEV-22180 fix really somewhat improves the speed. According to my measurements, it is about 1.5x increase.
            I've made some deeper analysis with perf, with 12k foreign tables and single referenced table UPDATE.
            According to the results, about 98% of time is spent in prepare_fk_prelocking_list and MDL_context::find_ticket both for normal open and for optimized open with open_strategy==OPEN_STUB.
            The reason for so much time taken, is that both functions work for O(tables to open), and both are called for each table.
            Indeed, once they were changed to noop, the time taken decreased from ~3.5 seconds to ~0.15.
            So the solution would be to change a list traversal to a hash lookup. Such a change could only be scheduled to a major version.

            At the same time, innodb_evict_tables_on_commit_debug is a debug-only option. Tables with FKs (both foreign and referenced) are never evicted on release. They could be dropped under TRUNCATE or DROP TABLE, but these operations are already improved by prelocking foreign tables, which can't happen during lock_release called when committing, which is doing the debug-only eviction.

            I don't want to make additional prelocking for DMLs in a minor version, since it will also increase the time taken to open tables.

            So, approach #1 is chosen.
            Innodb doesn't evict tables with FKs during dict_sys_t::evict_table_LRU, so no change is required there now. The only thing should be done, is to improve referential constraint table locking, to avoid false DEADLOCK/TABLE_NOT_FOUND reports.

            nikitamalyavin Nikita Malyavin added a comment - So far, I went approach 1 and I should describe why here. MDEV-22180 fix really somewhat improves the speed. According to my measurements, it is about 1.5x increase. I've made some deeper analysis with perf, with 12k foreign tables and single referenced table UPDATE. According to the results, about 98% of time is spent in prepare_fk_prelocking_list and MDL_context::find_ticket both for normal open and for optimized open with open_strategy==OPEN_STUB. The reason for so much time taken, is that both functions work for O(tables to open), and both are called for each table. Indeed, once they were changed to noop, the time taken decreased from ~3.5 seconds to ~0.15. So the solution would be to change a list traversal to a hash lookup. Such a change could only be scheduled to a major version. At the same time, innodb_evict_tables_on_commit_debug is a debug-only option. Tables with FKs (both foreign and referenced) are never evicted on release. They could be dropped under TRUNCATE or DROP TABLE, but these operations are already improved by prelocking foreign tables, which can't happen during lock_release called when committing, which is doing the debug-only eviction. I don't want to make additional prelocking for DMLs in a minor version, since it will also increase the time taken to open tables. So, approach #1 is chosen. Innodb doesn't evict tables with FKs during dict_sys_t::evict_table_LRU , so no change is required there now. The only thing should be done, is to improve referential constraint table locking, to avoid false DEADLOCK/TABLE_NOT_FOUND reports.
            nikitamalyavin Nikita Malyavin added a comment - - edited

            Please review commit 025cbbb3e, branch bb-10.6-MDEV-29181.

            The fix there supposes that table eviction doesn't remove the dict_foreign_t element during eviction, but only NULLs the related table and its related index, so it needs an MDEV-30021 fix, which does it.

            nikitamalyavin Nikita Malyavin added a comment - - edited Please review commit 025cbbb3e , branch bb-10.6- MDEV-29181 . The fix there supposes that table eviction doesn't remove the dict_foreign_t element during eviction, but only NULLs the related table and its related index, so it needs an MDEV-30021 fix, which does it.

            I think that the added function lock_fk_table() currently implements double-checked locking in an unsafe manner. If the underlying dict_table_t pointers were wrapped in Atomic_relaxed, then I think it should be safe.

            marko Marko Mäkelä added a comment - I think that the added function lock_fk_table() currently implements double-checked locking in an unsafe manner. If the underlying dict_table_t pointers were wrapped in Atomic_relaxed , then I think it should be safe.
            nikitamalyavin Nikita Malyavin added a comment - - edited

            marko you are right about double-checked locking misuse.
            The fix is on the same branch.

            Link to the top commit: https://github.com/MariaDB/server/commit/14d7c451ce726fbd7d8bf2c5e051c6d4e95be76e

            nikitamalyavin Nikita Malyavin added a comment - - edited marko you are right about double-checked locking misuse. The fix is on the same branch. Link to the top commit: https://github.com/MariaDB/server/commit/14d7c451ce726fbd7d8bf2c5e051c6d4e95be76e

            I think that it would be simpler to rely on dict_sys.latch (and possibly MDL) for concurrency protection and use either regular or std::memory_order_relaxed loads and stores of dict_foreign_t::foreign_table and dict_foreign_t::referenced_table. It is hard to reason about release-acquire ordering for this complex data structures and interactions, and it does not help that we hardly run any stress tests outside x86-64, which implements Total Store Ordering. We could easily break things on ARM, POWER, RISC-V.

            In the added function lock_fk_table() we seem to need a relaxed load, to avoid acquiring dict_sys.latch in the optimistic case.

            If you change the pointers to atomic, please make sure to eliminate redundant loads, for example, in wsrep_append_foreign_key().

            I wonder if it would be correct for lock_table() to validate the fktable pointer at the very end, after a lock on the table was successfully acquired.

            marko Marko Mäkelä added a comment - I think that it would be simpler to rely on dict_sys.latch (and possibly MDL) for concurrency protection and use either regular or std::memory_order_relaxed loads and stores of dict_foreign_t::foreign_table and dict_foreign_t::referenced_table . It is hard to reason about release-acquire ordering for this complex data structures and interactions, and it does not help that we hardly run any stress tests outside x86-64, which implements Total Store Ordering. We could easily break things on ARM, POWER, RISC-V. In the added function lock_fk_table() we seem to need a relaxed load, to avoid acquiring dict_sys.latch in the optimistic case. If you change the pointers to atomic, please make sure to eliminate redundant loads, for example, in wsrep_append_foreign_key() . I wonder if it would be correct for lock_table() to validate the fktable pointer at the very end, after a lock on the table was successfully acquired.

            People

              nikitamalyavin Nikita Malyavin
              nikitamalyavin Nikita Malyavin
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.