Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21175

Remove dict_table_t::n_foreign_key_checks_running from InnoDB

Details

    Description

      InnoDB should clean n_foreign_key_checks usage and assume that SQL is already holding
      appropriate MDL. But an Insert into the child table doesn't seem to acquire MDL on the parent. When vice-versa is happening that there is MDL on the child when insert is happening
      for the parent table.

      The following test case could repeat the issue:

      --source include/have_innodb.inc
      create table t1(f1 int not null, primary key(f1))engine=innodb;
      create table t2(f1 int not null primary key,
                      f2 int not null, foreign key(f2) references t1(f1))engine=innodb;
      set DEBUG_SYNC='ib_after_row_insert_step SIGNAL alter WAIT_FOR dml';
      --send insert into t1 values(1)
      connect(con1,localhost,root,,,);
      set DEBUG_SYNC="now WAIT_FOR alter";
      alter table t2 force;
      set DEBUG_SYNC="now SIGNAL dml";
      connection default;
      reap;
      drop table t2, t1;
      disconnect con1;
      

      In above test case, ALTER TABLE t2 FORCE waits for MDL to open the table.

      Another test case:

      --source include/have_innodb.inc
      create table t1(f1 int not null, primary key(f1))engine=innodb;
      create table t2(f1 int not null primary key,
                      f2 int not null, foreign key(f2) references t1(f1))engine=innodb;
      set DEBUG_SYNC='ib_after_row_insert_step SIGNAL alter WAIT_FOR dml';
      --send insert into t2 values(1, 2)
      connect(con1,localhost,root,,,);
      set DEBUG_SYNC="now WAIT_FOR alter";
      alter table t1 force;
      set DEBUG_SYNC="now SIGNAL dml";
      connection default;
      reap;
      drop table t2, t1;
      disconnect con1;
      

      In this case, alter waits at commit_inplace_alter_table(). That too for innodb table lock.

      #6  0x00005555565df591 in lock_wait_suspend_thread (thr=0x7fffb8030768)
          at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/lock/lock0wait.cc:351
      #7  0x00005555566bb378 in row_mysql_handle_errors (new_err=0x7ffff01c3844,
          trx=0x7ffff0cc3268, thr=0x7fffb8030768, savept=0x0)
          at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/row/row0mysql.cc:740
      #8  0x00005555565cd2f7 in lock_table_for_trx (table=0x7fffc41317e8,
          trx=0x7ffff0cc3268, mode=LOCK_X)
          at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/lock/lock0lock.cc:4070
      #9  0x00005555566ad12d in row_merge_lock_table (trx=0x7ffff0cc3268,
          table=0x7fffc41317e8, mode=LOCK_X)
          at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/row/row0merge.cc:3698
      #10 0x0000555556568d42 in ha_innobase::commit_inplace_alter_table (this=
          0x7fffb8020c08, altered_table=0x7ffff01c4fb0,
          ha_alter_info=0x7ffff01c4f20, commit=true)
          at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/handler/handler0alter.cc:10699
      #11 0x000055555611bf0a in handler::ha_commit_inplace_alter_table (
          this=0x7fffb8020c08, altered_table=0x7ffff01c4fb0,
          ha_alter_info=0x7ffff01c4f20, commit=true
      

      First case is acquiring MDL on child table (un-necessary locking)
      Second case should do MDL on parent table.

      Attachments

        Issue Links

          Activity

            thiru Thirunarayanan Balathandayuthapani created issue -

            Related to this, I believe that an UPDATE or DELETE (or REPLACE or  INSERT…ON DUPLICATE KEY UPDATE) on a parent table may require accessing a child table, to reject or to cascade the operation. Furthermore, I believe that any modification of the child table may require accessing the parent table.

            marko Marko Mäkelä added a comment - Related to this, I believe that an UPDATE or DELETE (or REPLACE or   INSERT…ON DUPLICATE KEY UPDATE ) on a parent table may require accessing a child table, to reject or to cascade the operation. Furthermore, I believe that any modification of the child table may require accessing the parent table.
            marko Marko Mäkelä made changes -
            Field Original Value New Value

            I think that we should aim for replacing InnoDB table locks with MDL.
            I believe that this problem would have to be resolved also for implementing FOREIGN KEY support in the SQL layer (MDEV-12483).

            marko Marko Mäkelä added a comment - I think that we should aim for replacing InnoDB table locks with MDL. I believe that this problem would have to be resolved also for implementing FOREIGN KEY support in the SQL layer ( MDEV-12483 ).
            marko Marko Mäkelä made changes -
            elenst Elena Stepanova made changes -
            Fix Version/s 10.5 [ 23123 ]
            marko Marko Mäkelä made changes -

            As part of this, some code from row_ins_check_foreign_constraints() should be removed. This should greatly speed up large INSERT operations or ALTER TABLE on tables that are connected by FOREIGN KEY:

            			dict_table_t*	referenced_table
            						= foreign->referenced_table;
             
            			if (referenced_table == NULL) {
             
            				ref_table = dict_table_open_on_name(
            					foreign->referenced_table_name_lookup,
            					FALSE, FALSE, DICT_ERR_IGNORE_NONE);
            			}
             
            			if (0 == trx->dict_operation_lock_mode) {
            				got_s_lock = TRUE;
             
            				row_mysql_freeze_data_dictionary(trx);
            			}
             
            			if (referenced_table) {
            				foreign->foreign_table->inc_fk_checks();
            			}
            

            Slightly related note: InnoDB used to break its own transactional table locks until a follow-up fix to MDEV-13407.

            marko Marko Mäkelä added a comment - As part of this, some code from row_ins_check_foreign_constraints() should be removed. This should greatly speed up large INSERT operations or ALTER TABLE on tables that are connected by FOREIGN KEY : dict_table_t* referenced_table = foreign->referenced_table;   if (referenced_table == NULL) {   ref_table = dict_table_open_on_name( foreign->referenced_table_name_lookup, FALSE, FALSE, DICT_ERR_IGNORE_NONE); }   if (0 == trx->dict_operation_lock_mode) { got_s_lock = TRUE;   row_mysql_freeze_data_dictionary(trx); }   if (referenced_table) { foreign->foreign_table->inc_fk_checks(); } Slightly related note: InnoDB used to break its own transactional table locks until a follow-up fix to MDEV-13407 .
            serg Sergei Golubchik made changes -

            It looks like fixing the prelocking code would allow us to remove dict_operation_lock (which in 10.5 was renamed to dict_sys.latch) altogether. The S-latch is currently only used in two cases in 10.5:

            • in some ALTER TABLE and FOREIGN KEY handling code, to work around the missing MDL
            • on ROLLBACK of recovered transactions.

            Fixing the ROLLBACK might be a little tricky, because for normal non-recovered transactions we should already be covered by MDL. (MDEV-16678 fixed purge of history to use MDL). If it is OK to attempt to acquire MDL while already holding a compatible MDL, then also that should be trivial to fix.

            marko Marko Mäkelä added a comment - It looks like fixing the prelocking code would allow us to remove dict_operation_lock (which in 10.5 was renamed to dict_sys.latch ) altogether. The S-latch is currently only used in two cases in 10.5: in some ALTER TABLE and FOREIGN KEY handling code, to work around the missing MDL on ROLLBACK of recovered transactions. Fixing the ROLLBACK might be a little tricky, because for normal non-recovered transactions we should already be covered by MDL. ( MDEV-16678 fixed purge of history to use MDL). If it is OK to attempt to acquire MDL while already holding a compatible MDL, then also that should be trivial to fix.
            ccalender Chris Calender (Inactive) made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            manjot Manjot Singh (Inactive) made changes -

            marko, I cannot fix prelocking blindly. Can you please add some assert to InnoDB that would fire unless prelocking works correctly? In other words, when assert is always true, you can remove dict_table_t::n_foreign_key_checks_running.

            After adding the assert, please reassign back to me.

            serg Sergei Golubchik added a comment - marko , I cannot fix prelocking blindly. Can you please add some assert to InnoDB that would fire unless prelocking works correctly? In other words, when assert is always true, you can remove dict_table_t::n_foreign_key_checks_running. After adding the assert, please reassign back to me.
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Marko Mäkelä [ marko ]
            manjot Manjot Singh (Inactive) made changes -
            julien.fritsch Julien Fritsch made changes -
            Affects Version/s 10.2 [ 14601 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.2 [ 14601 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            serg, I think that I will remove the InnoDB work-arounds and add an assertion that should catch the problems in row_drop_table_for_mysql(), so that you can fix the upper layer. This will likely also require fixing MDEV-21602 (make the error handling of CREATE…SELECT invoke ROLLBACK before invoking handler::delete_table).

            For TRUNCATE TABLE we might have to retain row_mysql_drop_list, but we will see about that (if we ever want to address MDEV-16306).

            marko Marko Mäkelä added a comment - serg , I think that I will remove the InnoDB work-arounds and add an assertion that should catch the problems in row_drop_table_for_mysql() , so that you can fix the upper layer. This will likely also require fixing MDEV-21602 (make the error handling of CREATE…SELECT invoke ROLLBACK before invoking handler::delete_table ). For TRUNCATE TABLE we might have to retain row_mysql_drop_list , but we will see about that (if we ever want to address MDEV-16306 ).
            marko Marko Mäkelä made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]

            serg, I pushed bb-10.5-MDEV-21175 as a starting point. I disabled all the failing tests with specific reasons:

            diff --git a/mysql-test/suite/binlog/disabled.def b/mysql-test/suite/binlog/disabled.def
            index 424e5549541..1576e21af52 100644
            --- a/mysql-test/suite/binlog/disabled.def
            +++ b/mysql-test/suite/binlog/disabled.def
            @@ -11,3 +11,5 @@
             ##############################################################################
             binlog_truncate_innodb : BUG#11764459 2010-10-20 anitha Originally disabled due to BUG#42643. Product bug fixed, but test changes needed
             binlog_spurious_ddl_errors : BUG#11761680 2013-01-18 astha Fixed on mysql-5.6 and trunk
            +binlog_stm_mix_innodb_myisam : WIP: MDEV-21175/MDEV-21602
            +binlog_row_mix_innodb_myisam : WIP: MDEV-21175/MDEV-21602
            diff --git a/mysql-test/suite/innodb/disabled.def b/mysql-test/suite/innodb/disabled.def
            index 4484417afce..6944c1bf034 100644
            --- a/mysql-test/suite/innodb/disabled.def
            +++ b/mysql-test/suite/innodb/disabled.def
            @@ -12,3 +12,5 @@
             
             create-index-debug : MDEV-13680 InnoDB may crash when btr_page_alloc() fails
             innodb_force_recovery_rollback : MDEV-22889 InnoDB occasionally breaks ACID
            +innodb_mysql : WIP: MDEV-21175/MDEV-21602
            +xa_recovery : WIP: MDEV-21175/MDEV-21602/MDEV-22769 (MDEV-742 regression)
            diff --git a/mysql-test/suite/rpl/disabled.def b/mysql-test/suite/rpl/disabled.def
            index 640c4b56cd0..78a3e962e97 100644
            --- a/mysql-test/suite/rpl/disabled.def
            +++ b/mysql-test/suite/rpl/disabled.def
            @@ -20,3 +20,9 @@ rpl_slave_grp_exec: MDEV-10514
             rpl_auto_increment_update_failure  : disabled for now
             rpl_current_user          : waits for MDEV-22374 fix
             rpl_parallel2             : waits for MDEV-23089
            +rpl_mixed_mixing_engines : WIP: MDEV-21175/MDEV-21602
            +rpl_non_direct_mixed_mixing_engines : WIP: MDEV-21175/MDEV-21602
            +rpl_non_direct_row_mixing_engines : WIP: MDEV-21175/MDEV-21602
            +rpl_non_direct_stm_mixing_engines : WIP: MDEV-21175/MDEV-21602
            +rpl_row_mixing_engines : WIP: MDEV-21175/MDEV-21602
            +rpl_stm_mixing_engines : WIP: MDEV-21175/MDEV-21602
            

            Once you have addressed these, I think that this will require a round of RQG testing.

            marko Marko Mäkelä added a comment - serg , I pushed bb-10.5-MDEV-21175 as a starting point. I disabled all the failing tests with specific reasons: diff --git a/mysql-test/suite/binlog/disabled.def b/mysql-test/suite/binlog/disabled.def index 424e5549541..1576e21af52 100644 --- a/mysql-test/suite/binlog/disabled.def +++ b/mysql-test/suite/binlog/disabled.def @@ -11,3 +11,5 @@ ############################################################################## binlog_truncate_innodb : BUG#11764459 2010-10-20 anitha Originally disabled due to BUG#42643. Product bug fixed, but test changes needed binlog_spurious_ddl_errors : BUG#11761680 2013-01-18 astha Fixed on mysql-5.6 and trunk +binlog_stm_mix_innodb_myisam : WIP: MDEV-21175/MDEV-21602 +binlog_row_mix_innodb_myisam : WIP: MDEV-21175/MDEV-21602 diff --git a/mysql-test/suite/innodb/disabled.def b/mysql-test/suite/innodb/disabled.def index 4484417afce..6944c1bf034 100644 --- a/mysql-test/suite/innodb/disabled.def +++ b/mysql-test/suite/innodb/disabled.def @@ -12,3 +12,5 @@ create-index-debug : MDEV-13680 InnoDB may crash when btr_page_alloc() fails innodb_force_recovery_rollback : MDEV-22889 InnoDB occasionally breaks ACID +innodb_mysql : WIP: MDEV-21175/MDEV-21602 +xa_recovery : WIP: MDEV-21175/MDEV-21602/MDEV-22769 (MDEV-742 regression) diff --git a/mysql-test/suite/rpl/disabled.def b/mysql-test/suite/rpl/disabled.def index 640c4b56cd0..78a3e962e97 100644 --- a/mysql-test/suite/rpl/disabled.def +++ b/mysql-test/suite/rpl/disabled.def @@ -20,3 +20,9 @@ rpl_slave_grp_exec: MDEV-10514 rpl_auto_increment_update_failure : disabled for now rpl_current_user : waits for MDEV-22374 fix rpl_parallel2 : waits for MDEV-23089 +rpl_mixed_mixing_engines : WIP: MDEV-21175/MDEV-21602 +rpl_non_direct_mixed_mixing_engines : WIP: MDEV-21175/MDEV-21602 +rpl_non_direct_row_mixing_engines : WIP: MDEV-21175/MDEV-21602 +rpl_non_direct_stm_mixing_engines : WIP: MDEV-21175/MDEV-21602 +rpl_row_mixing_engines : WIP: MDEV-21175/MDEV-21602 +rpl_stm_mixing_engines : WIP: MDEV-21175/MDEV-21602 Once you have addressed these, I think that this will require a round of RQG testing.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Sergei Golubchik [ serg ]
            marko Marko Mäkelä made changes -

            In my prototype, I forgot to remove some more code, but strangely no tests were failing due to that:

            diff --git a/storage/innobase/row/row0ins.cc b/storage/innobase/row/row0ins.cc
            --- a/storage/innobase/row/row0ins.cc
            +++ b/storage/innobase/row/row0ins.cc
            @@ -1395,16 +1395,6 @@ row_ins_foreign_check_on_constraint(
             	err = row_update_cascade_for_mysql(thr, cascade,
             					   foreign->foreign_table);
             
            -	/* Release the data dictionary latch for a while, so that we do not
            -	starve other threads from doing CREATE TABLE etc. if we have a huge
            -	cascaded operation running. */
            -
            -	row_mysql_unfreeze_data_dictionary(thr_get_trx(thr));
            -
            -	DEBUG_SYNC_C("innodb_dml_cascade_dict_unfreeze");
            -
            -	row_mysql_freeze_data_dictionary(thr_get_trx(thr));
            -
             	mtr_start(mtr);
             
             	/* Restore pcur position */
            

            If the tables are sufficiently protected by MDL, there is no need to hold the data dictionary latch to prevent concurrent DDL operations. At the start of this function, my fix already removed an assertion that we must be holding the latch.

            There is no public test (not even in MySQL) that would exercise the DEBUG_SYNC point.

            marko Marko Mäkelä added a comment - In my prototype, I forgot to remove some more code, but strangely no tests were failing due to that: diff --git a/storage/innobase/row/row0ins.cc b/storage/innobase/row/row0ins.cc --- a/storage/innobase/row/row0ins.cc +++ b/storage/innobase/row/row0ins.cc @@ -1395,16 +1395,6 @@ row_ins_foreign_check_on_constraint( err = row_update_cascade_for_mysql(thr, cascade, foreign->foreign_table); - /* Release the data dictionary latch for a while, so that we do not - starve other threads from doing CREATE TABLE etc. if we have a huge - cascaded operation running. */ - - row_mysql_unfreeze_data_dictionary(thr_get_trx(thr)); - - DEBUG_SYNC_C("innodb_dml_cascade_dict_unfreeze"); - - row_mysql_freeze_data_dictionary(thr_get_trx(thr)); - mtr_start(mtr); /* Restore pcur position */ If the tables are sufficiently protected by MDL, there is no need to hold the data dictionary latch to prevent concurrent DDL operations. At the start of this function, my fix already removed an assertion that we must be holding the latch. There is no public test (not even in MySQL) that would exercise the DEBUG_SYNC point.
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            serg Sergei Golubchik added a comment - - edited

            marko, I don't understand. Tests that you've disabled and that crash in your prototype branch are tests like

            CREATE TABLE t1 (a int, b int) engine=myisam;
            INSERT INTO t1 values (1,1),(1,2);
            --error ER_DUP_ENTRY
            CREATE TABLE t2 (primary key (a)) engine=innodb select * from t1;
            

            How is that related to the problem at hand, to MDL or foreign key referenced tables?

            And the test from the bug description, that shows that an fk-referenced table does not have a proper MDL, this does does not crash in your prototype branch.

            serg Sergei Golubchik added a comment - - edited marko , I don't understand. Tests that you've disabled and that crash in your prototype branch are tests like CREATE TABLE t1 (a int , b int ) engine=myisam; INSERT INTO t1 values (1,1),(1,2); --error ER_DUP_ENTRY CREATE TABLE t2 ( primary key (a)) engine=innodb select * from t1; How is that related to the problem at hand, to MDL or foreign key referenced tables? And the test from the bug description, that shows that an fk-referenced table does not have a proper MDL, this does does not crash in your prototype branch.
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Marko Mäkelä [ marko ]

            Related to what I wrote earlier, MDEV-23484 removed unnecessary operations on dict_operation_lock on ROLLBACK.

            serg, my earlier comment in this ticket tried to explain that also MDEV-21602 is breaking some rules. Did you read the commit message that it linked to?

            MDEV-21602 CREATE TABLE…PRIMARY KEY…SELECT workaround
            causes DROP TABLE to ignore locks

            The error handling of CREATE…SELECT would invoke
            handler::delete_table() while still holding locks
            on the table, due to not having invoked handlerton::rollback
            first. InnoDB used to work around this as well.
            In MDEV-742 this was worked around further by breaking
            MDL, causing MDEV-22733.

            Unfortunately, thiru did not include enough details to reproduce the originally reported problem. At least, we would have to know the source code revision. Based on the filing date, I would guess that it is somewhere around 5a00792c69065140a34315e3c5aec00bdbef8447.

            marko Marko Mäkelä added a comment - Related to what I wrote earlier, MDEV-23484 removed unnecessary operations on dict_operation_lock on ROLLBACK . serg , my earlier comment in this ticket tried to explain that also MDEV-21602 is breaking some rules. Did you read the commit message that it linked to? MDEV-21602 CREATE TABLE…PRIMARY KEY…SELECT workaround causes DROP TABLE to ignore locks The error handling of CREATE…SELECT would invoke handler::delete_table() while still holding locks on the table, due to not having invoked handlerton::rollback first. InnoDB used to work around this as well. In MDEV-742 this was worked around further by breaking MDL, causing MDEV-22733 . Unfortunately, thiru did not include enough details to reproduce the originally reported problem. At least, we would have to know the source code revision. Based on the filing date, I would guess that it is somewhere around 5a00792c69065140a34315e3c5aec00bdbef8447.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Sergei Golubchik [ serg ]
            marko Marko Mäkelä made changes -
            thiru Thirunarayanan Balathandayuthapani made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Daniel Bartholomew [ dbart ]
            serg Sergei Golubchik made changes -
            Assignee Daniel Bartholomew [ dbart ] Sergei Golubchik [ serg ]
            serg Sergei Golubchik made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            marko Marko Mäkelä made changes -

            I think that the field can be safely removed in the third part of the MDEV-25506 fix (rewriting DROP operations).

            marko Marko Mäkelä added a comment - I think that the field can be safely removed in the third part of the MDEV-25506 fix (rewriting DROP operations).
            marko Marko Mäkelä made changes -

            This was fixed in MDEV-25506 part 3.

            marko Marko Mäkelä added a comment - This was fixed in MDEV-25506 part 3.
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2021-06-09 15:43:06.0 2021-06-09 15:43:06.984
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.2 [ 25800 ]
            Fix Version/s 10.5 [ 23123 ]
            Assignee Sergei Golubchik [ serg ] Marko Mäkelä [ marko ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z
            serg Sergei Golubchik made changes -
            Labels 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 101477 ] MariaDB v4 [ 157028 ]
            marko Marko Mäkelä made changes -
            Jaskaran Jaskaran Singh made changes -
            marko Marko Mäkelä made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658 196904 109080 184471
            Zendesk active tickets 201658

            People

              marko Marko Mäkelä
              thiru Thirunarayanan Balathandayuthapani
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.