[MDEV-21175] Remove dict_table_t::n_foreign_key_checks_running from InnoDB Created: 2019-11-29  Updated: 2023-12-22  Resolved: 2021-06-09

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Insert
Affects Version/s: 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.6.2

Type: Bug Priority: Major
Reporter: Thirunarayanan Balathandayuthapani Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-21283 InnoDB: MySQL is trying to drop datab... Closed
blocks MDEV-25500 Assertion `table_share->tmp_table != ... Closed
is blocked by MDEV-21602 CREATE TABLE…PRIMARY KEY…SELECT worka... Closed
Relates
relates to MDEV-12483 Add foreign keys support for partitio... Stalled
relates to MDEV-18421 Server crash or assertion `foreign->f... Stalled
relates to MDEV-20612 Improve InnoDB lock_sys scalability Closed
relates to MDEV-22180 Planner opens unnecessary tables when... Closed
relates to MDEV-23484 Rollback unnecessarily acquires dict_... Closed
relates to MDEV-25506 Atomic DDL: .frm file is removed and ... Closed
relates to MDEV-26554 Table-rebuilding DDL on parent table ... Closed
relates to MDEV-33104 Assertion `table.get_ref_count() <= 1... Closed
relates to MDEV-16678 Use MDL for innodb background threads... Closed
relates to MDEV-22788 SUMMARY: AddressSanitizer: heap-use-a... Confirmed
relates to MDEV-24225 Long semaphore wait in dict0dict.cc l... Closed

 Description   

InnoDB should clean n_foreign_key_checks usage and assume that SQL is already holding
appropriate MDL. But an Insert into the child table doesn't seem to acquire MDL on the parent. When vice-versa is happening that there is MDL on the child when insert is happening
for the parent table.

The following test case could repeat the issue:

--source include/have_innodb.inc
create table t1(f1 int not null, primary key(f1))engine=innodb;
create table t2(f1 int not null primary key,
                f2 int not null, foreign key(f2) references t1(f1))engine=innodb;
set DEBUG_SYNC='ib_after_row_insert_step SIGNAL alter WAIT_FOR dml';
--send insert into t1 values(1)
connect(con1,localhost,root,,,);
set DEBUG_SYNC="now WAIT_FOR alter";
alter table t2 force;
set DEBUG_SYNC="now SIGNAL dml";
connection default;
reap;
drop table t2, t1;
disconnect con1;

In above test case, ALTER TABLE t2 FORCE waits for MDL to open the table.

Another test case:

--source include/have_innodb.inc
create table t1(f1 int not null, primary key(f1))engine=innodb;
create table t2(f1 int not null primary key,
                f2 int not null, foreign key(f2) references t1(f1))engine=innodb;
set DEBUG_SYNC='ib_after_row_insert_step SIGNAL alter WAIT_FOR dml';
--send insert into t2 values(1, 2)
connect(con1,localhost,root,,,);
set DEBUG_SYNC="now WAIT_FOR alter";
alter table t1 force;
set DEBUG_SYNC="now SIGNAL dml";
connection default;
reap;
drop table t2, t1;
disconnect con1;

In this case, alter waits at commit_inplace_alter_table(). That too for innodb table lock.

#6  0x00005555565df591 in lock_wait_suspend_thread (thr=0x7fffb8030768)
    at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/lock/lock0wait.cc:351
#7  0x00005555566bb378 in row_mysql_handle_errors (new_err=0x7ffff01c3844,
    trx=0x7ffff0cc3268, thr=0x7fffb8030768, savept=0x0)
    at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/row/row0mysql.cc:740
#8  0x00005555565cd2f7 in lock_table_for_trx (table=0x7fffc41317e8,
    trx=0x7ffff0cc3268, mode=LOCK_X)
    at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/lock/lock0lock.cc:4070
#9  0x00005555566ad12d in row_merge_lock_table (trx=0x7ffff0cc3268,
    table=0x7fffc41317e8, mode=LOCK_X)
    at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/row/row0merge.cc:3698
#10 0x0000555556568d42 in ha_innobase::commit_inplace_alter_table (this=
    0x7fffb8020c08, altered_table=0x7ffff01c4fb0,
    ha_alter_info=0x7ffff01c4f20, commit=true)
    at /home/thiru/mariarepo/10.5/10.5-work/storage/innobase/handler/handler0alter.cc:10699
#11 0x000055555611bf0a in handler::ha_commit_inplace_alter_table (
    this=0x7fffb8020c08, altered_table=0x7ffff01c4fb0,
    ha_alter_info=0x7ffff01c4f20, commit=true

First case is acquiring MDL on child table (un-necessary locking)
Second case should do MDL on parent table.



 Comments   
Comment by Marko Mäkelä [ 2019-11-29 ]

Related to this, I believe that an UPDATE or DELETE (or REPLACE or  INSERT…ON DUPLICATE KEY UPDATE) on a parent table may require accessing a child table, to reject or to cascade the operation. Furthermore, I believe that any modification of the child table may require accessing the parent table.

Comment by Marko Mäkelä [ 2019-11-29 ]

I think that we should aim for replacing InnoDB table locks with MDL.
I believe that this problem would have to be resolved also for implementing FOREIGN KEY support in the SQL layer (MDEV-12483).

Comment by Marko Mäkelä [ 2020-02-01 ]

As part of this, some code from row_ins_check_foreign_constraints() should be removed. This should greatly speed up large INSERT operations or ALTER TABLE on tables that are connected by FOREIGN KEY:

			dict_table_t*	referenced_table
						= foreign->referenced_table;
 
			if (referenced_table == NULL) {
 
				ref_table = dict_table_open_on_name(
					foreign->referenced_table_name_lookup,
					FALSE, FALSE, DICT_ERR_IGNORE_NONE);
			}
 
			if (0 == trx->dict_operation_lock_mode) {
				got_s_lock = TRUE;
 
				row_mysql_freeze_data_dictionary(trx);
			}
 
			if (referenced_table) {
				foreign->foreign_table->inc_fk_checks();
			}

Slightly related note: InnoDB used to break its own transactional table locks until a follow-up fix to MDEV-13407.

Comment by Marko Mäkelä [ 2020-04-23 ]

It looks like fixing the prelocking code would allow us to remove dict_operation_lock (which in 10.5 was renamed to dict_sys.latch) altogether. The S-latch is currently only used in two cases in 10.5:

  • in some ALTER TABLE and FOREIGN KEY handling code, to work around the missing MDL
  • on ROLLBACK of recovered transactions.

Fixing the ROLLBACK might be a little tricky, because for normal non-recovered transactions we should already be covered by MDL. (MDEV-16678 fixed purge of history to use MDL). If it is OK to attempt to acquire MDL while already holding a compatible MDL, then also that should be trivial to fix.

Comment by Sergei Golubchik [ 2020-05-06 ]

marko, I cannot fix prelocking blindly. Can you please add some assert to InnoDB that would fire unless prelocking works correctly? In other words, when assert is always true, you can remove dict_table_t::n_foreign_key_checks_running.

After adding the assert, please reassign back to me.

Comment by Marko Mäkelä [ 2020-07-24 ]

serg, I think that I will remove the InnoDB work-arounds and add an assertion that should catch the problems in row_drop_table_for_mysql(), so that you can fix the upper layer. This will likely also require fixing MDEV-21602 (make the error handling of CREATE…SELECT invoke ROLLBACK before invoking handler::delete_table).

For TRUNCATE TABLE we might have to retain row_mysql_drop_list, but we will see about that (if we ever want to address MDEV-16306).

Comment by Marko Mäkelä [ 2020-07-24 ]

serg, I pushed bb-10.5-MDEV-21175 as a starting point. I disabled all the failing tests with specific reasons:

diff --git a/mysql-test/suite/binlog/disabled.def b/mysql-test/suite/binlog/disabled.def
index 424e5549541..1576e21af52 100644
--- a/mysql-test/suite/binlog/disabled.def
+++ b/mysql-test/suite/binlog/disabled.def
@@ -11,3 +11,5 @@
 ##############################################################################
 binlog_truncate_innodb : BUG#11764459 2010-10-20 anitha Originally disabled due to BUG#42643. Product bug fixed, but test changes needed
 binlog_spurious_ddl_errors : BUG#11761680 2013-01-18 astha Fixed on mysql-5.6 and trunk
+binlog_stm_mix_innodb_myisam : WIP: MDEV-21175/MDEV-21602
+binlog_row_mix_innodb_myisam : WIP: MDEV-21175/MDEV-21602
diff --git a/mysql-test/suite/innodb/disabled.def b/mysql-test/suite/innodb/disabled.def
index 4484417afce..6944c1bf034 100644
--- a/mysql-test/suite/innodb/disabled.def
+++ b/mysql-test/suite/innodb/disabled.def
@@ -12,3 +12,5 @@
 
 create-index-debug : MDEV-13680 InnoDB may crash when btr_page_alloc() fails
 innodb_force_recovery_rollback : MDEV-22889 InnoDB occasionally breaks ACID
+innodb_mysql : WIP: MDEV-21175/MDEV-21602
+xa_recovery : WIP: MDEV-21175/MDEV-21602/MDEV-22769 (MDEV-742 regression)
diff --git a/mysql-test/suite/rpl/disabled.def b/mysql-test/suite/rpl/disabled.def
index 640c4b56cd0..78a3e962e97 100644
--- a/mysql-test/suite/rpl/disabled.def
+++ b/mysql-test/suite/rpl/disabled.def
@@ -20,3 +20,9 @@ rpl_slave_grp_exec: MDEV-10514
 rpl_auto_increment_update_failure  : disabled for now
 rpl_current_user          : waits for MDEV-22374 fix
 rpl_parallel2             : waits for MDEV-23089
+rpl_mixed_mixing_engines : WIP: MDEV-21175/MDEV-21602
+rpl_non_direct_mixed_mixing_engines : WIP: MDEV-21175/MDEV-21602
+rpl_non_direct_row_mixing_engines : WIP: MDEV-21175/MDEV-21602
+rpl_non_direct_stm_mixing_engines : WIP: MDEV-21175/MDEV-21602
+rpl_row_mixing_engines : WIP: MDEV-21175/MDEV-21602
+rpl_stm_mixing_engines : WIP: MDEV-21175/MDEV-21602

Once you have addressed these, I think that this will require a round of RQG testing.

Comment by Marko Mäkelä [ 2020-08-14 ]

In my prototype, I forgot to remove some more code, but strangely no tests were failing due to that:

diff --git a/storage/innobase/row/row0ins.cc b/storage/innobase/row/row0ins.cc
--- a/storage/innobase/row/row0ins.cc
+++ b/storage/innobase/row/row0ins.cc
@@ -1395,16 +1395,6 @@ row_ins_foreign_check_on_constraint(
 	err = row_update_cascade_for_mysql(thr, cascade,
 					   foreign->foreign_table);
 
-	/* Release the data dictionary latch for a while, so that we do not
-	starve other threads from doing CREATE TABLE etc. if we have a huge
-	cascaded operation running. */
-
-	row_mysql_unfreeze_data_dictionary(thr_get_trx(thr));
-
-	DEBUG_SYNC_C("innodb_dml_cascade_dict_unfreeze");
-
-	row_mysql_freeze_data_dictionary(thr_get_trx(thr));
-
 	mtr_start(mtr);
 
 	/* Restore pcur position */

If the tables are sufficiently protected by MDL, there is no need to hold the data dictionary latch to prevent concurrent DDL operations. At the start of this function, my fix already removed an assertion that we must be holding the latch.

There is no public test (not even in MySQL) that would exercise the DEBUG_SYNC point.

Comment by Sergei Golubchik [ 2020-10-03 ]

marko, I don't understand. Tests that you've disabled and that crash in your prototype branch are tests like

CREATE TABLE t1 (a int, b int) engine=myisam;
INSERT INTO t1 values (1,1),(1,2);
--error ER_DUP_ENTRY
CREATE TABLE t2 (primary key (a)) engine=innodb select * from t1;

How is that related to the problem at hand, to MDL or foreign key referenced tables?

And the test from the bug description, that shows that an fk-referenced table does not have a proper MDL, this does does not crash in your prototype branch.

Comment by Marko Mäkelä [ 2020-10-05 ]

Related to what I wrote earlier, MDEV-23484 removed unnecessary operations on dict_operation_lock on ROLLBACK.

serg, my earlier comment in this ticket tried to explain that also MDEV-21602 is breaking some rules. Did you read the commit message that it linked to?

MDEV-21602 CREATE TABLE…PRIMARY KEY…SELECT workaround
causes DROP TABLE to ignore locks

The error handling of CREATE…SELECT would invoke
handler::delete_table() while still holding locks
on the table, due to not having invoked handlerton::rollback
first. InnoDB used to work around this as well.
In MDEV-742 this was worked around further by breaking
MDL, causing MDEV-22733.

Unfortunately, thiru did not include enough details to reproduce the originally reported problem. At least, we would have to know the source code revision. Based on the filing date, I would guess that it is somewhere around 5a00792c69065140a34315e3c5aec00bdbef8447.

Comment by Marko Mäkelä [ 2021-05-29 ]

I think that the field can be safely removed in the third part of the MDEV-25506 fix (rewriting DROP operations).

Comment by Marko Mäkelä [ 2021-06-09 ]

This was fixed in MDEV-25506 part 3.

Generated at Thu Feb 08 09:05:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.