[MDEV-24049] InnoDB: Failing assertion: node->is_open() in fil_space_t::flush_low Created: 2020-10-28  Updated: 2020-10-29  Resolved: 2020-10-29

Status: Closed
Project: MariaDB Server
Component/s: Locking, Storage Engine - InnoDB
Affects Version/s: 10.5
Fix Version/s: 10.5.7

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Problem/Incident
is caused by MDEV-23855 InnoDB log checkpointing causes regre... Closed

 Description   

--source include/have_innodb.inc
--source include/have_partition.inc
 
CREATE TABLE t1 (
  pk INT,
  a TEXT,
  b TEXT,
  c TEXT,
  d TEXT,
  PRIMARY KEY(pk),
  FULLTEXT INDEX (a)
) ENGINE=InnoDB;
CREATE TABLE t2 (f INT) ENGINE=InnoDB;
ALTER TABLE t1 ADD FULLTEXT INDEX (b), ADD FULLTEXT INDEX (c);
FLUSH TABLE t1 WITH READ LOCK;
UNLOCK TABLES;
ALTER TABLE t1 ADD FULLTEXT INDEX (d);
ALTER TABLE t1 PARTITION BY HASH(pk) PARTITIONS 13;
FLUSH TABLE t2 WITH READ LOCK;
 
# Cleanup
UNLOCK TABLES;
DROP TABLE t1, t2;

10.5 2cec0523

2020-10-29 00:04:22 0x7ffb2cb08700  InnoDB: Assertion failure in file /data/src/10.5/storage/innobase/fil/fil0fil.cc line 517
InnoDB: Failing assertion: node->is_open()
 
#5  0x00007ffb3374e859 in __GI_abort () at abort.c:79
#6  0x00005590f1bae1a0 in ut_dbg_assertion_failed (expr=0x5590f231d714 "node->is_open()", file=0x5590f231d4b0 "/data/src/10.5/storage/innobase/fil/fil0fil.cc", line=517) at /data/src/10.5/storage/innobase/ut/ut0dbg.cc:60
#7  0x00005590f1cba900 in fil_space_t::flush_low (this=0x7ffaf0225b48) at /data/src/10.5/storage/innobase/fil/fil0fil.cc:517
#8  0x00005590f1b3514b in fil_space_t::flush (this=0x7ffaf0225b48) at /data/src/10.5/storage/innobase/include/fil0fil.h:1466
#9  0x00005590f1b34520 in row_quiesce_table_start (table=0x7ffaf0205b88, trx=0x7ffb2ddb21e8) at /data/src/10.5/storage/innobase/row/row0quiesce.cc:548
#10 0x00005590f192f638 in ha_innobase::external_lock (this=0x7ffaf005e1d0, thd=0x7ffaf0000db8, lock_type=0) at /data/src/10.5/storage/innobase/handler/ha_innodb.cc:15656
#11 0x00005590f14b04e2 in handler::ha_external_lock (this=0x7ffaf005e1d0, thd=0x7ffaf0000db8, lock_type=0) at /data/src/10.5/sql/handler.cc:6669
#12 0x00005590f15f66d7 in lock_external (thd=0x7ffaf0000db8, tables=0x7ffaf00fa638, count=1) at /data/src/10.5/sql/lock.cc:393
#13 0x00005590f15f637e in mysql_lock_tables (thd=0x7ffaf0000db8, sql_lock=0x7ffaf00fa618, flags=4096) at /data/src/10.5/sql/lock.cc:338
#14 0x00005590f15f6232 in mysql_lock_tables (thd=0x7ffaf0000db8, tables=0x7ffaf00146d0, count=1, flags=4096) at /data/src/10.5/sql/lock.cc:301
#15 0x00005590f1090922 in lock_tables (thd=0x7ffaf0000db8, tables=0x7ffaf0013ff8, count=1, flags=4096) at /data/src/10.5/sql/sql_base.cc:5472
#16 0x00005590f108fda0 in open_and_lock_tables (thd=0x7ffaf0000db8, options=..., tables=0x7ffaf0013ff8, derived=false, flags=4096, prelocking_strategy=0x7ffb2cb06cf0) at /data/src/10.5/sql/sql_base.cc:5183
#17 0x00005590f131f478 in open_and_lock_tables (thd=0x7ffaf0000db8, tables=0x7ffaf0013ff8, derived=false, flags=4096, prelocking_strategy=0x7ffb2cb06cf0) at /data/src/10.5/sql/sql_base.h:276
#18 0x00005590f13205d8 in flush_tables_with_read_lock (thd=0x7ffaf0000db8, all_tables=0x7ffaf0013ff8) at /data/src/10.5/sql/sql_reload.cc:576
#19 0x00005590f113b8cb in mysql_execute_command (thd=0x7ffaf0000db8) at /data/src/10.5/sql/sql_parse.cc:5391
#20 0x00005590f1143dbe in mysql_parse (thd=0x7ffaf0000db8, rawbuf=0x7ffaf0013f20 "FLUSH TABLE t2 WITH READ LOCK", length=29, parser_state=0x7ffb2cb07510, is_com_multi=false, is_next_command=false) at /data/src/10.5/sql/sql_parse.cc:8010
#21 0x00005590f1130112 in dispatch_command (command=COM_QUERY, thd=0x7ffaf0000db8, packet=0x7ffaf0009099 "FLUSH TABLE t2 WITH READ LOCK", packet_length=29, is_com_multi=false, is_next_command=false) at /data/src/10.5/sql/sql_parse.cc:1871
#22 0x00005590f112e906 in do_command (thd=0x7ffaf0000db8) at /data/src/10.5/sql/sql_parse.cc:1352
#23 0x00005590f12dae63 in do_handle_one_connection (connect=0x5590f3f6aae8, put_in_cache=true) at /data/src/10.5/sql/sql_connect.cc:1410
#24 0x00005590f12dabc6 in handle_one_connection (arg=0x5590f3f84568) at /data/src/10.5/sql/sql_connect.cc:1312
#25 0x00005590f183814f in pfs_spawn_thread (arg=0x5590f3f6a728) at /data/src/10.5/storage/perfschema/pfs.cc:2201
#26 0x00007ffb33c77609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#27 0x00007ffb3384b293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Debug, release and ASAN builds are affected the same way.
Reproducible on 10.5. Not reproducible on 10.4.
The failure appeared in 10.5 after this commit:

commit 118e258aaac5da75a2ac4556201aaea3688fac67
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Mon Oct 26 16:04:12 2020 +0200
 
    MDEV-23855: Shrink fil_space_t



 Comments   
Comment by Marko Mäkelä [ 2020-10-29 ]

This bug is related to the removal of fil_system.LRU in MDEV-23855, which was motivated by reducing contention on fil_system.mutex.

Under rr record, it took several attempts to reproduce this. The file for t2.ibd had been closed during the execution of

ALTER TABLE t1 PARTITION BY HASH(pk) PARTITIONS 13;

which would rebuild the table and its numerous internal FTS_ tables.

For some reason, no matter what I try, innodb_open_files will always be 421 on my system. Here is a simpler test case:

--source include/have_innodb.inc
--source include/have_partition.inc
 
CREATE TABLE t2 (f INT) ENGINE=InnoDB;
CREATE TABLE t1 (pk INT PRIMARY KEY, t TEXT) ENGINE=InnoDB
PARTITION BY HASH(pk) PARTITIONS 420;
FLUSH TABLE t2 WITH READ LOCK;
UNLOCK TABLES;
DROP TABLE t1, t2;

Comment by Marko Mäkelä [ 2020-10-29 ]

I believe that this should be the simplest possible fix:

diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index d234fed4e89..c515cd82c48 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -514,7 +514,11 @@ inline void fil_space_t::flush_low()
   for (fil_node_t *node= UT_LIST_GET_FIRST(chain); node;
        node= UT_LIST_GET_NEXT(chain, node))
   {
-    ut_a(node->is_open());
+    if (!node->is_open())
+    {
+      ut_ad(!is_in_unflushed_spaces);
+      continue;
+    }
     IF_WIN(if (node->is_raw_disk) continue,);
     os_file_flush(node->handle);
   }

Comment by Marko Mäkelä [ 2020-10-29 ]

It turns out that there is some rather convoluted logic around innodb_open_files. If you set it to anything smaller than 10, then we will roll dice:

	if (innobase_open_files < 10) {
		innobase_open_files = 300;
		if (srv_file_per_table && tc_size > 300 && tc_size < open_files_limit) {
			innobase_open_files = tc_size;
		}
	}
 
	if (innobase_open_files > open_files_limit) {
		ib::warn() << "innodb_open_files " << innobase_open_files
			   << " should not be greater"
			   << " than the open_files_limit " << open_files_limit;
		if (innobase_open_files > tc_size) {
			innobase_open_files = tc_size;
		}
	}

Due to this, the minimum deterministic start-up value of innodb_open_files is 10. Here is a minimal test case:

diff --git a/mysql-test/suite/innodb/t/flush.opt b/mysql-test/suite/innodb/t/flush.opt
new file mode 100644
index 00000000000..99dbd8f47c3
--- /dev/null
+++ b/mysql-test/suite/innodb/t/flush.opt
@@ -0,0 +1 @@
+--innodb-open-files=10
diff --git a/mysql-test/suite/innodb/t/flush.test b/mysql-test/suite/innodb/t/flush.test
new file mode 100644
index 00000000000..e03728825cf
--- /dev/null
+++ b/mysql-test/suite/innodb/t/flush.test
@@ -0,0 +1,19 @@
+--source include/have_innodb.inc
+
+SELECT @@GLOBAL.innodb_open_files;
+
+CREATE TABLE t0 (a INT) ENGINE=InnoDB;
+# Ensure that the created table t0 is clean.
+FLUSH TABLE t0 WITH READ LOCK;
+UNLOCK TABLES;
+CREATE TABLE t1 (a INT) ENGINE=InnoDB;
+CREATE TABLE t2 (a INT) ENGINE=InnoDB;
+CREATE TABLE t3 (a INT) ENGINE=InnoDB;
+CREATE TABLE t4 (a INT) ENGINE=InnoDB;
+CREATE TABLE t5 (a INT) ENGINE=InnoDB;
+CREATE TABLE t6 (a INT) ENGINE=InnoDB;
+CREATE TABLE t7 (a INT) ENGINE=InnoDB;
+# Now, the data file for t0 should not be open anymore.
+FLUSH TABLE t0 WITH READ LOCK;
+UNLOCK TABLES;
+DROP TABLE t0, t1, t2, t3, t4, t5, t6, t7;

Generated at Thu Feb 08 09:27:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.