[MDEV-31152] Assertion `!bpage' failed in buf_flush_try_neighbors Created: 2023-04-29  Updated: 2023-05-12  Resolved: 2023-05-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.11, 11.0, 11.1, 11.2
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: not-10.7, regression, sporadic


 Description   

I am creating this as a placeholder for a bug which I have seen a number of times lately (could be a new regression). Testcase reduction is complex as the bug is highly sporadic. Best testcase is still 8k lines long. No rr trace. core dump available. marko FYI.

11.2.0 5aebc0e1955580cea07a00da20097216afb12b6b (Debug)

mysqld: /test/11.2_dbg/storage/innobase/buf/buf0flu.cc:1085: ulint buf_flush_try_neighbors(fil_space_t*, page_id_t, buf_page_t*, bool, bool, ulint, ulint): Assertion `!bpage' failed.

11.2.0 5aebc0e1955580cea07a00da20097216afb12b6b (Debug)

Core was generated by `/test/MD220423-mariadb-11.2.0-linux-x86_64-dbg/bin/mysqld --no-defaults --max_a'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22834900121152)
    at ./nptl/pthread_kill.c:44
[Current thread is 1 (Thread 0x14c4aa1c3640 (LWP 3083819))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22834900121152) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=22834900121152) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=22834900121152, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x000014c4ca14f476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x000014c4ca1357f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000014c4ca13571b in __assert_fail_base (fmt=0x14c4ca2ea150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55e52e5e0aef "!bpage", file=0x55e52e5df810 "/test/11.2_dbg/storage/innobase/buf/buf0flu.cc", line=1085, function=<optimized out>) at ./assert/assert.c:92
#6  0x000014c4ca146e96 in __GI___assert_fail (assertion=0x55e52e5e0aef "!bpage", file=0x55e52e5df810 "/test/11.2_dbg/storage/innobase/buf/buf0flu.cc", line=1085, function=0x55e52e5e0228 "ulint buf_flush_try_neighbors(fil_space_t*, page_id_t, buf_page_t*, bool, bool, ulint, ulint)") at ./assert/assert.c:101
#7  0x000055e52e07d5b3 in buf_flush_try_neighbors (space=space@entry=0x14c4084ec650, page_id=page_id@entry={m_id = 1168231104513}, bpage=bpage@entry=0x14c4bb4da020, contiguous=contiguous@entry=true, evict=evict@entry=false, n_flushed=n_flushed@entry=69, n_to_flush=300) at /test/11.2_dbg/storage/innobase/buf/buf0flu.cc:1085
#8  0x000055e52e07ffce in buf_do_flush_list_batch (lsn=18446744073709551615, max_n=300) at /test/11.2_dbg/storage/innobase/buf/buf0flu.cc:1433
#9  buf_flush_list_holding_mutex (max_n=max_n@entry=300, lsn=lsn@entry=18446744073709551615) at /test/11.2_dbg/storage/innobase/buf/buf0flu.cc:1507
#10 0x000055e52e082615 in buf_flush_page_cleaner () at /test/11.2_dbg/storage/innobase/buf/buf0flu.cc:2486
#11 0x000055e52e082a91 in std::__invoke_impl<void, void (*)()> (__f=<optimized out>) at /usr/include/c++/11/bits/invoke.h:60
#12 std::__invoke<void (*)()> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96
#13 std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:253
#14 std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:260
#15 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
#16 0x000014c4ca4fa2b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x000014c4ca1a1b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#18 0x000014c4ca233a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81



 Comments   
Comment by Roel Van de Paar [ 2023-04-29 ]

I was able to reduce one testcase down to 61 lines. It is still very convoluted with the original query outcomes (";# ....") that may no longer be correct (testcase reduction regularly changes query outcomes when lines are removed), extraneous idioms and statements etc.

However, what is sure is that this testcase can produce the given crash (it is not clear yet if a C-based client is required (i.e. rather than CLI), it looks like it is). Also ref note below re: innodb_max_dirty_pages_pct.

The bug requires only a single SQL input thread. Note that this testcase is highly sporadic (perhaps 1 in 10000, if that).

Options used during the run (most of these would not be required to reproduce the issue. Note smaller buffer pool.

--max_allowed_packet=33554432 --maximum-bulk_insert_buffer_size=1M --maximum-join_buffer_size=1M --maximum-max_heap_table_size=1M --maximum-max_join_size=1M --maximum-myisam_max_sort_file_size=1M --maximum-myisam_mmap_size=1M --maximum-myisam_sort_buffer_size=1M --maximum-optimizer_trace_max_mem_size=1M --maximum-preload_buffer_size=1M --maximum-query_alloc_block_size=1M --maximum-query_prealloc_size=1M --maximum-range_alloc_block_size=1M --maximum-read_buffer_size=1M --maximum-read_rnd_buffer_size=1M --maximum-sort_buffer_size=1M --maximum-tmp_table_size=1M --maximum-transaction_alloc_block_size=1M --maximum-transaction_prealloc_size=1M --log-output=none --sql_mode= --sql_mode= --deadlock-timeout-short=10 --deadlock-timeout-long=10 --deadlock-search-depth-short=10 --deadlock-search-depth-long=33 --innodb_change_buffering=inserts --innodb_fast_shutdown=1 --innodb_file_per_table=1 --log-bin --binlog_format=ROW --log_bin_trust_function_creators=1 --innodb-buffer-pool-size=300M --loose-debug_assert_on_not_freed_memory=1 --innodb_use_native_aio=1

Note the innodb_max_dirty_pages_pct setting towards the end is almost definitely related to the assert. Testcase:

create TABLE t1 ( id int unsigned auto_increment, primary key(id) ) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"' COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';#ERROR: 2014 - Commands out of sync; you can't run this command now
CREATE TABLE IF NOT EXISTS help_category ( help_category_id smallint unsigned not null, name  char(64) not null, parent_category_id smallint unsigned null, url char(128) not null, primary key (help_category_id), unique index (name) ) engine=InnoDB CHARACTER SET utf8   comment='help categories';#ERROR: 2014 - Commands out of sync; you can't run this command now
INSERT INTO t1 VALUES (8484+0.75);#NOERROR
DROP TABLE IF EXISTS t1;#ERROR: 1399 - XAER_RMFAIL: The command cannot be executed when global transaction is in the  ACTIVE state
CREATE TABLE t2 (a int);#ERROR: 1399 - XAER_RMFAIL: The command cannot be executed when global transaction is in the  ACTIVE state
INSERT INTO t2 VALUES ('m7l,Fe]LbxoGHr~miZgfL"Q;y&R=RXH![6TY^UG!FX8o[gRVe%,?:a0~7azQ:C=nHGnIi<#h{SCyIl*zPF]Oo+lGI,CI$4=$~Li:JG7*/u;z>n>[x&:1*2R1:J!Pmg`tuDXqY*>:ds:9<0zI]jI2Ve@V+^`Z:BQ=l"Dht0:%Jv8Q&S>-i`P_7CK9QmGWC<K<dSeap;>Ow[}(n"):mz,"fA?C;VEIW=dMgT>i7R`[T>%uQNERuHlW1Xl&S"H>ACUQ~Ssyi~4M>^Z)[U9?)CBY+BrR4Qr~3>0haKv8,(,J(D;e:WM)41jZ}g],q/','hE_1z9A:>jfBaR@K5:D{SqY@t*5!JM`n)a59OS=8aZbVr;J$q0D;^jKpFvjFa>3l5U:WiB%]TmK.#;#3&Sf?"g5lQw#v+}%#X<BF}Y/L^@~o]=5KBEgPOkdrsk?*u!@w"]UVXBM^p9bbT~`&nZzT6}QPm-H(rSeuT-D+QOQBKT9(es;xT1;Pk:L)`0pr71!TX%`u"f`,/v[jQrWK?3xsQ{]b;ssx3GQq6V:Y_ZiJ@ioE[u9Mmrl>q)D6*aqveYAL~04F%lIv+tzdIC,=vJFJY@U=Q3*(Cmhud{#ewm8@lk=y3L30[_UMJ)7AEn~vMm~~omoJ;LR/.pO8XbPq6r1uxGIC2&m>nv*jX/>yN+$".gse9<)KDnq1CPh*aYM~Pki;+[{^ooSD4Wgq,9,*zs)d$0@xy;.W;bCgulpNfc_HO4#;PbV}I-','UIJ/,1"VZ;[SOnDi>n3O]#mb9q^*kd"U%a/=x6WEd6!3r^6NI&>4-e)Kylj$5L4!^.SAEmfU+OuM>}^N*pN:uFjnim*rB?g1F(mF1/,obz&b_"-+D2');#ERROR: 1399 - XAER_RMFAIL: The command cannot be executed when global transaction is in the  IDLE state
CREATE TABLE t1 (s1 VARCHAR (60) CHARACTER SET UTF8 COLLATE UTF8_UNICODE_520_CI) ENGINE=FEDERATED COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';#ERROR: 1399 - XAER_RMFAIL: The command cannot be executed when global transaction is in the  IDLE state
CREATE TEMPORARY TABLE t1 (c1 INT) ENGINE=RocksDB;#ERROR: 2014 - Commands out of sync; you can't run this command now
CREATE TABLE bug11754376 (c INT) ENGINE=FEDERATED COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"'#ERROR: 2014 - Commands out of sync; you can't run this command now
CREATE TABLE m3(c1 BIT NULL, c2 BINARY(25) NOT NULL, c3 MEDIUMINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 BIT NOT NULL UNIQUE KEY,c6 DEC(10,8) NOT NULL DEFAULT 3.141592);#ERROR: 2014 - Commands out of sync; you can't run this command now
DROP TABLE t1;#ERROR: 1399 - XAER_RMFAIL: The command cannot be executed when global transaction is in the  IDLE state
CREATE TABLE IF NOT EXISTS t3 (c1 INT(254) UNSIGNED ZEROFILL,c2 BIGINT,c3 SET('a','b','c') CHARACTER SET 'latin1' COLLATE 'latin1_bin', PRIMARY KEY(c1)) ENGINE=InnoDB;#ERROR: 1399 - XAER_RMFAIL: The command cannot be executed when global transaction is in the  IDLE state
insert INTO t1  values (0xF2808080);#ERROR: 1399 - XAER_RMFAIL: The command cannot be executed when global transaction is in the  IDLE state
create table test_user select * from mysql.user;#ERROR: 1163 - Storage engine MEMORY doesn't support BLOB/TEXT columns
OPTIMIZE NO_WRITE_TO_BINLOG TABLE t3;#NOERROR
INSERT INTO t1  SELECT * FROM t1 ;#NOERROR#NOERROR
TRUNCATE t2;#NOERROR
drop table t1,t2;#NOERROR
CREATE TABLE t1 (c INT NOT NULL AUTO_INCREMENT PRIMARY KEY, c2 INT, INDEX idx(c2)) ENGINE=Spider ROW_FORMAT=COMPRESSED;#ERROR: 2014 - Commands out of sync; you can't run this command now
CREATE TABLE t5 (c1 MEDIUMINT NOT NULL);#ERROR: 1050 - Table 't5' already exists
SELECT * FROM t3 INTO OUTFILE 'out5';#NOERROR
ALTER TABLE t3 CHANGE c1 c1 NUMERIC NOT NULL;#ERROR: 1399 - XAER_RMFAIL: The command cannot be executed when global transaction is in the  ACTIVE state#NOERROR
ALTER TABLE t3 ENGINE=RocksDB;#NOERROR
CREATE TABLE ti (a BIGINT UNSIGNED, b MEDIUMINT UNSIGNED, c CHAR(35) NOT NULL, d VARCHAR(97) NOT NULL, e VARCHAR(65), f VARBINARY(46), g BLOB NOT NULL, h MEDIUMBLOB, id BIGINT NOT NULL, KEY(b), KEY(e), PRIMARY KEY(id)) engine=innodb;#NOERROR
CREATE TABLE merge_table_2 (message CHAR(20)) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"'#NOERROR
CREATE TABLE pct_cached_evict (a TEXT) ENGINE=RocksDB;#NOERROR
INSERT INTO t1 VALUES (8030,repeat('a', 2000)),(8030,repeat('b', 2000)),(8030,repeat('c', 2000)),(8030,repeat('d', 2000)),(8030,repeat('e', 2000)),(8030,repeat('f', 2000)),(8030,repeat('g', 2000)),(8030,repeat('h', 2000)),(8030,repeat('i', 2000)),(8030,repeat('j', 2000));#ERROR: 2014 - Commands out of sync; you can't run this command now
ALTER TABLE t3 MODIFY c3 TIMESTAMP(3);#ERROR: 2014 - Commands out of sync; you can't run this command now
CREATE TABLE IF NOT EXISTS v1 SELECT 2 as a;#NOERROR
CREATE TABLE st1 like t1;#NOERROR
DROP DATABASE test;#ERROR: 1010 - Error dropping database (can't rmdir './test', errno: 39 "Directory not empty")
create table t1 (a varchar(1) character set latin1 collate latin1_general_ci);#NOERROR
INSERT INTO t1 VALUES (2);#NOERROR
DROP TABLE t1;#ERROR: 1051 - Unknown table 'test.t1'#NOERROR
CREATE TABLE ti (a INT, b BIGINT NOT NULL, c CHAR(15), d VARBINARY(94), e VARCHAR(14) NOT NULL, f VARCHAR(19) NOT NULL, g BLOB NOT NULL, h MEDIUMBLOB, id BIGINT NOT NULL, KEY(b), KEY(e), PRIMARY KEY(id)) ENGINE=RocksDB;#ERROR: 2014 - Commands out of sync; you can't run this command now
CREATE TABLE t ( i INT AUTO_INCREMENT, KEY(i) ) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';#NOERROR
CREATE TABLE t1 (a INT) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"' COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"' CHECKSUM=1 ROW_FORMAT=DYNAMIC;#ERROR: 2014 - Commands out of sync; you can't run this command now
CREATE TABLE `table_in_connected_db`(i INT);#NOERROR
DROP TABLE t1;#NOERROR
CREATE TABLE t1 (c1 SMALLINT NOT NULL);#ERROR: 1050 - Table 't1' already exists
CREATE TABLE t5(c1 INT, c2 char(20)) ENGINE=FEDERATED COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';#NOERROR
CREATE TABLE t3 (c1 INT NOT NULL);#NOERROR
INSERT INTO t1 VALUES(111111111111111111111111111111111111);#NOERROR
ALTER TABLE t1 MODIFY COLUMN c1 VARCHAR(255) NOT NULL DEFAULT '';#ERROR: 1054 - Unknown column 'c1' in 't1'
CREATE TABLE t606 (c1 INTEGER);#NOERROR
CREATE TABLE x (a INT) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';#ERROR: 2014 - Commands out of sync; you can't run this command now
INSERT INTO t1 VALUES (16148);#ERROR: 2014 - Commands out of sync; you can't run this command now
ALTER TABLE ti ENGINE=Aria;#ERROR: 2014 - Commands out of sync; you can't run this command now
drop table if exists t1,t2;#NOERROR
CREATE TABLE t1 (c int) ENGINE=InnoDB;#NOERROR
CREATE TABLE t170 (c1 INTEGER);#NOERROR
create table test.RocksDBt3s2 (a int primary key, b int) engine=RocksDB;#ERROR: 2014 - Commands out of sync; you can't run this command now
ANALYZE TABLE t1,t2,t3,t4;#NOERROR
DROP TABLE t1,t2,t3,t4;#ERROR: 1051 - Unknown table 'test.t2,test.t4'
CREATE TABLE t1 (id INT NOT NULL, ref INT NOT NULL, INDEX (id)) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"' COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';#ERROR: 1286 - Unknown storage engine 'Spider'
SET @@global.innodb_max_dirty_pages_pct = FALSE;#NOERROR#NOERROR
DROP TABLE t, ti; ;

The testcase is continuing to reduce on a dedicated server runnning 100 mariadbd's simultaneously, each trying to reduce the testcase further. We are at 61 58 56 lines now.

Update (6 May): the best reduced testcase was still 56 lines, but was able to remove most comments etc. Testcase:

--no-defaults --max_allowed_packet=33554432 --maximum-bulk_insert_buffer_size=1M --maximum-join_buffer_size=1M --maximum-max_heap_table_size=1M --maximum-max_join_size=1M --maximum-myisam_max_sort_file_size=1M --maximum-myisam_mmap_size=1M --maximum-myisam_sort_buffer_size=1M --maximum-optimizer_trace_max_mem_size=1M --maximum-preload_buffer_size=1M --maximum-query_alloc_block_size=1M --maximum-query_prealloc_size=1M --maximum-range_alloc_block_size=1M --maximum-read_buffer_size=1M --maximum-read_rnd_buffer_size=1M --maximum-sort_buffer_size=1M --maximum-tmp_table_size=1M --maximum-transaction_alloc_block_size=1M --maximum-transaction_prealloc_size=1M --log-output=none --sql_mode= --sql_mode= --deadlock-timeout-short=10 --deadlock-timeout-long=10 --deadlock-search-depth-short=10 --deadlock-search-depth-long=33 --innodb_change_buffering=inserts --innodb_fast_shutdown=1 --innodb_file_per_table=1 --log-bin --binlog_format=ROW --log_bin_trust_function_creators=1 --innodb-buffer-pool-size=300M --loose-debug_assert_on_not_freed_memory=1 --innodb_use_native_aio=1

create TABLE t1 ( id int unsigned auto_increment, primary key(id) ) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"' COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';
CREATE TABLE IF NOT EXISTS help_category ( help_category_id smallint unsigned not null, name  char(64) not null, parent_category_id smallint unsigned null, url char(128) not null, primary key (help_category_id), unique index (name) ) engine=InnoDB CHARACTER SET utf8   comment='help categories';
INSERT INTO t1 VALUES (8484+0.75);
DROP TABLE IF EXISTS t1;
CREATE TABLE t2 (a int);
CREATE TABLE t1 (s1 VARCHAR (60) CHARACTER SET UTF8 COLLATE UTF8_UNICODE_520_CI) ENGINE=FEDERATED COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';
CREATE TEMPORARY TABLE t1 (c1 INT) ENGINE=RocksDB;
CREATE TABLE bug11754376 (c INT) ENGINE=FEDERATED COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';#ERROR: 2014 - Commands out of sync; you can't run this command now
CREATE TABLE m3(c1 BIT NULL, c2 BINARY(25) NOT NULL, c3 MEDIUMINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 BIT NOT NULL UNIQUE KEY,c6 DEC(10,8) NOT NULL DEFAULT 3.141592);
DROP TABLE t1;
CREATE TABLE IF NOT EXISTS t3 (c1 INT(254) UNSIGNED ZEROFILL,c2 BIGINT,c3 SET('a','b','c') CHARACTER SET 'latin1' COLLATE 'latin1_bin', PRIMARY KEY(c1)) ENGINE=InnoDB;
insert INTO t1  values (0xF2808080);
create table test_user select * from mysql.user;
OPTIMIZE NO_WRITE_TO_BINLOG TABLE t3;
INSERT INTO t1  SELECT * FROM t1 ;
TRUNCATE t2;
drop table t1,t2;
CREATE TABLE t1 (c INT NOT NULL AUTO_INCREMENT PRIMARY KEY, c2 INT, INDEX idx(c2)) ENGINE=Spider ROW_FORMAT=COMPRESSED;
CREATE TABLE t5 (c1 MEDIUMINT NOT NULL);
SELECT * FROM t3 INTO OUTFILE 'out5';
ALTER TABLE t3 CHANGE c1 c1 NUMERIC NOT NULL;
ALTER TABLE t3 ENGINE=RocksDB;
CREATE TABLE ti (a BIGINT UNSIGNED, b MEDIUMINT UNSIGNED, c CHAR(35) NOT NULL, d VARCHAR(97) NOT NULL, e VARCHAR(65), f VARBINARY(46), g BLOB NOT NULL, h MEDIUMBLOB, id BIGINT NOT NULL, KEY(b), KEY(e), PRIMARY KEY(id)) engine=innodb;
CREATE TABLE merge_table_2 (message CHAR(20)) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';#NOERROR
CREATE TABLE pct_cached_evict (a TEXT) ENGINE=RocksDB;
INSERT INTO t1 VALUES (8030,repeat('a', 2000)),(8030,repeat('b', 2000)),(8030,repeat('c', 2000)),(8030,repeat('d', 2000)),(8030,repeat('e', 2000)),(8030,repeat('f', 2000)),(8030,repeat('g', 2000)),(8030,repeat('h', 2000)),(8030,repeat('i', 2000)),(8030,repeat('j', 2000));
ALTER TABLE t3 MODIFY c3 TIMESTAMP(3);
CREATE TABLE IF NOT EXISTS v1 SELECT 2 as a;
CREATE TABLE st1 like t1;
DROP DATABASE test;
create table t1 (a varchar(1) character set latin1 collate latin1_general_ci);
DROP TABLE t1;
CREATE TABLE ti (a INT, b BIGINT NOT NULL, c CHAR(15), d VARBINARY(94), e VARCHAR(14) NOT NULL, f VARCHAR(19) NOT NULL, g BLOB NOT NULL, h MEDIUMBLOB, id BIGINT NOT NULL, KEY(b), KEY(e), PRIMARY KEY(id)) ENGINE=RocksDB;
CREATE TABLE t ( i INT AUTO_INCREMENT, KEY(i) ) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';
CREATE TABLE t1 (a INT) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"' COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"' CHECKSUM=1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `table_in_connected_db`(i INT);
DROP TABLE t1;
CREATE TABLE t1 (c1 SMALLINT NOT NULL);
CREATE TABLE t5(c1 INT, c2 char(20)) ENGINE=FEDERATED COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';
CREATE TABLE t3 (c1 INT NOT NULL);
INSERT INTO t1 VALUES(111111111111111111111111111111111111);
ALTER TABLE t1 MODIFY COLUMN c1 VARCHAR(255) NOT NULL DEFAULT '';
CREATE TABLE t606 (c1 INTEGER);
CREATE TABLE x (a INT) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';
INSERT INTO t1 VALUES (16148);
ALTER TABLE ti ENGINE=Aria;
drop table if exists t1,t2;
CREATE TABLE t1 (c int) ENGINE=InnoDB;
CREATE TABLE t170 (c1 INTEGER);
create table test.RocksDBt3s2 (a int primary key, b int) engine=RocksDB;
ANALYZE TABLE t1,t2,t3,t4;
DROP TABLE t1,t2,t3,t4;
CREATE TABLE t1 (id INT NOT NULL, ref INT NOT NULL, INDEX (id)) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"' COMMENT='WRAPPER "mysql", srv "srv", TABLE "t"';
SET @@global.innodb_max_dirty_pages_pct = FALSE;
DROP TABLE t, ti; ;
SELECT SLEEP(2);

Comment by Roel Van de Paar [ 2023-05-06 ]

Observed while testing with (already partially reduced) related SQL.

11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203 (Debug)

2023-05-06 10:20:40 8 [Note] InnoDB: Cannot close file ./test/t2#P#p1992.ibd because of 1 pending operations
2023-05-06 10:20:45 8 [Note] InnoDB: Cannot close file ./test/t2#P#p4567.ibd because of 1 pending operations
2023-05-06 10:20:50 8 [Note] InnoDB: Cannot close file ./test/t2#P#p6148.ibd because of 1 pending operations

Comment by Roel Van de Paar [ 2023-05-06 ]

Based on recommendation from Marko, started testing against 11.0 to see if the issue exists there too given the 11.1+ merge blocks and recent changes to the page flushing, and some regressions. I now found the issue exists there also:

11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203 (Debug)

mysqld: /test/11.0_dbg/storage/innobase/buf/buf0flu.cc:1085: ulint buf_flush_try_neighbors(fil_space_t*, page_id_t, buf_page_t*, bool, bool, ulint, ulint): Assertion `!bpage' failed.

11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203 (Debug)

Core was generated by `/test/MD070423-mariadb-11.0.2-linux-x86_64-dbg/bin/mysqld --no-defaults --max_a'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=23232408036928)
    at ./nptl/pthread_kill.c:44
[Current thread is 1 (Thread 0x1521376da640 (LWP 1534439))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=23232408036928) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=23232408036928) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=23232408036928, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x0000152157442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00001521574287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000015215742871b in __assert_fail_base (fmt=0x1521575dd150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55c7e802810f "!bpage", file=0x55c7e8026e30 "/test/11.0_dbg/storage/innobase/buf/buf0flu.cc", line=1085, function=<optimized out>) at ./assert/assert.c:92
#6  0x0000152157439e96 in __GI___assert_fail (assertion=0x55c7e802810f "!bpage", file=0x55c7e8026e30 "/test/11.0_dbg/storage/innobase/buf/buf0flu.cc", line=1085, function=0x55c7e8027848 "ulint buf_flush_try_neighbors(fil_space_t*, page_id_t, buf_page_t*, bool, bool, ulint, ulint)") at ./assert/assert.c:101
#7  0x000055c7e7ac3627 in buf_flush_try_neighbors (space=space@entry=0x1520bfe63bc0, page_id=page_id@entry={m_id = 33788507717633}, bpage=bpage@entry=0x15213c471bc0, contiguous=contiguous@entry=true, evict=evict@entry=false, n_flushed=n_flushed@entry=3, n_to_flush=104) at /test/11.0_dbg/storage/innobase/buf/buf0flu.cc:1085
#8  0x000055c7e7ac6042 in buf_do_flush_list_batch (lsn=18446744073709551615, max_n=104) at /test/11.0_dbg/storage/innobase/buf/buf0flu.cc:1433
#9  buf_flush_list_holding_mutex (max_n=max_n@entry=104, lsn=lsn@entry=18446744073709551615) at /test/11.0_dbg/storage/innobase/buf/buf0flu.cc:1507
#10 0x000055c7e7ac8689 in buf_flush_page_cleaner () at /test/11.0_dbg/storage/innobase/buf/buf0flu.cc:2486
#11 0x000055c7e7ac8b05 in std::__invoke_impl<void, void (*)()> (__f=<optimized out>) at /usr/include/c++/11/bits/invoke.h:60
#12 std::__invoke<void (*)()> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96
#13 std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:253
#14 std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:260
#15 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
#16 0x00001521578dc2b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x0000152157494b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#18 0x0000152157526a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I will try and reproduce it using the latest 11.0 next.

Comment by Roel Van de Paar [ 2023-05-06 ]

New assert (in two similar stacks) observed while testing with (already partially reduced) related SQL:

11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203 (Debug)

fil_system.n_open == n_open|SIGABRT|ut_dbg_assertion_failed|fil_validate|fil_validate_skip|fil_aio_callback
fil_system.n_open == n_open|SIGABRT|ut_dbg_assertion_failed|fil_validate|fil_validate_skip|fil_space_t::io  

A matching bug search in JIRA provides some possibly related candidates for this particular assert.
At times (but not always) when this assert happens, the following error is seen:

11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203 (Debug)

42/log/master.err:2023-05-06 10:29:10 8 [ERROR] InnoDB: Operating system error number 13 in a file operation.
42/log/master.err:2023-05-06 10:29:10 8 [ERROR] InnoDB: The error means mariadbd does not have the access rights to the directory.

Comment by Roel Van de Paar [ 2023-05-06 ]

Testing seems to show that this issue does not exist in 10.7.8 bc656c4fa54c12ceabd857e8ae134f8979d82944 (Debug, build 7 April), whereas it does exist in a 11.0 build from the same date. 10.7 Is also not running into any of the other issues mentioned here.

Comment by Roel Van de Paar [ 2023-05-06 ]

Reproduced in 10.11 also.

10.11.3 17caff92b51af08928911feb5c4ae23e4448575d (Debug)

mysqld: /test/10.11_dbg/storage/innobase/buf/buf0flu.cc:1086: ulint buf_flush_try_neighbors(fil_space_t*, page_id_t, buf_page_t*, bool, bool, ulint, ulint): Assertion `!bpage' failed.

10.11.3 17caff92b51af08928911feb5c4ae23e4448575d (Debug)

Core was generated by `/test/MD070423-mariadb-10.11.3-linux-x86_64-dbg/bin/mysqld --no-defaults --max_'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=23448719660608)
    at ./nptl/pthread_kill.c:44
[Current thread is 1 (Thread 0x1553949b1640 (LWP 2601347))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=23448719660608) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=23448719660608) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=23448719660608, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00001553b4842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00001553b48287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00001553b482871b in __assert_fail_base (fmt=0x1553b49dd150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5611705b0d77 "!bpage", file=0x5611705afa58 "/test/10.11_dbg/storage/innobase/buf/buf0flu.cc", line=1086, function=<optimized out>) at ./assert/assert.c:92
#6  0x00001553b4839e96 in __GI___assert_fail (assertion=0x5611705b0d77 "!bpage", file=0x5611705afa58 "/test/10.11_dbg/storage/innobase/buf/buf0flu.cc", line=1086, function=0x5611705b0490 "ulint buf_flush_try_neighbors(fil_space_t*, page_id_t, buf_page_t*, bool, bool, ulint, ulint)") at ./assert/assert.c:101
#7  0x000056117004d501 in buf_flush_try_neighbors (space=space@entry=0x15530f824300, page_id=page_id@entry={m_id = 22419729285121}, bpage=bpage@entry=0x1553a3b52ff0, contiguous=contiguous@entry=true, evict=evict@entry=false, n_flushed=n_flushed@entry=114, n_to_flush=139) at /test/10.11_dbg/storage/innobase/buf/buf0flu.cc:1086
#8  0x000056117004ffa0 in buf_do_flush_list_batch (lsn=18446744073709551615, max_n=139) at /test/10.11_dbg/storage/innobase/buf/buf0flu.cc:1434
#9  buf_flush_list_holding_mutex (max_n=max_n@entry=139, lsn=lsn@entry=18446744073709551615) at /test/10.11_dbg/storage/innobase/buf/buf0flu.cc:1508
#10 0x0000561170052613 in buf_flush_page_cleaner () at /test/10.11_dbg/storage/innobase/buf/buf0flu.cc:2499
#11 0x0000561170052a8f in std::__invoke_impl<void, void (*)()> (__f=<optimized out>) at /usr/include/c++/11/bits/invoke.h:60
#12 std::__invoke<void (*)()> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96
#13 std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:253
#14 std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:260
#15 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
#16 0x00001553b4cdc2b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00001553b4894b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#18 0x00001553b4926a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Comment by Roel Van de Paar [ 2023-05-06 ]

The two fil_system.n_open == n_open assert stacks also appear in 10.11 (whereas they did also not appear in 10.7), potentially connecting the issues.

Comment by Roel Van de Paar [ 2023-05-06 ]

A new build of 10.11 as well as a new build of 11.0 do not seem to show the !bpage assert, however I was able to reproduce a thread hang in TRUNCATE using the same partially reduced SQL as used in previous comments (not the one given above, gut a broader set of combined various reductions):

11.0.2 63df43a0e830c6b54178574e4f72f21d19ed765e (Debug)

2023-05-06 14:59:47 4 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of pending fsync
2023-05-06 15:01:31 4 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync

11.0.2 63df43a0e830c6b54178574e4f72f21d19ed765e (Debug)

11.0.2-dbg>SHOW PROCESSLIST;
+----+------+-----------+------+---------+------+----------+------------------+----------+
| Id | User | Host      | db   | Command | Time | State    | Info             | Progress |
+----+------+-----------+------+---------+------+----------+------------------+----------+
|  4 | root | localhost | test | Query   |  443 | NULL     | TRUNCATE t2      |    0.000 |
|  8 | root | localhost | test | Query   |    0 | starting | SHOW PROCESSLIST |    0.000 |
+----+------+-----------+------+---------+------+----------+------------------+----------+
2 rows in set (0.000 sec)

Additionally, this issue seems reducible (in progress).

Comment by Roel Van de Paar [ 2023-05-06 ]

Related:

11.0.2 63df43a0e830c6b54178574e4f72f21d19ed765e (Debug)

11.0.2-dbg>CREATE TABLE t (c INT) DATA DIRECTORY='/not/existing' ENGINE=InnoDB;
ERROR 1005 (HY000): Can't create table `test`.`t` (errno: 168 "Unknown (generic) error from engine")

11.0.2 63df43a0e830c6b54178574e4f72f21d19ed765e (Debug)

2023-05-06 15:49:33 4 [ERROR] InnoDB: Operating system error number 13 in a file operation.
2023-05-06 15:49:33 4 [ERROR] InnoDB: The error means mariadbd does not have the access rights to the directory.

Comment by Marko Mäkelä [ 2023-05-08 ]

10.11 17caff92b51af08928911feb5c4ae23e4448575d did not include a merge of some fixups for MDEV-26827 and MDEV-26055.

11.0 63df43a0e830c6b54178574e4f72f21d19ed765e does include those changes.

11.0 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203, where some assertions failed in fil_validate(), does not include a merge of MDEV-31080.

Comment by Roel Van de Paar [ 2023-05-12 ]

Thank you for the assistance Marko. Further testing on the rr instance (using 10.11.3 41dbc32665e6e821c7dd469517c877dbe5a775ee) has not shown the bug again, in spite of 150k+ trials, so it looks like one of those (or other) patches also resolved this bug.

Generated at Thu Feb 08 10:21:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.