[MDEV-26381] InnoDB: Failing assertion: fsize != os_offset_t(-1) on CREATE TABLE with many partitions / ENOMEM (out of kernel memory) from fstat handling Created: 2021-08-17  Updated: 2023-12-13

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 11.0, 11.2
Fix Version/s: 10.4, 10.5, 10.6, 11.0, 11.2

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Daniel Black
Resolution: Unresolved Votes: 2
Labels: None

Attachments: File MDEV-26381_1.sql     File MDEV-26381_2.sql     File MDEV-26381_3.sql     File MDEV-26381_4.sql     File MDEV-26381_OTHER_1.sql     File MDEV-26381_OTHER_1.threads.dbg.log     File MDEV-26381_OTHER_1.threads_partial_unlock.dbg.log     File MDEV-26381_OTHER_2.sql     File MDEV-26381_OTHER_2.threads.opt.log    
Issue Links:
Issue split
split to MDEV-26873 Partial server hang when using many t... Closed
Relates
relates to MDEV-26068 Server crashes when innodb-data-file-... Closed
relates to MDEV-26293 InnoDB: Failing assertion: space->is_... Closed
relates to MDEV-25215 Excessive logging "Cannot close file ... Closed

 Description   

The attached (3 line) testcase ( MDEV-26381_1.sql ) is not finalized. Whilst I am confident that the said testcase can produce the assertion as-is (it was reduced to this based on seeing the assert or not), reproducing the crash itself is not straightforward. It may be highly sporadic, or more likely dependent on some system condition (like running out of file descriptors, some OS shortcoming, or similar). As discussed elsewhere, the issue is likely not confined to partitioning but is rather related to the accessing of many .ibd files whilst running into a [bad] file descriptor issue,. There is a core file and Marko will analyze that one. [DONE]

Possibly related: MDEV-20648 , MDEV-26293 (likely)

mysqld options from original run. Not all necessary.

--max_allowed_packet=33554432 --maximum-bulk_insert_buffer_size=1M --maximum-join_buffer_size=1M --maximum-max_heap_table_size=1M --maximum-max_join_size=1M --maximum-myisam_max_sort_file_size=1M --maximum-myisam_mmap_size=1M --maximum-myisam_sort_buffer_size=1M --maximum-optimizer_trace_max_mem_size=1M --maximum-preload_buffer_size=1M --maximum-query_alloc_block_size=1M --maximum-query_prealloc_size=1M --maximum-range_alloc_block_size=1M --maximum-read_buffer_size=1M --maximum-read_rnd_buffer_size=1M --maximum-sort_buffer_size=1M --maximum-tmp_table_size=1M --maximum-transaction_alloc_block_size=1M --maximum-transaction_prealloc_size=1M --log-output=none --sql_mode=ONLY_FULL_GROUP_BY --log_bin --innodb_file_per_table=1 --innodb_flush_method=O_DIRECT  --table-cache=2 --symbolic-links=0 --innodb-log-checksums=0

10.6.4 42b9daaea7ece6155558f7c3bf638e8e175a7ff7 (Optimized)

InnoDB: Failing assertion: fsize != os_offset_t(-1)

10.6.4 42b9daaea7ece6155558f7c3bf638e8e175a7ff7 (Optimized)

Core was generated by `/test/MD260721-mariadb-10.6.4-linux-x86_64-opt/bin/mysqld --no-defaults --max_a'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x14e10c449700 (LWP 3154497))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000014e10d840859 in __GI_abort () at abort.c:79
#2  0x000055b95dd66b11 in ut_dbg_assertion_failed (expr=expr@entry=0x55b95e9746b6 "fsize != os_offset_t(-1)", file=file@entry=0x55b95e973e70 "/test/10.6_opt/storage/innobase/fil/fil0fil.cc", line=line@entry=591) at /test/10.6_opt/storage/innobase/ut/ut0dbg.cc:60
#3  0x000055b95dd77806 in fil_space_extend_must_retry (space=space@entry=0x14e070cdb488, node=0x14e070cdb5e8, size=size@entry=6, success=success@entry=0x14e10c444bdf) at /test/10.6_opt/storage/innobase/fil/fil0fil.cc:591
#4  0x000055b95dd77b45 in fil_space_extend (space=space@entry=0x14e070cdb488, size=6) at /test/10.6_opt/storage/innobase/fil/fil0fil.cc:705
#5  0x000055b95dd7ab4d in fsp_try_extend_data_file_with_pages (space=0x14e070cdb488, page_no=<optimized out>, header=0x14e0ec018480, mtr=0x14e10c445000) at /test/10.6_opt/storage/innobase/fsp/fsp0fsp.cc:625
#6  0x000055b95e58544d in fseg_create (space=space@entry=0x14e070cdb488, byte_offset=byte_offset@entry=74, mtr=mtr@entry=0x14e10c445000, has_done_reservation=has_done_reservation@entry=false, block=block@entry=0x14e0ec0bcb80) at /test/10.6_opt/storage/innobase/fsp/fsp0fsp.cc:1698
#7  0x000055b95e4f7474 in btr_create (type=1, space=0x14e070cdb488, index_id=4850, index=index@entry=0x14e070a11db0, mtr=mtr@entry=0x14e10c445000) at /test/10.6_opt/storage/innobase/btr/btr0btr.cc:1031
#8  0x000055b95e5477ad in dict_create_index_tree_step (node=0x14e07813a200) at /test/10.6_opt/storage/innobase/dict/dict0crea.cc:768
#9  0x000055b95e548177 in dict_create_index_step (thr=thr@entry=0x14e07105f5c0) at /test/10.6_opt/storage/innobase/dict/dict0crea.cc:1249
#10 0x000055b95e467488 in que_thr_step (thr=0x14e07105f5c0) at /test/10.6_opt/storage/innobase/que/que0que.cc:663
#11 que_run_threads_low (thr=<optimized out>) at /test/10.6_opt/storage/innobase/que/que0que.cc:709
#12 que_run_threads (thr=thr@entry=0x14e07105f5c0) at /test/10.6_opt/storage/innobase/que/que0que.cc:729
#13 0x000055b95e494d10 in row_create_index_for_mysql (index=<optimized out>, trx=0x14e10c750330, field_lengths=field_lengths@entry=0x0, mode=FIL_ENCRYPTION_DEFAULT, key_id=<optimized out>) at /test/10.6_opt/storage/innobase/row/row0mysql.cc:2394
#14 0x000055b95e3eb653 in create_table_info_t::create_table (this=0x14e10c445920, create_fk=<optimized out>) at /test/10.6_opt/storage/innobase/handler/ha_innodb.cc:12603
#15 0x000055b95e3ef791 in ha_innobase::create (this=<optimized out>, name=0x14e10c4460e0 "./test/t2#P#p4560", form=<optimized out>, create_info=<optimized out>, file_per_table=<optimized out>, trx=0x14e10c750330) at /test/10.6_opt/storage/innobase/handler/ha_innodb.cc:13135
#16 0x000055b95e0d49c4 in handler::ha_create (this=<optimized out>, name=name@entry=0x14e10c4460e0 "./test/t2#P#p4560", form=form@entry=0x14e10c446420, info_arg=info_arg@entry=0x14e10c447c70) at /test/10.6_opt/sql/handler.cc:5405
#17 0x000055b95e32ab40 in ha_partition::create (this=0x14e079158bd0, name=0x14e10c447780 "./test/t2", table_arg=0x14e10c446420, create_info=0x14e10c447c70) at /test/10.6_opt/sql/ha_partition.cc:786
#18 0x000055b95e0d49c4 in handler::ha_create (this=<optimized out>, name=<optimized out>, form=0x14e10c446420, info_arg=0x14e10c447c70) at /test/10.6_opt/sql/handler.cc:5405
#19 0x000055b95e0d52ca in ha_create_table (thd=thd@entry=0x14e078000c58, path=<optimized out>, db=0x14e078011030 "test", table_name=0x14e078010918 "t2", create_info=create_info@entry=0x14e10c447c70, frm=frm@entry=0x14e10c447770, skip_frm_file=false) at /test/10.6_opt/sql/handler.cc:5870
#20 0x000055b95df3f655 in create_table_impl (thd=thd@entry=0x14e078000c58, ddl_log_state_create=ddl_log_state_create@entry=0x14e10c447a20, ddl_log_state_rm=<optimized out>, orig_db=@0x14e078010968: {str = 0x14e078011030 "test", length = 4}, orig_table_name=@0x14e078010978: {str = 0x14e078010918 "t2", length = 2}, db=@0x14e078010968: {str = 0x14e078011030 "test", length = 4}, table_name=@0x14e078010978: {str = 0x14e078010918 "t2", length = 2}, path=@0x14e10c447760: {str = 0x14e10c447780 "./test/t2", length = 9}, options={m_options = DDL_options_st::OPT_NONE}, create_info=0x14e10c447c70, alter_info=0x14e10c447b80, create_table_mode=0, is_trans=0x14e10c447a1f, key_info=0x14e10c447758, key_count=0x14e10c447754, frm=0x14e10c447770) at /test/10.6_opt/sql/sql_table.cc:4447
#21 0x000055b95df3ffd8 in mysql_create_table_no_lock (thd=thd@entry=0x14e078000c58, ddl_log_state_create=ddl_log_state_create@entry=0x14e10c447a20, ddl_log_state_rm=ddl_log_state_rm@entry=0x14e10c447a40, db=db@entry=0x14e078010968, table_name=table_name@entry=0x14e078010978, create_info=create_info@entry=0x14e10c447c70, alter_info=0x14e10c447b80, is_trans=0x14e10c447a1f, create_table_mode=0, table_list=0x14e078010950) at /test/10.6_opt/sql/sql_table.cc:4546
#22 0x000055b95df403c4 in mysql_create_table (thd=thd@entry=0x14e078000c58, create_table=create_table@entry=0x14e078010950, create_info=create_info@entry=0x14e10c447c70, alter_info=alter_info@entry=0x14e10c447b80) at /test/10.6_opt/sql/sql_table.cc:4658
#23 0x000055b95df41c59 in Sql_cmd_create_table_like::execute (this=<optimized out>, thd=0x14e078000c58) at /test/10.6_opt/sql/sql_table.cc:11778
#24 0x000055b95de9d63e in mysql_execute_command (thd=0x14e078000c58, is_called_from_prepared_stmt=<optimized out>) at /test/10.6_opt/sql/sql_parse.cc:5993
#25 0x000055b95de8da46 in mysql_parse (thd=0x14e078000c58, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /test/10.6_opt/sql/sql_parse.cc:8026
#26 0x000055b95de99945 in dispatch_command (command=COM_QUERY, thd=0x14e078000c58, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/10.6_opt/sql/sql_class.h:1340
#27 0x000055b95de9b867 in do_command (thd=0x14e078000c58, blocking=blocking@entry=true) at /test/10.6_opt/sql/sql_parse.cc:1404
#28 0x000055b95dfaa017 in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at /test/10.6_opt/sql/sql_connect.cc:1410
#29 0x000055b95dfaa37d in handle_one_connection (arg=arg@entry=0x55b95fbcad18) at /test/10.6_opt/sql/sql_connect.cc:1312
#30 0x000055b95e334948 in pfs_spawn_thread (arg=0x55b960067bf8) at /test/10.6_opt/storage/perfschema/pfs.cc:2201
#31 0x000014e10dd4f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#32 0x000014e10d93d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

10.6.4 42b9daaea7ece6155558f7c3bf638e8e175a7ff7 (Optimized)

...
2021-07-29  0:00:17 8 [Note] InnoDB: Cannot close file ./test/t2#P#p2604.ibd because of 0 pending operations and pending fsync
2021-07-29  0:00:17 8 [Note] InnoDB: Cannot close file ./test/t2#P#p2605.ibd because of 0 pending operations and pending fsync
2021-07-29  0:00:17 8 [Note] InnoDB: Cannot close file ./test/t2#P#p2606.ibd because of 0 pending operations and pending fsync
2021-07-29  0:00:17 8 [ERROR] InnoDB: preallocating 98304 bytes for file ./test/t2#P#p4560.ibd failed with error 9
2021-07-29 00:00:17 0x14e10c449700  InnoDB: Assertion failure in file /test/10.6_opt/storage/innobase/fil/fil0fil.cc line 591
InnoDB: Failing assertion: fsize != os_offset_t(-1)
InnoDB: We intentionally generate a memory trap.



 Comments   
Comment by Roel Van de Paar [ 2021-08-17 ]

The single command in MDEV-26381_2.sql can be used to start generating the "Cannot close file ... pending operations and pending fsync" messages. It happens consistently after approx 30-35 seconds. Query takes 1 min+ on fast machine.

Comment by Marko Mäkelä [ 2021-08-17 ]

I tried repeating this on 10.5 and 10.6 both with Debug and RelWithDebInfo, to no avail. My invocation was like this:

./mtr --parallel=auto --repeat=100 innodb.MDEV-26381_2{,,,,,,,,,,,,,,,,,,,,,,,}

This was like MDEV-26381_2.sql, but added the following around the file:

--source include/have_innodb.inc
--source include/have_partition.inc
DROP TABLE t2;

Please provide an rr replay trace of one failure.

Comment by Marko Mäkelä [ 2021-08-17 ]

I think that my storage (NVMe) is too fast. I realized that my initial attempt was on /dev/shm. But even after symlinking mysql-test/var to something real and after disabling true asynchronous I/O, the test refuses to fail:

./mtr --repeat=100 --mysqld=--skip-innodb-use-native-aio innodb.MDEV-26381_2

Each run takes only 8.5 seconds. If I start 14 tests in parallel, then the time of one concurrent test will double to 16 seconds and the storage will write 600 to 750 MB/s, but I get no crashes. If I let the server use io_uring, the time for one test still remains around 16 seconds.

Comment by Elena Stepanova [ 2021-09-24 ]

marko,

I hope you take into account that the MTR test will never actually fail due to "Cannot close file ... pending operations and pending fsync" – it is a note, not a warning, MTR doesn't pay attention to it when it appears in the error log.

Comment by Marko Mäkelä [ 2021-10-13 ]

Roel, I am still waiting for an rr replay trace of the crash. Please note that the message rate was recently reduced in the fix of MDEV-25215. I do not think that it should affect the assertion failure.

Comment by Roel Van de Paar [ 2021-10-13 ]

Observed in 10.7 debug build from 9 Oct 21, running 200 threads of MDEV-26381_1.sql with 2x added drop/create/use test database (added as MDEV-26381_3.sql ):

10.7.1 5cc9cf9a05582307abefd8f3b57548945da86b1a (Debug)

2021-10-14  9:32:33 0 [Note] /test/MD091021-mariadb-10.7.1-linux-x86_64-dbg/bin/mysqld: ready for connections.
Version: '10.7.1-MariaDB-debug-log'  socket: '/test/MD091021-mariadb-10.7.1-linux-x86_64-dbg/socket.sock'  port: 42436  MariaDB Server
2021-10-14  9:32:46 6 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14  9:32:51 6 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14  9:33:29 6 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14  9:34:11 13 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14  9:34:21 13 [Note] InnoDB: Cannot close file ./test/t2#P#p59.ibd because of pending fsync
2021-10-14  9:34:26 13 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of 1 pending operations
2021-10-14  9:34:32 13 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14  9:34:54 13 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14  9:35:01 13 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14  9:38:11 839 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations and pending fsync
2021-10-14  9:38:16 839 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14  9:38:21 839 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of pending fsync
2021-10-14  9:39:17 1287 [Note] InnoDB: Cannot close file ./test/t2#P#p126.ibd because of pending fsync
2021-10-14  9:39:22 1287 [Note] InnoDB: Cannot close file ./test/t2#P#p886.ibd because of pending fsync
2021-10-14  9:39:27 1287 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14  9:39:33 1287 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14  9:40:01 1287 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14  9:40:39 1619 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14  9:40:49 1619 [Note] InnoDB: Cannot close file ./test/t2#P#p111.ibd because of pending fsync
2021-10-14  9:40:54 1619 [Note] InnoDB: Cannot close file ./test/t2#P#p1022.ibd because of pending fsync
2021-10-14  9:40:59 1619 [Note] InnoDB: Cannot close file ./test/t2#P#p1806.ibd because of pending fsync
...

What is also interesting is that the way the CLI output renders is different over time; even if this is running 200 threads at the same time, it's clear that threads/situations are affecting each other and output differs over time, for example long periods of ERROR 1050 (42S01) at line x: Table 't2' already exists versus a burst period of ERROR 1146 (42S02) at line x: Table 'test.t2' doesn't exist} and {{ERROR 1146 (42S02) at line x: Table 'test.t1' doesn't exist. It seems this simple test setup is able to stress any possible interaction patterns that exist in InnoDB in this area.

Comment by Roel Van de Paar [ 2021-10-13 ]

Also ran into this with a more original and longer version of the testcase. Added info there.

Comment by Roel Van de Paar [ 2021-10-13 ]

Observed that in 10.7 trunk build as of today (14 Oct 21) at least the messages can be reproduced easily with this SQL (abbreviated here), executed in random order, in many (for example 1300) threads simultaneously.

DROP TABLE IF EXISTS t2;
DROP TABLE IF EXISTS t1;
INSERT INTO t1 VALUES (0);
INSERT INTO t2 VALUES (0);
SELECT * FROM t1;
SELECT * FROM t2;
CREATE TABLE t1 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( PARTITION p0 VALUES IN (0,1,2,3) ENGINE = InnoDB, PARTITION p1 VALUES IN (4,5,6,7) ENGINE = InnoDB, ... ... ... .... .... PARTITION p8190 VALUES IN (32760,32761,32762,32763) ENGINE = InnoDB, PARTITION p8191 VALUES IN (32764,32765,32766,32767) ENGINE = InnoDB);
CREATE TABLE t2 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( PARTITION p0 VALUES IN (0,1,2,3) ENGINE = InnoDB, PARTITION p1 VALUES IN (4,5,6,7) ENGINE = InnoDB, ... ... ... .... .... PARTITION p8190 VALUES IN (32760,32761,32762,32763) ENGINE = InnoDB, PARTITION p8191 VALUES IN (32764,32765,32766,32767) ENGINE = InnoDB);

And that with no extra mysqld options added. Full SQL attached as MDEV-26381_4.sql

10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)

2021-10-14 10:22:08 0 [Note] /test/MD141021-mariadb-10.7.1-linux-x86_64-dbg/bin/mysqld: ready for connections.
Version: '10.7.1-MariaDB-debug'  socket: '/test/MD141021-mariadb-10.7.1-linux-x86_64-dbg/socket.sock'  port: 12249  MariaDB Server
2021-10-14 10:22:35 11 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:22:42 5 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14 10:24:19 5 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:24:29 11 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of pending fsync
2021-10-14 10:27:06 23 [Note] InnoDB: Cannot close file ./test/t1#P#p74.ibd because of pending fsync
2021-10-14 10:27:13 23 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:27:21 19 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14 10:28:52 19 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:29:39 19 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:31:45 29 [Note] InnoDB: Cannot close file ./test/t1#P#p0.ibd because of pending fsync
2021-10-14 10:32:05 29 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of pending fsync
2021-10-14 10:32:10 32 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...
2021-10-14 10:35:38 20 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:36:08 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:36:29 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations and pending fsync
2021-10-14 10:36:40 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations and pending fsync
2021-10-14 10:36:50 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:36:58 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...
2021-10-14 10:38:57 9 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:39:04 9 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 2 pending operations and pending fsync
2021-10-14 10:39:12 9 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync

Indeed the verbosity of the messages seems reduced here, but they still do appear. This likely also means that the crash (which was already highly sporadic) is now very hard to produce.

Comment by Roel Van de Paar [ 2021-10-13 ]

Note to self/others; whilst messages show with MDEV-26381_4.sql, the assert may be dependent on the mysqld options.

Comment by Roel Van de Paar [ 2021-10-13 ]

10000 threads of MDEV-26381_OTHER_1.sql (a simplified SQL file with less partitions) does not produce the "Cannot close file" messages, but produces these automatically (with no extra mysqld options set):

10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)

2021-10-14 10:47:38 0 [Note] /test/MD141021-mariadb-10.7.1-linux-x86_64-dbg/bin/mysqld: ready for connections.
Version: '10.7.1-MariaDB-debug-log'  socket: '/test/MD141021-mariadb-10.7.1-linux-x86_64-dbg/socket.sock'  port: 12249  MariaDB Server
2021-10-14 10:47:41 508 [Note] Detected table cache mutex contention at instance 1: 21% waits. Additional table cache instance activated. Number of instances after activation: 2.
2021-10-14 10:47:42 689 [Note] Detected table cache mutex contention at instance 2: 28% waits. Additional table cache instance activated. Number of instances after activation: 3.
2021-10-14 10:47:58 2709 [Note] InnoDB: Number of pools: 2
2021-10-14 10:48:28 6327 [Note] InnoDB: Number of pools: 3
2021-10-14 10:49:28 13193 [Note] InnoDB: Number of pools: 4
2021-10-14 10:50:18 19404 [Note] InnoDB: Number of pools: 5

With larger versions of the testcase (many partitions/tables) MDEV-26381_OTHER_2.sql , this quickly ramps up to 11 then stabilizes. Smaller versions stabilize around 4-5 pools.

Comment by Roel Van de Paar [ 2021-10-14 ]

Here are some seemingly interesting observations on the two testcases from the last comment;

  • Neither testcase produces the "Cannot close file" messages. This is very odd at 10k threads, especially for MDEV-26381_OTHER_2.sql which has the large amount of partitions CREATE TABLEs (this time in 5 tables all with the same definition as the one which produces the messages i.e. the same as MDEV-26381_4.sql).
  • After some time, both servers (running both testcases, one each) hang/lock up in 'Commit implicit' (the server/OS is still normally responsive, disk is not full and there is plenty of memory free):
    • MDEV-26381_OTHER_1.sql running against debug with no extra mysqld options.

      10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)

      +-------+------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+----------+
      | Id    | User | Host      | db   | Command | Time | State                           | Info                                                                                                 | Progress |
      +-------+------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+----------+
      |    60 | root | localhost | test | Query   | 1735 | Opening tables                  | INSERT INTO t1 VALUES (0)                                                                            |    0.000 |
      |    62 | root | localhost | test | Query   | 1745 | Opening tables                  | CREATE TABLE t1 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( |    0.000 |
      |    64 | root | localhost | test | Query   | 1746 | Opening tables                  | CREATE TABLE t1 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( |    0.000 |
      |    74 | root | localhost | test | Query   | 1735 | Opening tables                  | SELECT * FROM t1                                                                                     |    0.000 |
      ...
      | 19735 | root | localhost | test | Query   | 1589 | checking permissions            | DROP TABLE IF EXISTS t1                                                                              |    0.000 |
      ...
      | 20004 | root | localhost | test | Query   | 1491 | Opening tables                  | INSERT INTO t1 VALUES (9)                                                                            |    0.000 |
      +-------+------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+----------+
      

      Most threads are in 'Opening tables' or 'checking permissions', as per snashopt above, except these:

      10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)

      |  1721 | root | localhost | test | Query   | 1736 | Commit implicit                 | DROP TABLE IF EXISTS t1                                                                              |    0.000 |
      |  8921 | root | localhost | test | Query   | 1709 | Waiting for table metadata lock | CREATE TABLE t1 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( |    0.000 |
      |  8927 | root | localhost | test | Query   | 1709 | Waiting for table metadata lock | INSERT INTO t1 VALUES (4)                                                                            |    0.000 |
      | 15655 | root | localhost | test | Query   | 1654 | Waiting for table metadata lock | INSERT INTO t1 VALUES (0)                                                                            |    0.000 |
      | 15659 | root | localhost | test | Query   | 1654 | Waiting for table metadata lock | INSERT INTO t1 VALUES (1)                                                                            |    0.000 |
      | 15665 | root | localhost | test | Query   | 1654 | Waiting for table metadata lock | INSERT INTO t1 VALUES (4)                                                                            |    0.000 |
      | 15669 | root | localhost | test | Query   | 1654 | Waiting for table metadata lock | INSERT INTO t1 VALUES (23)                                                                           |    0.000 |
      | 20008 | root | localhost | test | Query   |    0 | starting                        | show processlist                                                                                     |    0.000 |
      

      Interestingly, there are some threads older than the Commit implicit.
      It is the same on optimized:

    • MDEV-26381_OTHER_2.sql running against optimized with all mysqld options from the original description.
      Excluding all 'Opening tables' and 'checking permissions' we have;

      10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)

      +-------+------+-----------+------+---------+------+----------------------+---------------------------+----------+
      | Id    | User | Host      | db   | Command | Time | State                | Info                      | Progress |
      +-------+------+-----------+------+---------+------+----------------------+---------------------------+----------+
      |  5063 | root | localhost | test | Query   | 1497 | Commit implicit      | DROP TABLE IF EXISTS t1   |    0.000 |
      | 20007 | root | localhost | test | Query   |    0 | starting             | show processlist          |    0.000 |
      +-------+------+-----------+------+---------+------+----------------------+---------------------------+----------+
      

Comment by Roel Van de Paar [ 2021-10-14 ]
  • Added hang tag for the hangs in Commit implicit. It is yet to be see if the issues are related.
  • Added MDEV-26381_OTHER_1.threads.dbg.log and MDEV-26381_OTHER_2.threads.opt.log showing full SHOW PROCESSLIST outputs during the hangs.
  • After a long time, the debug server partially unlocked (more statements processing/providing output when failed), whilst part of the server remains locked. Seemed quite odd. Updated processlist at that point in MDEV-26381_OTHER_1.threads_partial_unlock.dbg.log. This time many threads in 'Closing tables' and no threads in 'Commit explicit'. The optimized server remains locked in same way. The debug server possibly became partially unlocked due to an event like this:

    10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)

    2021-10-14 11:41:03 11885 [Note] Detected table cache mutex contention at instance 3: 28% waits. Additional table cache instance activated. Number of instances after activation: 4.
    2021-10-14 11:41:03 14948 [Note] Detected table cache mutex contention at instance 1: 24% waits. Additional table cache instance activated. Number of instances after activation: 5.
    2021-10-14 11:41:03 7467 [Note] Detected table cache mutex contention at instance 3: 30% waits. Additional table cache instance activated. Number of instances after activation: 6.
    2021-10-14 11:41:03 11019 [Note] Detected table cache mutex contention at instance 4: 30% waits. Additional table cache instance activated. Number of instances after activation: 7.
    2021-10-14 11:44:20 15395 [Note] InnoDB: Number of pools: 6
    

  • The optimized server remained at the last error log message and the Commit implicit hang before it was terminated (45 minutes after that last message):

    10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)

    2021-10-14 11:05:03 19264 [Note] InnoDB: Number of pools: 11
    

    10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)

    * |  5063 | root | localhost | test | Query   | 2751 | Commit implicit      | DROP TABLE IF EXISTS t1   |    0.000 |
    

  • A smaller number of threads, with a large number of partitions produces the "Cannot close file" messages more easily. A large number of threads, irrespective of the number of partitions (two dozen versus many), can create the hangs. Setups as per above comments.
  • The partial debug server lock never cleared, even though new transactions are going through.

    10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)

    | 14449 | root | localhost | test | Query   | 3630 | closing tables                  | INSERT INTO t1 VALUES (0)
    

Comment by Roel Van de Paar [ 2021-10-14 ]

In spite of many hours of trying both today and earlier, the assert was never seen again, nor in normal runs. An rr trace can thus regrettably not be provided. Debugging the messages and hang above may give further clues as to remaining race conditions and issues. Thanks

Comment by Marko Mäkelä [ 2021-10-15 ]

Roel, thank your for your efforts. Messages of the following type are not issued by InnoDB. I have occasionally noticed them as well when running load tests.

2021-10-14 11:41:03 11885 [Note] Detected table cache mutex contention at instance 3: 28% waits. Additional table cache instance activated. Number of instances after activation: 4.

Messages like the following are referring to the InnoDB trx_pools, which is used for allocating transaction objects:

2021-10-14 11:05:03 19264 [Note] InnoDB: Number of pools: 11

In MDEV-25215, the string "0 pending operations" was removed from the following type of messages, and rate-limiting was implemented.

2021-10-14 10:32:05 29 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of pending fsync

Those messages are not entirely harmless, because they are a sign that the configured limit of maximum number of open files may be exceeded.

Roel, it is unclear to me whether the original reported crash is reproducible. Maybe you should try ulimit -n 100 or some small number in the shell where you start mariadbd? If creating an rr replay trace is not possible for this crash, I would be happy with a core dump on some computer that I can access.

Comment by Marko Mäkelä [ 2021-10-15 ]

Based on some analysis of a core dump, the reason for the assertion failure could be that the fstat system call in the following failed with ENOMEM (out of memory in the kernel):

os_offset_t
os_file_get_size(os_file_t file)
{
	struct stat statbuf;
	return fstat(file, &statbuf) ? os_offset_t(-1) : statbuf.st_size;
}

I was suspecting that there could have been a race condition that would have allowed another thread to close the file handle.

I do not think that this failure is any recent regression. InnoDB is notoriously unforgiving in this respect. Also memory allocation in the user address space is basically always expected to succeed.

Roel, the hangs that you observed are not directly related to the originally reported crash. Please file a separate ticket for the hangs, with some stack traces. The culprit for those could lie outside of InnoDB.

Comment by Roel Van de Paar [ 2021-10-15 ]

Thank you marko!

Comment by Roel Van de Paar [ 2021-10-15 ]

To clarify; once this bug is split, the hangs will be handled in the splitoff bug MDEV-26873, and the ENOMEM (out of kernel memory) design will be fixed in this bug.

Comment by Marko Mäkelä [ 2021-10-15 ]

Here is an untested patch for the 10.2 series:

diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index 8075defac4c..a41d1c8b712 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -1003,7 +1003,9 @@ fil_space_extend_must_retry(
 		to determine how much we were able to
 		extend it */
 		os_offset_t	fsize = os_file_get_size(node->handle);
-		ut_a(fsize != os_offset_t(-1));
+		if (fsize == -1) {
+			return false;
+		}
 
 		last_page_no = ulint(fsize / page_size)
 			+ file_start_page_no;

I think that we need DBUG_EXECUTE_IF fault injection and a test case that covers this. It remains to be seen whether the callers are really ready to deal with the failure to extend a file.

Comment by Marko Mäkelä [ 2022-02-03 ]

Roel, did the patch work for you?

Comment by Roel Van de Paar [ 2022-02-04 ]

marko The issue was seen with very low frequency before, and I have not seen it in recent runs. I have also not observed the 'because of pending fsync' in recent runs. In other words; I don't know. Is there a possibility to implement the patch above as-is?

Comment by Marko Mäkelä [ 2022-03-07 ]

With which Linux kernel version and file system did you reproduce this? I just checked fs/stat.c and ext4_getattr() in fs/ext4/inode.c in a Linux 5.17 release candidate, and I did not immediately see where they would return ENOMEM.

Comment by Roel Van de Paar [ 2022-03-08 ]

Filesystem: ext4.
Kernel: it would have been the latest "apt sudo update/upgrade" kernel (or the one before that) at the time of logging this bug (17 Aug 21), on a google cloud instance running Ubuntu Focal 20.04.4 LTS which is currently running kernel 5.11.0-1028-gcp.

Comment by Roel Van de Paar [ 2022-03-17 ]

New testcase, which seemingly consistently reproduces the "because of pending fsync" on 10.5 to 10.9 and additional info in MDEV-25215

Comment by Marko Mäkelä [ 2022-05-20 ]

Is this still reproducible?

Comment by Tobias Fischer [ 2022-09-28 ]

We just stumbled over this issue today with MariaDB 10.7.6 and also 10.8 and 10.9 releases.

We tracked it down to the following setup:

  • MariaDB running as docker container on WSL2 with Docker for Windows
  • Configured with a docker volume mount `C:\dev\local-project-temp-dir\mysql:/var/lib/mysql:rw`
  • Running a bunch of SQL migrations with Flyway in a Spring Boot application
  • ALTER, MODIFY or DROP statements fail (causing a deadlock situation) in a migration (let's call it "v3") if the table, these queries alter, was created by a previous migration (let's say "v1") in the same Flyway migration run.
  • The migration succeeds however, if the table in question hasn't been created in the same Flyway migration run.
  • If the local volume is located inside the WSL, this problem doesn't occur.
  • If a Docker volume (created by `docker volume create xxx`) is used instead, the scripts run just fine.

This is the log file from the mariaDB container:

2022-09-28T08:10:57.597081197Z 2022-09-28 10:10:57+02:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.7.4+maria~focal started.
2022-09-28T08:10:57.991234750Z 2022-09-28 10:10:57+02:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2022-09-28T08:10:57.996564582Z 2022-09-28 10:10:57+02:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.7.4+maria~focal started.
2022-09-28T08:10:58.219679191Z 2022-09-28 10:10:58+02:00 [Note] [Entrypoint]: MariaDB upgrade not required
2022-09-28T08:10:58.255829792Z 2022-09-28 10:10:58 0 [Note] mariadbd (server 10.7.4-MariaDB-1:10.7.4+maria~focal) starting as process 1 ...
2022-09-28T08:10:58.261891906Z 2022-09-28 10:10:58 0 [Warning] Setting lower_case_table_names=2 because file system for /var/lib/mysql/ is case insensitive
2022-09-28T08:10:58.282132535Z 2022-09-28 10:10:58 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-09-28T08:10:58.283357965Z 2022-09-28 10:10:58 0 [Note] InnoDB: Number of transaction pools: 1
2022-09-28T08:10:58.283411465Z 2022-09-28 10:10:58 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-09-28T08:10:58.283418859Z 2022-09-28 10:10:58 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
2022-09-28T08:10:58.395555300Z 2022-09-28 10:10:58 0 [Note] InnoDB: Using Linux native AIO
2022-09-28T08:10:58.395729241Z 2022-09-28 10:10:58 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2022-09-28T08:10:58.396459670Z 2022-09-28 10:10:58 0 [Note] InnoDB: Completed initialization of buffer pool
2022-09-28T08:10:58.446937846Z 2022-09-28 10:10:58 0 [Note] InnoDB: 128 rollback segments are active.
2022-09-28T08:10:58.519288724Z 2022-09-28 10:10:58 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-09-28T08:10:58.521303669Z 2022-09-28 10:10:58 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-09-28T08:10:58.963321691Z 2022-09-28 10:10:58 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-09-28T08:10:58.968652765Z 2022-09-28 10:10:58 0 [Note] InnoDB: 10.7.4 started; log sequence number 159937042; transaction id 186163
2022-09-28T08:10:58.968749185Z 2022-09-28 10:10:58 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-09-28T08:10:58.971548383Z 2022-09-28 10:10:58 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2022-09-28T08:10:58.983788282Z 2022-09-28 10:10:58 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.
2022-09-28T08:10:58.991933171Z 2022-09-28 10:10:58 0 [Note] Server socket created on IP: '0.0.0.0'.
2022-09-28T08:10:58.992049779Z 2022-09-28 10:10:58 0 [Note] Server socket created on IP: '::'.
2022-09-28T08:10:59.055339824Z 2022-09-28 10:10:59 0 [Note] InnoDB: Buffer pool(s) load completed at 220928 10:10:59
2022-09-28T08:10:59.086882855Z 2022-09-28 10:10:59 0 [Note] mariadbd: ready for connections.
2022-09-28T08:10:59.086951574Z Version: '10.7.4-MariaDB-1:10.7.4+maria~focal'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
2022-09-28T08:12:17.779003626Z 2022-09-28 10:12:17 17 [ERROR] InnoDB: preallocating 147456 bytes for file ./mpn4/track.ibd failed with error 2
2022-09-28T08:12:17.779102832Z 2022-09-28 10:12:17 0x7fa520494700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.7.4/storage/innobase/fil/fil0fil.cc line 610
2022-09-28T08:12:17.779121587Z InnoDB: Failing assertion: fsize != os_offset_t(-1)
2022-09-28T08:12:17.779127719Z InnoDB: We intentionally generate a memory trap.
2022-09-28T08:12:17.779132437Z InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
2022-09-28T08:12:17.779137337Z InnoDB: If you get repeated assertion failures or crashes, even
2022-09-28T08:12:17.779141985Z InnoDB: immediately after the mariadbd startup, there may be
2022-09-28T08:12:17.779146574Z InnoDB: corruption in the InnoDB tablespace. Please refer to
2022-09-28T08:12:17.779151012Z InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
2022-09-28T08:12:17.779155531Z InnoDB: about forcing recovery.
2022-09-28T08:12:17.779159899Z 220928 10:12:17 [ERROR] mysqld got signal 6 ;
2022-09-28T08:12:17.779164397Z This could be because you hit a bug. It is also possible that this binary
2022-09-28T08:12:17.779180347Z or one of the libraries it was linked against is corrupt, improperly built,
2022-09-28T08:12:17.779184355Z or misconfigured. This error can also be caused by malfunctioning hardware.
2022-09-28T08:12:17.779188142Z 
2022-09-28T08:12:17.779191719Z To report this bug, see https://mariadb.com/kb/en/reporting-bugs
2022-09-28T08:12:17.779195586Z 
2022-09-28T08:12:17.779199263Z We will try our best to scrape up some info that will hopefully help
2022-09-28T08:12:17.779203380Z diagnose the problem, but since we have already crashed, 
2022-09-28T08:12:17.779238897Z something is definitely wrong and this may fail.
2022-09-28T08:12:17.779243866Z 
2022-09-28T08:12:17.779247924Z Server version: 10.7.4-MariaDB-1:10.7.4+maria~focal
2022-09-28T08:12:17.779251972Z key_buffer_size=134217728
2022-09-28T08:12:17.779255859Z read_buffer_size=131072
2022-09-28T08:12:17.779259616Z max_used_connections=11
2022-09-28T08:12:17.779263343Z max_threads=153
2022-09-28T08:12:17.779267100Z thread_count=11
2022-09-28T08:12:17.779270977Z It is possible that mysqld could use up to 
2022-09-28T08:12:17.779274874Z key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467995 K  bytes of memory
2022-09-28T08:12:17.779278852Z Hope that's ok; if not, decrease some variables in the equation.
2022-09-28T08:12:17.779282729Z 
2022-09-28T08:12:17.779286416Z Thread pointer: 0x7fa4d4000c58
2022-09-28T08:12:17.779290243Z Attempting backtrace. You can use the following information to find out
2022-09-28T08:12:17.779294171Z where mysqld died. If you see no messages after this, something went
2022-09-28T08:12:17.779298118Z terribly wrong...
2022-09-28T08:12:17.779303017Z stack_bottom = 0x7fa520493d98 thread_stack 0x49000
2022-09-28T08:12:17.781611209Z Printing to addr2line failed
2022-09-28T08:12:17.781784559Z mariadbd(my_print_stacktrace+0x32)[0x561b6b988e52]
2022-09-28T08:12:17.782347920Z mariadbd(handle_fatal_signal+0x485)[0x561b6b4632b5]
2022-09-28T08:12:17.784873369Z /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fa535db1420]
2022-09-28T08:12:17.786797421Z /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fa5358b500b]
2022-09-28T08:12:17.786866150Z /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fa535894859]
2022-09-28T08:12:17.788989976Z mariadbd(+0x6721e8)[0x561b6b0ab1e8]
2022-09-28T08:12:17.789205635Z mariadbd(+0x6825bb)[0x561b6b0bb5bb]
2022-09-28T08:12:17.789539300Z mariadbd(+0x682863)[0x561b6b0bb863]
2022-09-28T08:12:17.789862061Z mariadbd(+0x68583d)[0x561b6b0be83d]
2022-09-28T08:12:17.790313677Z mariadbd(+0xe34f06)[0x561b6b86df06]
2022-09-28T08:12:17.790827351Z mariadbd(+0xd2ec09)[0x561b6b767c09]
2022-09-28T08:12:17.791293038Z mariadbd(+0xd2225b)[0x561b6b75b25b]
2022-09-28T08:12:17.791670050Z mariadbd(+0x8966ee)[0x561b6b2cf6ee]
2022-09-28T08:12:17.792054371Z mariadbd(_Z17mysql_alter_tableP3THDPK25st_mysql_const_lex_stringS3_P14HA_CREATE_INFOP10TABLE_LISTP10Alter_infojP8st_orderbb+0x4aad)[0x561b6b2ddf6d]
2022-09-28T08:12:17.792456985Z mariadbd(_ZN19Sql_cmd_alter_table7executeEP3THD+0x3c7)[0x561b6b347617]
2022-09-28T08:12:17.792872695Z mariadbd(_Z21mysql_execute_commandP3THDb+0x110e)[0x561b6b22435e]
2022-09-28T08:12:17.793232864Z mariadbd(_Z11mysql_parseP3THDPcjP12Parser_state+0x1e7)[0x561b6b213f07]
2022-09-28T08:12:17.793580742Z mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjb+0x1095)[0x561b6b2201e5]
2022-09-28T08:12:17.793955719Z mariadbd(_Z10do_commandP3THDb+0x138)[0x561b6b222428]
2022-09-28T08:12:17.794383056Z mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x3b7)[0x561b6b342637]
2022-09-28T08:12:17.794778822Z mariadbd(handle_one_connection+0x5d)[0x561b6b34298d]
2022-09-28T08:12:17.795246299Z mariadbd(+0xc50556)[0x561b6b689556]
2022-09-28T08:12:17.797282040Z /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7fa535da5609]
2022-09-28T08:12:17.799272466Z /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fa535991133]
2022-09-28T08:12:17.799322119Z 
2022-09-28T08:12:17.799329503Z Trying to get some variables.
2022-09-28T08:12:17.799331937Z Some pointers may be invalid and cause the dump to abort.
2022-09-28T08:12:17.799334212Z Query (0x7fa4d4010920): -- Table 'track' -------------------------------------------------------------------------------------------------------
2022-09-28T08:12:17.799336225Z 
2022-09-28T08:12:17.799337848Z -- update musical features
2022-09-28T08:12:17.799341345Z alter table track
2022-09-28T08:12:17.799345062Z   add musical_feature_secondary_mood_cluster varchar(50) null after musical_feature_primary_mood_cluster,
2022-09-28T08:12:17.799351815Z   add musical_feature_secondary_instrument   varchar(50) null after musical_feature_dominant_instrument,
2022-09-28T08:12:17.799392220Z   drop column musical_feature_bpm
2022-09-28T08:12:17.799401257Z 
2022-09-28T08:12:17.799412989Z Connection ID (thread ID): 17
2022-09-28T08:12:17.799414953Z Status: NOT_KILLED
2022-09-28T08:12:17.799416646Z 
2022-09-28T08:12:17.799418560Z Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
2022-09-28T08:12:17.799422697Z 
2022-09-28T08:12:17.799424330Z The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
2022-09-28T08:12:17.799426154Z information that should help you find out what is causing the crash.
2022-09-28T08:12:17.799477245Z Writing a core file...
2022-09-28T08:12:17.799536877Z Working directory at /var/lib/mysql
2022-09-28T08:12:17.799568316Z Resource Limits:
2022-09-28T08:12:17.799574297Z Limit                     Soft Limit           Hard Limit           Units     
2022-09-28T08:12:17.799578605Z Max cpu time              unlimited            unlimited            seconds   
2022-09-28T08:12:17.799582552Z Max file size             unlimited            unlimited            bytes     
2022-09-28T08:12:17.799586409Z Max data size             unlimited            unlimited            bytes     
2022-09-28T08:12:17.799590237Z Max stack size            8388608              unlimited            bytes     
2022-09-28T08:12:17.799594004Z Max core file size        0                    unlimited            bytes     
2022-09-28T08:12:17.799597761Z Max resident set          unlimited            unlimited            bytes     
2022-09-28T08:12:17.799601498Z Max processes             unlimited            unlimited            processes 
2022-09-28T08:12:17.799605265Z Max open files            1048576              1048576              files     
2022-09-28T08:12:17.799609743Z Max locked memory         83968000             83968000             bytes     
2022-09-28T08:12:17.799615324Z Max address space         unlimited            unlimited            bytes     
2022-09-28T08:12:17.799621024Z Max file locks            unlimited            unlimited            locks     
2022-09-28T08:12:17.799626555Z Max pending signals       62834                62834                signals   
2022-09-28T08:12:17.799632175Z Max msgqueue size         819200               819200               bytes     
2022-09-28T08:12:17.799638637Z Max nice priority         0                    0                    
2022-09-28T08:12:17.799686116Z Max realtime priority     0                    0                    
2022-09-28T08:12:17.799725771Z Max realtime timeout      unlimited            unlimited            us        
2022-09-28T08:12:17.799736621Z Core pattern: core
2022-09-28T08:12:17.799741841Z 
2022-09-28T08:12:17.799748593Z Fatal signal 11 while backtracing

To me, it looks like MariaDB on the docker image cannot read/write lock/unlock fast enough to a local volume on Windows.

Just as a side note: We encountered similar issues half a year ago and they were related to table renames and we were able to link them to this specific WSL issue: https://github.com/microsoft/WSL/issues/8443
However, this issue seems to be resolved now...

Comment by Marko Mäkelä [ 2022-09-28 ]

tofi86, thank you. That WSL bug was fingered as the cause of MDEV-24189. Can you please provide more details there?

This bug that was filed by Roel feels to me more like a potential Linux kernel bug, while your bug looks like something in Microsoft’s code.

I have seen some Linux kernels misbehave when a system is grossly overloaded. This happens in our internal stress testing environments, which could be running hundreds of database server processes concurrently. Occasionally, server reboots are necessary.

Comment by Daniel Black [ 2023-06-21 ]

Critical aspect of this bug is : preallocating 98304 bytes for file ./test/t2#P#p4560.ibd failed with error 9

The 9 error code is EBADF, so the file descriptor isn't valid. in os_file_set_size this could be straight from fstat rather than posix_fallocate that the error implies.

The os_file_set_size is called by fil_space_extend_must_retry

in fil_mutex_enter_and_prepare_for_io, there is a } else if (!node || node->is_open()). If this condition fails, the assumption is the else condition will open node, however it doesn't. and then can call fil_space_extend_must_retry potentially with a not open file.

Maybe it be forced with:

$ git diff
diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index ebeb1f66101..567e5ff0288 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -930,7 +930,14 @@ fil_mutex_enter_and_prepare_for_io(ulint space_id)
                        anything; if the space does not exist, we handle the
                        situation in the function which called this
                        function */
+                       if (node->is_open() && space->recv_size !=0) {
+                              node->n_pending= 0;
+                               node->close();
+                               goto next;
+                       }
+
                } else {
+next:
                        while (fil_system.n_open >= srv_max_n_open_files) {
                                /* Too many files are open */
                                if (fil_try_to_close_file_in_LRU(count > 1)) {

Comment by Roel Van de Paar [ 2023-09-08 ]

The frequency of the crash is very low across all test runs. It does however happen from time to time and I found 3 occurrences, of which two were in a 11.2[.0] debug build of 22 April 2023 (exact revision unknown, build previously deleted) and one in a 11.0 debug build of 7 April 2023 at revision 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203. Updating versions to match.

fsize != os_offset_t(-1)|SIGABRT|ut_dbg_assertion_failed|fil_space_extend_must_retry|fil_space_extend|fsp_try_extend_data_file_with_pages

In all 3 cases, partitions are involved. The important part of the SQL trace seems to always be something similar to:

CREATE TABLE t1 (a INT NOT NULL, KEY(a)) ENGINE='InnoDB' PARTITION BY HASH(a) PARTITIONS 8192;

Or

CREATE TABLE t2 (a INT, KEY(a)) ENGINE='a' PARTITION BY KEY(a) PARTITIONS 8192;

I expect that running enough iterations of either of these statements (or both) would eventually lead to a crash.

Comment by Roel Van de Paar [ 2023-09-08 ]

I was able to repeat a related issue under MTR in the 11.0 (of 7 April 2023) quoted in the previous comment using this testcase:

--source include/have_innodb.inc
--source include/have_partition.inc
--delimiter $$
BEGIN NOT ATOMIC
  DECLARE CONTINUE HANDLER FOR 1210 BEGIN END;
  LOOP
    DROP TABLE IF EXISTS t1;
    DROP TABLE IF EXISTS t2;
    CREATE TABLE t1 (a INT NOT NULL, KEY(a)) ENGINE='InnoDB' PARTITION BY HASH(a) PARTITIONS 8192;
    CREATE TABLE t2 (a INT, KEY(a)) ENGINE='InnoDB' PARTITION BY KEY(a) PARTITIONS 8192;
    SELECT SLEEP(1);
  END LOOP;
END $$
--delimiter ;

Executed as:

./mtr --parallel 50 --repeat 1000 test

11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203

2023-09-08 14:09:19 0 [Note] /data/NEWBUGS/test/mariadb-11.0.2-linux-x86_64/bin/mariadbd: ready for connections.
Version: '11.0.2-MariaDB-debug-log'  socket: '/data/NEWBUGS/test/mariadb-11.0.2-linux-x86_64/mysql-test/var/tmp/4/mysqld.1.sock'  port: 16280  MariaDB Server
2023-09-08 14:09:23 4 [Note] InnoDB: Cannot close file ./test/t1#P#p412.ibd because of 1 pending operations
2023-09-08 14:09:30 4 [Note] InnoDB: Cannot close file ./test/t1#P#p1250.ibd because of 1 pending operations
2023-09-08 14:09:35 4 [Note] InnoDB: Cannot close file ./test/t1#P#p1884.ibd because of 1 pending operations
2023-09-08 14:09:40 4 [Note] InnoDB: Cannot close file ./test/t1#P#p2448.ibd because of 1 pending operations
2023-09-08 14:09:45 4 [Note] InnoDB: Cannot close file ./test/t1#P#p3074.ibd because of 1 pending operations
2023-09-08 14:09:47 0x14d4519ec640  InnoDB: Assertion failure in file /test/11.0_dbg/storage/innobase/fil/fil0fil.cc line 2981
InnoDB: Failing assertion: fil_system.n_open == n_open
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mariadbd startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.

The assert happens 2 seconds after the last cannot close file note.
This assert is also mentioned in MDEV-31347, MDEV-31152, and MDEV-31080.

11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203

Core was generated by `/data/NEWBUGS/test/mariadb-11.0.2-linux-x86_64/bin/mariadbd --defaults-group-su'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22902134982208)
    at ./nptl/pthread_kill.c:44
[Current thread is 1 (Thread 0x14d4519ec640 (LWP 263893))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22902134982208) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=22902134982208) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=22902134982208, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x000014d474442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x000014d4744287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000055f3dfb1592c in ut_dbg_assertion_failed (expr=expr@entry=0x55f3e0102372 "fil_system.n_open == n_open", file=file@entry=0x55f3e01003e0 "/test/11.0_dbg/storage/innobase/fil/fil0fil.cc", line=line@entry=2981) at /test/11.0_dbg/storage/innobase/ut/ut0dbg.cc:60
#6  0x000055f3dfbf5ad0 in fil_validate () at /test/11.0_dbg/storage/innobase/fil/fil0fil.cc:2981
#7  0x000055f3dfbf5b30 in fil_validate_skip () at /test/11.0_dbg/storage/innobase/fil/fil0fil.cc:212
#8  0x000055f3dfbf5b98 in fil_space_t::io (this=0x55f3e20bfc20, type=@0x14d4519eb6e0: {bpage = 0x14d4732ce820, slot = 0x0, node = 0x55f3e20bfd60, type = IORequest::WRITE_ASYNC}, offset=49152, len=len@entry=16384, buf=buf@entry=0x14d4732fc000, bpage=bpage@entry=0x14d4732ce820) at /test/11.0_dbg/storage/innobase/fil/fil0fil.cc:2759
#9  0x000055f3dfb862e8 in buf_dblwr_t::flush_buffered_writes_completed (this=<optimized out>, request=@0x14d4519eb960: {bpage = 0x0, slot = 0x0, node = 0x55f3e2097ab0, type = IORequest::DBLWR_BATCH}) at /test/11.0_dbg/storage/innobase/buf/buf0dblwr.cc:713
#10 0x000055f3dfbf6420 in fil_aio_callback (request=@0x14d4519eb960: {bpage = 0x0, slot = 0x0, node = 0x55f3e2097ab0, type = IORequest::DBLWR_BATCH}) at /test/11.0_dbg/storage/innobase/fil/fil0fil.cc:2866
#11 0x000055f3df9dd26d in io_callback (cb=0x55f3e205d838) at /test/11.0_dbg/storage/innobase/os/os0file.cc:3452
#12 0x000055f3dfc732a1 in tpool::task_group::execute (this=0x55f3e2051568, t=t@entry=0x55f3e205d890) at /test/11.0_dbg/tpool/task_group.cc:70
#13 0x000055f3dfc734f9 in tpool::task::execute (this=0x55f3e205d890) at /test/11.0_dbg/tpool/task.cc:32
#14 0x000055f3dfc71284 in tpool::thread_pool_generic::worker_main (this=0x55f3e1ff7e20, thread_var=0x55f3e1ff9050) at /test/11.0_dbg/tpool/tpool_generic.cc:580
#15 0x000055f3dfc72600 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/11/bits/invoke.h:74
#16 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96
#17 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:253
#18 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:260
#19 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
#20 0x000014d4748dc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#21 0x000014d474494b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#22 0x000014d474526a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The assertion is readily repeatable.

Comment by Roel Van de Paar [ 2023-09-08 ]

It shows the Cannot close file notes after some time for 11.3 also, though this time on innodb_index_stats.ibd only:

11.3.0 7ba9c7fb84b5f28e4736656b57d9508b70ca6369 (Debug)

2023-09-08 15:32:39 0 [Note] /test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/bin/mariadbd: ready for connections.
Version: '11.3.0-MariaDB-debug-log'  socket: '/test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/mariadb-test/var/tmp/28/mysqld.1.sock'  port: 16500  MariaDB Server
2023-09-08 15:32:55 4 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations
2023-09-08 15:40:14 4 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 8 pending operations
2023-09-08 15:41:20 4 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of 3 pending operations
2023-09-08 15:47:25 4 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations

The number of open files is set to the maximum (1048576) so it is unlikely that we ran out of file descriptors. sudo lsof | wc -l gives 455507 (~43% of capacity).
Also, oddly, only ./var/28/log/mysqld.1.err exists (which showed the Cannot close file issue above), there are no other MTR instances that created a log (though all subdirectories are present and have a full data directory in them, but no other files - only directory 28 has the files in the log dir).

/test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/mariadb-test$ ls var/*/log/*err
var/28/log/mysqld.1.err
test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/mariadb-test$

Comment by Roel Van de Paar [ 2023-09-08 ]

From testing, it looks like at least 1) the fil_system.n_open == n_open assertion and 2) the Cannot close file ... t1#P#p412.ibd like notes are resolved post 11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203 or earlier.

It is not clear if the original crash still exists or not.

Also, even on recent builds, there is still the Cannot close file ./mysql/innodb_index_stats.ibd notes, which are somewhat easily reproducible with the MTR testcase here.

And, there is the MTR oddity when running that testcase with --parallel=50 that only a single var subdir seems to end up with log files.

marko Can you please have a look at the innodb_index_stats.ibd notes, and see if you get the same MTR oddness when running the testcase.

(Any MTR 900 second timeouts are normal and due to the loop in the testcase. It likely indicates proper working of the server.)

Comment by Marko Mäkelä [ 2023-09-11 ]

danblack had written the following:

Critical aspect of this bug is : preallocating 98304 bytes for file ./test/t2#P#p4560.ibd failed with error 9

The 9 error code is EBADF, so the file descriptor isn't valid. in os_file_set_size this could be straight from fstat rather than posix_fallocate that the error implies.

The cause for that could be MDEV-31347 or MDEV-31256.

There is an open bug related to InnoDB persistent statistics and partitioned tables, in MDEV-29566. Roel, is everything working for you if you set innodb_stats_persistent=OFF?

Comment by Roel Van de Paar [ 2023-09-16 ]

The MTR single error log/thread oddity turns out to be an MTR bug, logged as MDEV-32185 MTR: --parallel does not work correctly when using LOOP

Comment by Sebastian [ 2023-12-13 ]

We encountered a similiar crash and managed to reproduce it by doing intensive table operations. After investigations, we found out that the problem was the global variable innodb_open_files, which we have set to 2000. After raising it to 16384, the problem disappeared.

The following scenario managed to crash it in ~3-5 min when inodb_open_files was set to 2000.

First created some dummy sql

n_dbs=20
n_tbls=1000
 
for (( i=1; i<=${n_dbs}; i++ )); do
  cat <<EOF >db${i}.sql
DROP DATABASE IF EXISTS db${i};
CREATE DATABASE db${i};
USE db${i};
CREATE TABLE \`cycle\` (
  \`cc_cycle_id\` int(10) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (\`cc_cycle_id\`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE \`account\` (
  \`ac_account_id\` int(10) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (\`ac_account_id\`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EOF
  for (( j=1; j<=${n_tbls}; j++ )); do
    cat <<EOF >>db${i}.sql
CREATE TABLE \`johndoe${j}\` (
  \`xyz_johndoe_id\` int(10) unsigned NOT NULL AUTO_INCREMENT,
  \`xyz_cc_cycle_id\` int(10) unsigned NOT NULL,
  \`xyz_transaction_number\` varchar(100) NOT NULL,
  \`xyz_record_number\` varchar(45) DEFAULT NULL,
  \`xyz_ac_account_id\` int(10) unsigned NOT NULL,
  \`xyz_buy_sell\` tinyint(4) NOT NULL,
  \`xyz_quantity\` int(10) unsigned NOT NULL,
  \`xyz_in_instrument_id\` int(10) unsigned NOT NULL,
  \`xyz_price\` decimal(18,8) NOT NULL,
  \`xyz_settlement_price\` decimal(18,8) NOT NULL,
  \`xyz_market_value\` decimal(12,2) NOT NULL,
  \`xyz_pmvarn\` decimal(12,2) NOT NULL,
  \`xyz_multiplier\` decimal(12,2) NOT NULL,
  \`xyz_comm\` decimal(9,2) DEFAULT NULL,
  \`xyz_fee1\` decimal(9,2) DEFAULT NULL,
  \`xyz_fee2\` decimal(9,2) DEFAULT NULL,
  \`xyz_fee3\` decimal(9,2) DEFAULT NULL,
  \`xyz_fee4\` decimal(9,2) DEFAULT NULL,
  \`xyz_fee5\` decimal(9,2) DEFAULT NULL,
  \`xyz_fee6\` decimal(9,2) DEFAULT NULL,
  \`xyz_fee7\` decimal(9,2) DEFAULT NULL,
  \`xyz_fee8\` decimal(9,2) DEFAULT NULL,
  \`xyz_fee9\` decimal(9,2) DEFAULT NULL,
  \`xyz_gichg\` decimal(9,2) DEFAULT NULL,
  \`xyz_bkchg\` decimal(9,2) DEFAULT NULL,
  \`xyz_other\` decimal(9,2) DEFAULT NULL,
  \`xyz_batch\` varchar(10) DEFAULT NULL,
  \`xyz_source_file\` varchar(200) NOT NULL,
  \`xyz_source_record_no\` int(11) NOT NULL,
  PRIMARY KEY (\`xyz_johndoe_id\`),
  KEY \`FKa_lop_tt_trade_transaction_id_idx${j}\` (\`xyz_transaction_number\`),
  KEY \`FKa_lop_cc_cycle_id_idx${j}\` (\`xyz_cc_cycle_id\`),
  KEY \`FKa_lop_linn_tt_idx${j}\` (\`xyz_transaction_number\`),
  KEY \`fka_LINN_open_position_account1_idx${j}\` (\`xyz_ac_account_id\`),
  KEY \`fka_LINN_open_position_instrument1_idx${j}\` (\`xyz_in_instrument_id\`),
  CONSTRAINT \`FKa_lop_cc_cycle_id${j}\` FOREIGN KEY (\`xyz_cc_cycle_id\`) REFERENCES \`cycle\` (\`cc_cycle_id\`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT \`fka_LINN_open_position_account1${j}\` FOREIGN KEY (\`xyz_ac_account_id\`) REFERENCES \`account\` (\`ac_account_id\`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EOF
  done
done

And then applied them in parallel:

creds="-uroot -pjohndoe -h127.0.0.1 -P3306"
touch dump_run
date
for db in $(find . -name "*.sql"); do
  while [[ -f ./dump_run ]]; do
    mysql ${creds} <$db || break
  done &
done

Comment by Marko Mäkelä [ 2023-12-13 ]

Sebieee, which version of MariaDB were you using? Did it include the fixes of the bugs MDEV-31256 and MDEV-31347?

Comment by Sebastian [ 2023-12-13 ]

For the reproducible test case I used the image docker.io/library/mariadb:11.0.2, running with podman on CentOS 7.

Comment by Marko Mäkelä [ 2023-12-13 ]

Sebieee, 11.0.2 is affected by MDEV-31256 and MDEV-31347; their fixed version includes 11.0.3. Can you test something newer?

Generated at Thu Feb 08 09:44:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.