[MDEV-26555] main.innodb_ext_key fatal assertion Created: 2021-09-07  Updated: 2021-11-24  Resolved: 2021-11-24

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Daniel Black
Resolution: Duplicate Votes: 0
Labels: None

Attachments: Text File CMakeCache.txt     File gcc_build_gdb_output.log     Text File gdb-gcc-liburing-0.7-4.fc34.txt     Text File gdb.txt     File gdb_output.log    
Issue Links:
Relates
relates to MDEV-24883 add io_uring support for tpool Closed
relates to MDEV-26674 io_uring related hangs on the Linux k... Closed

 Description   

$ mysql-test/mtr --repeat=30 --parallel=4 main.innodb_ext_key 
Logging: /home/dan/repos/mariadb-server-10.6/mysql-test/mariadb-test-run.pl  --repeat=30 --parallel=4 main.innodb_ext_key
vardir: /home/dan/repos/build-mariadb-server-10.6/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var'...
Checking supported features...
MariaDB Version 10.6.5-MariaDB
 - SSL connections supported
 - binaries built with wsrep patch
Collecting tests...
Installing system database...
 
==============================================================================
 
TEST                                  WORKER RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------
 
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
worker[4] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
worker[2] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
worker[3] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
main.innodb_ext_key 'innodb,off,unoptimized' w4 [ pass ]   2918
main.innodb_ext_key 'covering,innodb,off' w3 [ pass ]   2958
main.innodb_ext_key 'innodb,on,unoptimized' w2 [ pass ]   2918
main.innodb_ext_key 'covering,innodb,on' w1 [ pass ]   2972
main.innodb_ext_key 'covering,innodb,off' w3 [ 2 pass ]   2707
main.innodb_ext_key 'innodb,on,unoptimized' w2 [ 2 pass ]   2722
main.innodb_ext_key 'covering,innodb,on' w1 [ 2 pass ]   2616
main.innodb_ext_key 'covering,innodb,off' w3 [ 3 pass ]   2654
main.innodb_ext_key 'covering,innodb,on' w1 [ 3 pass ]   2653
main.innodb_ext_key 'innodb,on,unoptimized' w2 [ 3 pass ]   2733
main.innodb_ext_key 'covering,innodb,off' w3 [ 4 pass ]   2400
main.innodb_ext_key 'covering,innodb,on' w1 [ 4 pass ]   2426
main.innodb_ext_key 'innodb,on,unoptimized' w2 [ 4 pass ]   2507
main.innodb_ext_key 'covering,innodb,off' w3 [ 5 pass ]   2419
main.innodb_ext_key 'covering,innodb,on' w1 [ 5 pass ]   2371
main.innodb_ext_key 'covering,innodb,off' w3 [ 6 pass ]   5280
main.innodb_ext_key 'covering,innodb,on' w1 [ 6 pass ]   5267
main.innodb_ext_key 'covering,innodb,off' w3 [ 7 pass ]   2155
main.innodb_ext_key 'covering,innodb,on' w1 [ 7 pass ]   2143
main.innodb_ext_key 'covering,innodb,on' w1 [ 8 pass ]   2792
main.innodb_ext_key 'covering,innodb,off' w3 [ 8 pass ]   2825
main.innodb_ext_key 'covering,innodb,on' w1 [ 9 pass ]   2069
main.innodb_ext_key 'covering,innodb,off' w3 [ 9 pass ]   2031
main.innodb_ext_key 'covering,innodb,on' w1 [ 10 pass ]   1834
main.innodb_ext_key 'covering,innodb,off' w3 [ 10 pass ]   1848
main.innodb_ext_key 'covering,innodb,on' w1 [ 11 pass ]   3932
main.innodb_ext_key 'covering,innodb,off' w3 [ 11 pass ]   3929
main.innodb_ext_key 'covering,innodb,on' w1 [ 12 pass ]   2203
main.innodb_ext_key 'covering,innodb,off' w3 [ 12 pass ]   2303
main.innodb_ext_key 'covering,innodb,on' w1 [ 13 pass ]   6001
main.innodb_ext_key 'covering,innodb,off' w3 [ 13 pass ]   5990
main.innodb_ext_key 'covering,innodb,on' w1 [ 14 pass ]   1789
main.innodb_ext_key 'covering,innodb,off' w3 [ 14 pass ]   1782
main.innodb_ext_key 'covering,innodb,on' w1 [ 15 pass ]   1881
main.innodb_ext_key 'covering,innodb,off' w3 [ 15 pass ]   1896
main.innodb_ext_key 'covering,innodb,on' w1 [ 16 pass ]   2187
main.innodb_ext_key 'covering,innodb,off' w3 [ 16 pass ]   2261
main.innodb_ext_key 'covering,innodb,on' w1 [ 17 pass ]   2011
main.innodb_ext_key 'covering,innodb,off' w3 [ 17 pass ]   2048
main.innodb_ext_key 'covering,innodb,on' w1 [ 18 pass ]   1838
main.innodb_ext_key 'covering,innodb,off' w3 [ 18 pass ]   1862
main.innodb_ext_key 'covering,innodb,on' w1 [ 19 pass ]   1850
main.innodb_ext_key 'covering,innodb,off' w3 [ 19 pass ]   1761
main.innodb_ext_key 'covering,innodb,on' w1 [ 20 pass ]   1855
main.innodb_ext_key 'covering,innodb,off' w3 [ 20 pass ]   1907
main.innodb_ext_key 'covering,innodb,on' w1 [ 21 pass ]   2080
main.innodb_ext_key 'covering,innodb,off' w3 [ 21 pass ]   2134
main.innodb_ext_key 'covering,innodb,on' w1 [ 22 pass ]   1986
main.innodb_ext_key 'covering,innodb,off' w3 [ 22 pass ]   2019
main.innodb_ext_key 'covering,innodb,on' w1 [ 23 pass ]   1969
main.innodb_ext_key 'covering,innodb,off' w3 [ 23 pass ]   1855
main.innodb_ext_key 'covering,innodb,on' w1 [ 24 pass ]   1899
main.innodb_ext_key 'covering,innodb,off' w3 [ 24 pass ]   1803
main.innodb_ext_key 'covering,innodb,on' w1 [ 25 pass ]   2050
main.innodb_ext_key 'covering,innodb,off' w3 [ 25 pass ]   2034
main.innodb_ext_key 'covering,innodb,on' w1 [ 26 pass ]   2068
main.innodb_ext_key 'covering,innodb,off' w3 [ 26 pass ]   2081
main.innodb_ext_key 'covering,innodb,on' w1 [ 27 pass ]   2056
main.innodb_ext_key 'covering,innodb,off' w3 [ 27 pass ]   2210
main.innodb_ext_key 'covering,innodb,on' w1 [ 28 pass ]   1868
main.innodb_ext_key 'covering,innodb,off' w3 [ 28 pass ]   1855
main.innodb_ext_key 'covering,innodb,on' w1 [ 29 pass ]   1852
main.innodb_ext_key 'covering,innodb,off' w3 [ 29 pass ]   1786
main.innodb_ext_key 'covering,innodb,off' w3 [ 30 pass ]   2099
main.innodb_ext_key 'covering,innodb,on' w1 [ 30 pass ]   2205
 
 
 
main.innodb_ext_key 'innodb,off,unoptimized' w4 [ 2 fail ]
        Test ended at 2021-09-07 22:28:31
 
CURRENT_TEST: main.innodb_ext_key
mysqltest: At line 588: query 'CREATE TABLE t1 (
pk INT,
f1 VARCHAR(3),
f2 VARCHAR(1024),
PRIMARY KEY (pk),
KEY(f2)
) ENGINE=InnoDB CHARSET utf8 ROW_FORMAT= DYNAMIC' failed: <Unknown> (2013): Lost connection to server during query
 
The result from queries just before the failure was:
< snip >
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t1	ref	index_date_updated	index_date_updated	5	const	#	Using index condition
# This used to work from the start:
explain 
select * from t2 force index(index_date_updated) 
where index_date_updated= 10 and index_id < 800;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t2	ref	index_date_updated	index_date_updated	5	const	#	Using index condition
drop table t0,t1,t2;
#
# MDEV-11196: Error:Run-Time Check Failure #2 - Stack around the variable 'key_buff'
# was corrupted, server crashes in opt_sum_query
SET @save_optimizer_use_condition_selectivity=@@optimizer_use_condition_selectivity, @@optimizer_use_condition_selectivity=4;
CREATE TABLE t1 (
pk INT,
f1 VARCHAR(3),
f2 VARCHAR(1024),
PRIMARY KEY (pk),
KEY(f2)
) ENGINE=InnoDB CHARSET utf8 ROW_FORMAT= DYNAMIC;
 
More results from queries before failure can be found in /home/dan/repos/build-mariadb-server-10.6/mysql-test/var/4/log/innodb_ext_key,off.log
 
 
Server [mysqld.1 - pid: 1635380, winpid: 1635380, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2021-09-07 22:28:29 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
210907 22:28:29 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.6.5-MariaDB-log

build from 10.6-40ae9c5d108d52444288e007ce9565919aa91299

Thread 1 (Thread 0x7fa054ff9640 (LWP 1635530)):
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
        pd = <optimized out>
        tid = 1635530
        pid = 1635383
        val = 0
#1  0x00000000008f7bc6 in handle_fatal_signal (sig=6) at /home/dan/repos/mariadb-server-10.6/sql/signal_handler.cc:345
        tm = {tm_sec = 29, tm_min = 28, tm_hour = 22, tm_mday = 7, tm_mon = 8, tm_year = 121, tm_wday = 2, tm_yday = 249, tm_isdst = 0, tm_gmtoff = 36000, tm_zone = 0x2b87e50 "AEST"}
        print_invalid_query_pointer = <optimized out>
        curr_time = 1631017709
        thd = 0x0
#2  <signal handler called>
No locals.
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
        set = {__val = {552967, 2147516417, 140326178332088, 46234680, 140326597527936, 13394987, 17938038, 729, 1946169716, 140326597527816, 9223512778638098432, 46234680, 1, 1, 31952248, 46263432}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#4  0x00007fa08f4128a4 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x7fa054ff8a08, sa_sigaction = 0x7fa054ff8a08}, sa_mask = {__val = {140327578090128, 140326379457952, 0, 14, 158, 17937887, 158, 140326597528128, 11548080394786522880, 0, 18446744073709551272, 0, 140327478049406, 140327478049407, 47667776, 140326597528088}}, sa_flags = -1891137592, sa_restorer = 0x7fa054ff8910}
        sigs = {__val = {32, 23245968, 140327478049406, 140327478049407, 140326597528000, 17918744, 140326597527808, 10287617, 140326597528048, 10283534, 109, 140326111055120, 158, 0, 140326111055120, 0}}
#5  0x0000000000c8557f in ib::fatal::~fatal (this=0x7fa054ff8a60) at /home/dan/repos/mariadb-server-10.6/storage/innobase/ut/ut0ut.cc:519
No locals.
#6  0x0000000000c5d720 in srv_monitor_task () at /home/dan/repos/mariadb-server-10.6/storage/innobase/srv/srv0srv.cc:1293
        waited = <optimized out>
        start = <optimized out>
        old_lsn = 8539879
        new_lsn = <optimized out>
        now = <optimized out>
        threshold = <optimized out>
#7  0x0000000000d4310c in tpool::thread_pool_generic::timer_generic::run (this=0x2ea5970) at /home/dan/repos/mariadb-server-10.6/tpool/tpool_generic.cc:313
        expected = false
#8  0x0000000000d439b7 in tpool::task::execute (this=0x2ea59b0) at /home/dan/repos/mariadb-server-10.6/tpool/task.cc:52
No locals.
#9  0x0000000000d41596 in tpool::thread_pool_generic::worker_main (this=0x2d663a0, thread_var=0x2d75a40) at /home/dan/repos/mariadb-server-10.6/tpool/tpool_generic.cc:550
        task = 0x2
#10 0x00007fa08f6afc84 in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#11 0x00007fa08f807299 in start_thread (arg=0x7fa054ff9640) at pthread_create.c:481
        ret = <optimized out>
        pd = 0x7fa054ff9640
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140326597531200, -1804064771434488301, 140327478049406, 140327478049407, 0, 140326597531200, 1852674176714399251, 1853147301372616211}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#12 0x00007fa08f4ec353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
(gdb)  p  dict_sys.latch
$1 = {pfs_psi = 0x7fa08b2eccc0, lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 2147483650}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 2147483648}, <No data fields>}}}

Have noticed in debug and non-debug release



 Comments   
Comment by Daniel Black [ 2021-09-07 ]

CMakeCache.txt:CMAKE_CXX_COMPILER:STRING=/usr/lib64/ccache/clang++
CMakeCache.txt:CMAKE_CXX_COMPILER_AR:FILEPATH=/usr/bin/llvm-ar
CMakeCache.txt:CMAKE_CXX_COMPILER_RANLIB:FILEPATH=/usr/bin/llvm-ranlib
CMakeCache.txt:CMAKE_C_COMPILER:STRING=/usr/lib64/ccache/clang
CMakeCache.txt:CMAKE_C_COMPILER_AR:FILEPATH=/usr/bin/llvm-ar
CMakeCache.txt:CMAKE_C_COMPILER_RANLIB:FILEPATH=/usr/bin/llvm-ranlib
 
$ /usr/lib64/ccache/clang++ --version
clang version 12.0.1 (Fedora 12.0.1-1.fc34)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
 
$  /usr/bin/llvm-ar --version
LLVM (http://llvm.org/):
  LLVM version 12.0.1
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake

Comment by Daniel Black [ 2021-09-07 ]

same with gcc-11.2.1

main.innodb_ext_key 'covering,innodb,on' w3 [ 28 pass ]   1544
main.innodb_ext_key 'covering,innodb,on' w3 [ 29 pass ]   1575
main.innodb_ext_key 'covering,innodb,on' w3 [ 30 pass ]   1245
worker[2] Trying to dump core for [mysqltest - pid: 1654860, winpid: 1654860]
worker[2] Trying to dump core for [mysqld.1 - pid: 1653466, winpid: 1653466]
main.innodb_ext_key 'innodb,off,unoptimized' w2 [ 21 fail ]  timeout after 900 seconds
        Test ended at 2021-09-07 23:31:08
 
Test case timeout after 900 seconds
 
 == /home/dan/repos/build-mariadb-server-10.6-gcc/mysql-test/var/2/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local server through socket '/home/dan/repos/build-mariadb-server-10.6-gcc/mysql-test/var/tmp' (111)
 
 
 - saving '/home/dan/repos/build-mariadb-server-10.6-gcc/mysql-test/var/2/log/main.innodb_ext_key-innodb,off,unoptimized/' to '/home/dan/repos/build-mariadb-server-10.6-gcc/mysql-test/var/log/main.innodb_ext_key-innodb,off,unoptimized/'
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 292.342 of 1011 seconds executing testcases
 
Failure: Failed 1/111 tests, 99.10% were successful.
 
gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Comment by Marko Mäkelä [ 2021-09-08 ]

danblack, can you provide a stack trace of all threads, either from the test case timeout or from a crash with the srv_monitor_task watchdog? I may need more information from that session to determine the entire waits-for chain. The watchdog is only a messenger here, and the 15-minute test case timeout suggests that it was not a false alarm.

I was unable to reproduce this hang myself so far.

Comment by Marko Mäkelä [ 2021-09-08 ]

Sorry, I initially overlooked gdb.txt. The InnoDB threads are the following:

  • Thread 18 (purge_worker_callback) is waiting for an exclusive latch on page 10 (probably the root and leaf page of the SYS_COLUMNS table).
  • Thread 14 (aio_uring::thread_routine) is waiting for something in io_uring_wait_cqe().
  • Thread 12 (buf_flush_page_cleaner) is waiting for buf_dblwr.cond in buf_dblwr_t::flush_buffered_writes().
  • Thread 11 (purge_coordinator_callback) is waiting for the purge worker (Thread 18) to finish.
  • Thread 10 (CREATE TABLE t1) is waiting for an exclusive latch on page 7 (the DICT_HDR page) in dict_hdr_get_new_id().
  • Thread 9 (purge_worker_callback) is waiting for an exclusive latch on page 11 (probably the root and leaf page of the SYS_INDEXES table).
  • Thread 2 (srv_master_callback) is waiting for exclusive dict_sys.latch in dict_sys_t::evict_table_LRU(). This wait could be tripping the watchdog.
  • Thread 1 (srv_monitor_task) is the watchdog that commits suicide due to a long wait on dict_sys.latch.

It would seem to me that the culprit is whatever is blocking Thread 12. The only place where the condition variable would be signaled is buf_dblwr_t::write_completed(), and its only caller is buf_page_write_complete(), whose only caller is fil_aio_callback(), whose only caller is io_callback, registered in os_aio(). I see that Thread 14 is executing something io_uring related.

Can you repeat this with a different version of liburing? Or if you build with libaio instead of io_uring? (Be sure that libaio-dev is available, and cmake -DCMAKE_DISABLE_FIND_PACKAGE_URING=1.)

Comment by Daniel Black [ 2021-09-08 ]

a) Have tried many times on libaio-0.3.111-11.fc34 and any failure isn't repeatable

b) previous liburing was a 2.0 version from git from 6 months ago circa commit d3eb6af47886656e811e8f4e274ac361d0efa5bf on the liburing

I've since purged this out and am back using the distro version liburing-0.7-4.fc34

c) The hang still occured: gdb-gcc-liburing-0.7-4.fc34.txt for backtrace.

Notably in a uring syscall (though a different one)

Thread 2 (Thread 0x7f5c83fff640 (LWP 1708513)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1  0x00007f5c8ee76a32 in __io_uring_get_cqe () from /lib64/liburing.so.1
No symbol table info available.
#2  0x0000555fda356a35 in io_uring_wait_cqe_nr (wait_nr=1, cqe_ptr=0x7f5c83ffec38, ring=0x555fdc8a4cb8) at /usr/include/liburing.h:494
No locals.
#3  io_uring_wait_cqe (cqe_ptr=0x7f5c83ffec38, ring=0x555fdc8a4cb8) at /usr/include/liburing.h:514
No locals.
#4  (anonymous namespace)::aio_uring::thread_routine (aio=0x555fdc8a4cb0) at /home/dan/repos/mariadb-server-10.6/tpool/aio_liburing.cc:137
        ret = <optimized out>
        cqe = 0x7f5c8ef4f990
        iocb = <optimized out>
        res = <optimized out>
#5  0x00007f5c8e83dc84 in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#6  0x00007f5c8e995299 in start_thread (arg=0x7f5c83fff640) at pthread_create.c:481
        ret = <optimized out>
        pd = 0x7f5c83fff640
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140035328308800, -1066475383518023308, 140722199995678, 140722199995679, 0, 140035328308800, 1119735726208299380, 1119711274098602356}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#7  0x00007f5c8e536353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

One more test coming, a build on the libio_uring 2.0 master branch.

Comment by Daniel Black [ 2021-09-08 ]

latest liburing appears to hang like the original gdb backtraces.

fd=0 looks odd here because its mean to be a file descriptor returned by io_uring_setup, which shouldn't be 0.

liburing-a357750b1547a7c5f4caac49bb0c556e4d4bb8c6

Thread 4 (Thread 0x7efede617640 (LWP 1727423) "mariadbd"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1  0x00007efee4c91350 in __sys_io_uring_enter2 (fd=0, to_submit=1, min_complete=3828653581, flags=0, sig=0x8, sz=<optimized out>) at syscall.c:64
No locals.
#2  0x00007efee4c90ca4 in _io_uring_get_cqe (ring=0x5558a7938cb8, cqe_ptr=0x7efede616c38, data=data@entry=0x7efede616c10) at queue.c:122
        flags = 3828653581
        nr_available = <optimized out>
        ret = <optimized out>
        need_enter = <optimized out>
        cq_overflow_flush = <optimized out>
        cqe = <optimized out>
        err = <optimized out>
#3  0x00007efee4c90bb7 in __io_uring_get_cqe (ring=0x8, cqe_ptr=0x0, submit=<optimized out>, wait_nr=3828653581, sigmask=0x0) at queue.c:150
        data = {submit = 0, wait_nr = 1, get_flags = 0, sz = 8, arg = 0x0}
#4  0x00005558a4a0ca45 in io_uring_wait_cqe_nr (wait_nr=1, cqe_ptr=0x7efede616c38, ring=0x5558a7938cb8) at /usr/local/include/liburing.h:745
No locals.
#5  io_uring_wait_cqe (cqe_ptr=0x7efede616c38, ring=0x5558a7938cb8) at /usr/local/include/liburing.h:765
No locals.
#6  (anonymous namespace)::aio_uring::thread_routine (aio=0x5558a7938cb0) at /home/dan/repos/mariadb-server-10.6/tpool/aio_liburing.cc:137
        ret = <optimized out>
        cqe = 0x7efee4d6cf20
        iocb = <optimized out>
        res = <optimized out>
#7  0x00007efee4656c84 in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#8  0x00007efee47ae299 in start_thread (arg=0x7efede617640) at pthread_create.c:481
        ret = <optimized out>
        pd = 0x7efede617640
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139633117722176, -1501876349270139299, 140736435846142, 140736435846143, 0, 139633117722176, 1645502336706109021, 1645374754647006813}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#9  0x00007efee434f353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Comment by Marko Mäkelä [ 2021-09-17 ]

I am unable to reproduce any hang on my Debian Sid (unstable) system, using cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo:

10.6 106b16a5af7a8e29889e0ef8930bd2901a9ad00e

./mtr --parallel=auto --repeat=100 main.innodb_ext_key{,,,,,,,,,,,,,,,}
main.innodb_ext_key 'covering,innodb,on' w30 [ 100 pass ]    383
main.innodb_ext_key 'covering,innodb,on' w43 [ 100 pass ]    379
--------------------------------------------------------------------------
The servers were restarted 18 times
Spent 4162.514 of 153 seconds executing testcases
 
Completed: All 6400 tests were successful.

This was with liburing1 version 0.7-3, GCC 11.2.0 and

$ cat /proc/version 
Linux version 5.10.0-8-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.46-4 (2021-08-03)

Comment by Daniel Black [ 2021-09-20 ]

updated /tmp/gdb_output.log with a -O0 -g build of liburing latest. At least its showing fd 8 now, fd 0 in mtr is /dev/null.

5.13.16-200.fc34.x86_64

Stack makes sense that its waiting on a io_uring event. I need to look more tomorrow.

Comment by Marko Mäkelä [ 2021-09-24 ]

Thread 14 in gdb.txt is stuck in io_uring_wait_cqe. This looks related to MDEV-26674 (Debian with Linux 5.14 kernel). I never saw such trouble with the 5.10 kernel that I was using previously.

Comment by Daniel Black [ 2021-09-28 ]

Also reproducible on same build in 5.10 kernel (per MDEV-26674).

5.15rc appears to have fixes.

5.15.0-0.rc2.20210923git58e2cf5d7946.21.fc36.x86_64

$ mysql-test/mtr --repeat=30 --parallel=1 main.innodb_ext_key
 
main.innodb_ext_key 'innodb,off,unoptimized' [ 19 pass ]    378
main.innodb_ext_key 'innodb,off,unoptimized' [ 20 pass ]    386
main.innodb_ext_key 'innodb,off,unoptimized' [ 21 pass ]    383
main.innodb_ext_key 'innodb,off,unoptimized' [ 22 pass ]    396
 
 
main.innodb_ext_key 'innodb,off,unoptimized' [ 23 pass ]  60210
....
 
The servers were restarted 3 times
Spent 106.395 of 146 seconds executing testcases
 
Completed: All 120 tests were successful.

That one very anomalous timing wasn't reproducible.

bigger and longer - repeated twice

$ mysql-test/mtr --repeat=30 --parallel=8 main.innodb_ext_key{,,,,,,,}
 
...
--------------------------------------------------------------------------
The servers were restarted 12 times
Spent 1063.687 of 200 seconds executing testcases
 
Completed: All 960 tests were successful.

Comment by Jan Lindström (Inactive) [ 2021-09-30 ]

Even after rerun. I could not reproduce this with my Ubuntu Hirsute Hippo, 5.11.0-37-generic and 10.6 (I verified that liburing 0.7-3ubuntu1 was in ldd).

Comment by Daniel Black [ 2021-11-24 ]

Duplicate of MDEV-26674

Generated at Thu Feb 08 09:46:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.