Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version: 10.6.20
Fix Version: None
Description
I got a (seemingly quite rare) local test failure in rpl.rpl_from_mysql80. The failure seems unrelated to replication. The test case does a server shutdown immediately at the start of the test. This shutdown hung for 60 seconds and was then terminated with SIGABRT. The test case itself completed without any failures, but the SIGABRT of course gets flagged as "warnings found in the server logs".
rpl.rpl_from_mysql80 'row' w13 [ 39 fail ] Found warnings/errors in server log file!
Test ended at 2025-01-24 09:25:48
line
250124 9:25:44 [ERROR] /kvm/src/my/dev/mariadb2/sql/mariadbd got signal 6 ;
From the error log we can see that the shutdown hung for 60 seconds:
2025-01-24 9:24:44 0 [Note] InnoDB: Starting shutdown...
250124 9:25:44 [ERROR] /kvm/src/my/dev/mariadb2/sql/mariadbd got signal 6 ;
At the time of the hang, only 5 threads remain running. The timer_handler and ma_checkpoint_background threads are just waiting normally; the remaining 3 InnoDB threads appear to be the ones that are stuck:
Thread 3 (Thread 0x7ff2990546c0 (LWP 2447875)):
#1 0x00007ff2a54a75d8 in pthread_cond_wait () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000055ada48e513a in safe_cond_wait (cond=0x55adcea4f760, mp=0x55adcea4f6b0, file=0x55ada4d38c98 "/kvm/src/my/dev/mariadb2/tpool/tpool_structs.h", line=181) at /kvm/src/my/dev/mariadb2/mysys/thr_mutex.c:487
#3 0x000055ada44e1e89 in tpool::cache<tpool::aiocb>::wait (this=0x55adcea4f6b0, m=...) at /kvm/src/my/dev/mariadb2/tpool/tpool_structs.h:181
#4 0x000055ada44e12ae in tpool::cache<tpool::aiocb>::wait (this=0x55adcea4f6b0) at /kvm/src/my/dev/mariadb2/tpool/tpool_structs.h:189
#5 0x000055ada44e064c in io_slots::wait (this=0x55adcea4f6b0) at /kvm/src/my/dev/mariadb2/storage/innobase/os/os0file.cc:116
#6 0x000055ada44de254 in os_aio_wait_until_no_pending_reads (declare=true) at /kvm/src/my/dev/mariadb2/storage/innobase/os/os0file.cc:3493
#7 0x000055ada470b262 in buf_load () at /kvm/src/my/dev/mariadb2/storage/innobase/buf/buf0dump.cc:665
#8 0x000055ada470b480 in buf_dump_load_func () at /kvm/src/my/dev/mariadb2/storage/innobase/buf/buf0dump.cc:712
#9 0x000055ada4836eb3 in tpool::task::execute (this=0x55ada5ef77c0 <buf_dump_load_task>) at /kvm/src/my/dev/mariadb2/tpool/task.cc:37

Thread 2 (Thread 0x7ff29b0606c0 (LWP 2447861)):
#1 0x00007ff2a6116c39 in __io_uring_get_cqe () from /lib/x86_64-linux-gnu/liburing.so.2
#2 0x000055ada48375d8 in io_uring_wait_cqe_nr (ring=0x55adcea4f3d8, cqe_ptr=0x7ff29b05fb90, wait_nr=1) at /usr/include/liburing.h:1175
#3 0x000055ada483776e in io_uring_wait_cqe (ring=0x55adcea4f3d8, cqe_ptr=0x7ff29b05fb90) at /usr/include/liburing.h:1248
#4 0x000055ada4838190 in (anonymous namespace)::aio_uring::thread_routine (aio=0x55adcea4f3d0) at /kvm/src/my/dev/mariadb2/tpool/aio_liburing.cc:145
#5 0x000055ada4838848 in std::__invoke_impl<void, void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> (__f=@0x55adce8289f0: 0x55ada483816d <(anonymous namespace)::aio_uring::thread_routine((anonymous namespace)::aio_uring*)>) at /usr/include/c++/12/bits/invoke.h:61
#6 0x000055ada48387cb in std::__invoke<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> (__fn=@0x55adce8289f0: 0x55ada483816d <(anonymous namespace)::aio_uring::thread_routine((anonymous namespace)::aio_uring*)>) at /usr/include/c++/12/bits/invoke.h:96
#7 0x000055ada483873b in std::thread::_Invoker<std::tuple<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> >::_M_invoke<0, 1> (this=0x55adce8289e8) at /usr/include/c++/12/bits/std_thread.h:252
#8 0x000055ada48386f4 in std::thread::_Invoker<std::tuple<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> >::operator() (this=0x55adce8289e8) at /usr/include/c++/12/bits/std_thread.h:259
#9 0x000055ada48386d8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> > >::_M_run (this=0x55adce8289e0) at /usr/include/c++/12/bits/std_thread.h:210

Thread 1 (Thread 0x7ff2a5ea1280 (LWP 2447809)):
#5 0x00007ff2a54a75d8 in pthread_cond_wait () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x000055ada4837052 in tpool::waitable_task::wait (this=0x55ada5ef77c0 <buf_dump_load_task>, lk=...) at /kvm/src/my/dev/mariadb2/tpool/task.cc:63
#7 0x000055ada48370a9 in tpool::waitable_task::wait (this=0x55ada5ef77c0 <buf_dump_load_task>) at /kvm/src/my/dev/mariadb2/tpool/task.cc:69
#8 0x000055ada470b693 in buf_load_dump_end () at /kvm/src/my/dev/mariadb2/storage/innobase/buf/buf0dump.cc:774
#9 0x000055ada447734c in logs_empty_and_mark_files_at_shutdown () at /kvm/src/my/dev/mariadb2/storage/innobase/log/log0log.cc:1065
#10 0x000055ada461e131 in innodb_shutdown () at /kvm/src/my/dev/mariadb2/storage/innobase/srv/srv0start.cc:2047
#11 0x000055ada4372bfb in innobase_end () at /kvm/src/my/dev/mariadb2/storage/innobase/handler/ha_innodb.cc:4453
#12 0x000055ada3f67eed in ha_finalize_handlerton (plugin_=0x55adce9415a0) at /kvm/src/my/dev/mariadb2/sql/handler.cc:595
#13 0x000055ada3bc09df in plugin_deinitialize (plugin=0x55adce9415a0, ref_check=true) at /kvm/src/my/dev/mariadb2/sql/sql_plugin.cc:1270
#14 0x000055ada3bc0f5a in reap_plugins () at /kvm/src/my/dev/mariadb2/sql/sql_plugin.cc:1346
#15 0x000055ada3bc396d in plugin_shutdown () at /kvm/src/my/dev/mariadb2/sql/sql_plugin.cc:2099
#16 0x000055ada3a42b1b in clean_up (print_message=true) at /kvm/src/my/dev/mariadb2/sql/mysqld.cc:1952
#17 0x000055ada3a4b0fa in mysqld_main (argc=158, argv=0x55adce8153f8) at /kvm/src/my/dev/mariadb2/sql/mysqld.cc:5938
#18 0x000055ada3a3f429 in main (argc=26, argv=0x7ffd932d47c8) at /kvm/src/my/dev/mariadb2/sql/main.cc:34
The InnoDB shutdown is waiting for buf_load() to complete, and buf_load() in turn is waiting in os_aio_wait_until_no_pending_reads(). It is not clear to me whether this is a race caused by the shutdown happening so early, in the middle of buf_load(), or whether there is a more generic problem with lost notifications or similar in the asynchronous read path.
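For illustration, here is a minimal sketch of the "wait until all I/O slots are returned" pattern that the backtraces point at, and of where a lost notification could produce exactly this kind of indefinite wait. This is hypothetical code of mine, not the actual tpool::cache implementation; slot_cache, acquire() and release() are invented names. If the completion side returns the last slot without signalling the condition variable, or updates the counter outside the mutex, the waiter blocks forever:

#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the slot cache that io_slots::wait() /
// tpool::cache<tpool::aiocb>::wait() block on.
class slot_cache
{
  std::mutex m_mtx;
  std::condition_variable m_cv;
  size_t m_in_use= 0;            // aiocbs currently submitted
public:
  void acquire()                 // a read is submitted
  {
    std::lock_guard<std::mutex> lk(m_mtx);
    ++m_in_use;
  }
  void release()                 // called by the completion thread
  {
    std::lock_guard<std::mutex> lk(m_mtx);
    if (--m_in_use == 0)
      m_cv.notify_all();         // omitting this signal, or decrementing
                                 // outside the mutex, loses the wakeup
  }
  void wait()                    // e.g. buf_load(): until no pending reads
  {
    std::unique_lock<std::mutex> lk(m_mtx);
    m_cv.wait(lk, [this] { return m_in_use == 0; }); // predicate re-checked,
                                                     // so spurious wakeups are safe
  }
};

int main()
{
  slot_cache slots;
  slots.acquire();               // one read in flight
  std::thread completion([&] {   // stands in for the io_uring CQE thread
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    slots.release();             // completion arrives; last slot returned
  });
  slots.wait();                  // returns only once release() has signalled
  completion.join();
}

If the real code follows this shape, the hang would mean either that a completion was never delivered at all (consistent with Thread 2 sitting in io_uring_wait_cqe) or that the final release happened without the waiter being notified.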
I saw this once, locally on a 16-core AMD VM, after fixing MDEV-35806, which is a more frequent sporadic failure of rpl.rpl_from_mysql80 that will probably usually mask this one:
(cd mysql-test/ && ./mtr --mem rpl.rpl_from_mysql80{,,}{,,,}{,} --parallel=24 --repeat=100)
I strongly suspect this will affect more than just sporadic failures of rpl.rpl_from_mysql80, though this test is probably particularly susceptible because it restarts the server so early in the test case, when there is a higher chance that buf_load() has not yet completed.
I attached stdout and mysqld.1.err from the test failure.
Attachments
Issue Links
- duplicates
  MDEV-35886 MariaDB Server frequently hanging, causing data corruption. (Closed)
- relates to
  MDEV-35886 MariaDB Server frequently hanging, causing data corruption. (Closed)