Details
-
Bug
-
Status: Open (View Workflow)
-
Critical
-
Resolution: Unresolved
-
11.4.5
-
None
-
None
-
Ubuntu 24.04.2 LTS
Hardware: Pro WS 665-ACE, AMD Ryzen 9 7950X3D, 192Gb RAM
Description
One of three Galera nodes doesn't start after a reboot. It gets stuck in the --wsrep_recover phase. If I run it manually and send the SIGABRT after some time, I get this log:
# /usr/sbin/mariadbd --user=mysql --wsrep_recover --disable-log-error
|
2025-03-02 18:03:24 0 [Note] slave_connections_needed_for_purge changed to 0 because of Galera. Change it to 1 or higher if this Galera node is also Master in a normal replication setup
|
2025-03-02 18:03:24 0 [Note] Starting MariaDB 11.4.5-MariaDB-ubu2404-log source revision 0771110266ff5c04216af4bf1243c65f8c67ccf4 server_uid scRVhtEliqUIegje+r6p56/AWgY= as process 164899
|
2025-03-02 18:03:24 0 [Note] InnoDB: Compressed tables use zlib 1.3
|
2025-03-02 18:03:24 0 [Note] InnoDB: Number of transaction pools: 1
|
2025-03-02 18:03:24 0 [Note] InnoDB: Using AVX512 instructions
|
2025-03-02 18:03:24 0 [Note] InnoDB: Using liburing
|
2025-03-02 18:03:24 0 [Note] InnoDB: Initializing buffer pool, total size = 128.000GiB, chunk size = 2.000GiB
|
2025-03-02 18:03:24 0 [Note] InnoDB: Completed initialization of buffer pool
|
2025-03-02 18:03:24 0 [Note] InnoDB: Buffered log writes (block size=4096 bytes)
|
2025-03-02 18:03:24 0 [Note] InnoDB: End of log at LSN=4996688328739
|
2025-03-02 18:03:24 0 [Note] InnoDB: Resizing redo log from 2.000GiB to 16.000GiB; LSN=4996688328739
|
2025-03-02 18:03:24 0 [Note] InnoDB: Buffered log writes (block size=4096 bytes)
|
2025-03-02 18:03:24 0 [Note] InnoDB: Opened 3 undo tablespaces
|
2025-03-02 18:03:24 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
|
2025-03-02 18:03:24 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
|
2025-03-02 18:03:24 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
|
2025-03-02 18:03:24 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
|
2025-03-02 18:03:24 0 [Note] InnoDB: log sequence number 4996688328739; transaction id 4889635920
|
2025-03-02 18:03:24 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
|
2025-03-02 18:03:24 0 [Note] Plugin 'FEEDBACK' is disabled.
|
250302 18:13:51 [ERROR] /usr/sbin/mariadbd got signal 6 ;
|
Sorry, we probably made a mistake, and this is a bug.
|
|
Your assistance in bug reporting will enable us to fix this for the next release.
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs about how to report
|
a bug on https://jira.mariadb.org/.
|
|
Please include the information from the server start above, to the end of the
|
information below.
|
|
Server version: 11.4.5-MariaDB-ubu2404-log source revision: 0771110266ff5c04216af4bf1243c65f8c67ccf4
|
|
The information page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/
|
contains instructions to obtain a better version of the backtrace below.
|
Following these instructions will help MariaDB developers provide a fix quicker.
|
|
Attempting backtrace. Include this in the bug report.
|
(note: Retrieving this information may fail)
|
|
Thread pointer: 0x0
|
stack_bottom = 0x0 thread_stack 0x49000
|
/usr/sbin/mariadbd(my_print_stacktrace+0x30)[0x5a36b15b81b0]
|
/usr/sbin/mariadbd(handle_fatal_signal+0x2a1)[0x5a36b1151fa1]
|
/lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x76eefa445330]
|
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x76eefa498d71]
|
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x76eefa49b7ed]
|
/usr/sbin/mariadbd(psi_cond_wait+0x4b)[0x5a36b0ca48fd]
|
/usr/sbin/mariadbd(+0x5f0c75)[0x5a36b0c5bc75]
|
/usr/sbin/mariadbd(+0x6b66a8)[0x5a36b0d216a8]
|
/usr/sbin/mariadbd(_Z11mysqld_mainiPPc+0xf79)[0x5a36b0d24659]
|
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x76eefa42a1ca]
|
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x76eefa42a28b]
|
/usr/sbin/mariadbd(_start+0x25)[0x5a36b0cfe035]
|
Writing a core file...
|
Working directory at /var/lib/mysql
|
Resource Limits (excludes unlimited resources):
|
Limit Soft Limit Hard Limit Units
|
Max stack size 8388608 unlimited bytes
|
Max core file size 0 unlimited bytes
|
Max processes 768922 768922 processes
|
Max open files 33063 33063 files
|
Max locked memory 25222557696 25222557696 bytes
|
Max pending signals 768922 768922 signals
|
Max msgqueue size 819200 819200 bytes
|
Max nice priority 0 0
|
Max realtime priority 0 0
|
Core pattern: core
|
|
Kernel version: Linux version 6.11.0-17-generic (buildd@lcy02-amd64-038) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #17~24.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 20 22:48:29 UTC 2
|
|
Aborted
|
Below is the gdb backtrace taken while it was stuck:
(gdb) bt
|
#0 0x0000705755298d71 in __futex_abstimed_wait_common64 (private=28759, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5c9365566170 <mysql_bin_log+3344>) at ./nptl/futex-internal.c:57
|
#1 __futex_abstimed_wait_common (cancel=true, private=28759, abstime=0x0, clockid=0, expected=0, futex_word=0x5c9365566170 <mysql_bin_log+3344>) at ./nptl/futex-internal.c:87
|
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5c9365566170 <mysql_bin_log+3344>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0)
|
at ./nptl/futex-internal.c:139
|
#3 0x000070575529b7ed in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5c93655660e0 <mysql_bin_log+3200>, cond=0x5c9365566148 <mysql_bin_log+3304>) at ./nptl/pthread_cond_wait.c:503
|
#4 ___pthread_cond_wait (cond=0x5c9365566148 <mysql_bin_log+3304>, mutex=0x5c93655660e0 <mysql_bin_log+3200>) at ./nptl/pthread_cond_wait.c:627
|
#5 0x00005c93641f78fd in psi_cond_wait ()
|
#6 0x00005c93641aec75 in ?? ()
|
#7 0x00005c93642746a8 in ?? ()
|
#8 0x00005c9364277659 in mysqld_main(int, char**) ()
|
#9 0x000070575522a1ca in __libc_start_call_main (main=main@entry=0x5c936420fdc0 <main>, argc=argc@entry=4, argv=argv@entry=0x7ffe0acb8938) at ../sysdeps/nptl/libc_start_call_main.h:58
|
#10 0x000070575522a28b in __libc_start_main_impl (main=0x5c936420fdc0 <main>, argc=4, argv=0x7ffe0acb8938, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe0acb8928) at ../csu/libc-start.c:360
|
#11 0x00005c9364251035 in _start ()
|
If I turn off the binlog, the node joins the Galera cluster, but does not accept any client connections:
# mariadb --connect-timeout=10
|
ERROR 2013 (HY000): Lost connection to server at 'handshake: reading initial communication packet', system error: 110
|
Further investigation revealed that the issue arose because the donor's mysql.gtid_slave_pos table contains 926,146,635 rows. An SST from another node resolved the problem; however, on the problematic node the number of rows in mysql.gtid_slave_pos does not decrease, as the node is not participating in either inbound or outbound replication—only the binary log is enabled.
MariaDB [mysql]> select count(*) from gtid_slave_pos;
+-----------+
| count(*) |
+-----------+
| 926146635 |
+-----------+
1 row in set (45.236 sec)