Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.2.36, 10.3.29
-
None
-
KVM machine with ubuntu bionic cloud image. Dockerised deployment of mariadb 10.3.29 from mariadb repository based on ubuntu bionic docker image:
$ uname -a
Linux os-ctrl-1 4.15.0-109-generic #110-Ubuntu SMP Tue Jun 23 02:39:32 UTC 2020 x8
$ docker exec -it mariadb mysql --version
mysql Ver 15.1 Distrib 10.3.29-MariaDB, for debian-linux-gnu (x86_64) using readline 5.26_64 x86_64 x86_64 GNU/Linux
KVM machine with ubuntu bionic cloud image. Dockerised deployment of mariadb 10.3.29 from mariadb repository based on ubuntu bionic docker image: $ uname -a Linux os-ctrl-1 4.15.0-109-generic #110-Ubuntu SMP Tue Jun 23 02:39:32 UTC 2020 x8 $ docker exec -it mariadb mysql --version mysql Ver 15.1 Distrib 10.3.29-MariaDB, for debian-linux-gnu (x86_64) using readline 5.26_64 x86_64 x86_64 GNU/Linux
Description
Starting with mariadb 10.3.29 when deploying a three node galera cluster, we are seeing crashes with errors like in [1].
Please find attached the log files from all three nodes and their configurations. A core dump of the crash in [1] can also be provided.
[1]
2021-06-01 9:06:16 0 [ERROR] WSREP: Trx 236236 tries to abort slave trx 236238. This could be caused by:
|
1) unsupported configuration options combination, please check documentation.
|
2) a bug in the code.
|
3) a database corruption.
|
Node consistency compromized, need to abort. Restart the node to resync with cluster.
|
210601 9:06:16 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.3.29-MariaDB-1:10.3.29+maria~bionic-log
|
key_buffer_size=67108864
|
read_buffer_size=131072
|
max_used_connections=1
|
max_threads=10002
|
thread_count=12
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 22055535 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x0 thread_stack 0x49000
|
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5647d689bb8e]
|
/usr/sbin/mysqld(handle_fatal_signal+0x515)[0x5647d63308c5]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7fd6316ed980]
|
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fd631328fb7]
|
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7fd63132a921]
|
/usr/sbin/mysqld(+0x90a914)[0x5647d64ba914]
|
/usr/sbin/mysqld(+0x90f83c)[0x5647d64bf83c]
|
/usr/sbin/mysqld(handle_manager+0x1f3)[0x5647d6142a83]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7fd6316e26db]
|
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fd63140b71f]
|
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
|
information that should help you find out what is causing the crash.
|
Writing a core file...
|
Working directory at /var/lib/mysql
|
Resource Limits:
|
Limit Soft Limit Hard Limit Units
|
Max cpu time unlimited unlimited seconds
|
Max file size unlimited unlimited bytes
|
Max data size unlimited unlimited bytes
|
Max stack size 8388608 unlimited bytes
|
Max core file size unlimited unlimited bytes
|
Max resident set unlimited unlimited bytes
|
Max processes unlimited unlimited processes
|
Max open files 1048576 1048576 files
|
Max locked memory 16777216 16777216 bytes
|
Max address space unlimited unlimited bytes
|
Max file locks unlimited unlimited locks
|
Max pending signals 64055 64055 signals
|
Max msgqueue size 819200 819200 bytes
|
Max nice priority 0 0
|
Max realtime priority 0 0
|
Max realtime timeout unlimited unlimited us
|
Core pattern: core
|
Attachments
Issue Links
- is duplicated by
-
MDEV-26697 MariaDB keeps rebooting after upgrade from 10.3.23 to 10.3.31
-
- Closed
-
- relates to
-
MDEV-24978 SIGABRT in __libc_message
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Link |
This issue relates to |
Affects Version/s | 10.2.36 [ 25026 ] |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] |
Assignee | Jan Lindström [ jplindst ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Link |
This issue is blocked by |
Link |
This issue relates to |
Labels | need_feedback |
Assignee | Jan Lindström [ jplindst ] | Pramod Mahto [ pramod.mahto@mariadb.com ] |
Assignee | Pramod Mahto [ pramod.mahto@mariadb.com ] | Muhammad Irfan [ muhammad.irfan ] |
Assignee | Muhammad Irfan [ muhammad.irfan ] | Jan Lindström [ jplindst ] |
Labels | need_feedback |
Assignee | Jan Lindström [ jplindst ] | Seppo Jaakola [ seppo ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link |
This issue relates to |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Link |
This issue is blocked by |
Fix Version/s | 10.2.41 [ 26032 ] | |
Fix Version/s | 10.3.32 [ 26029 ] | |
Fix Version/s | 10.4.22 [ 26031 ] | |
Fix Version/s | 10.5.13 [ 26026 ] | |
Fix Version/s | 10.6.5 [ 26034 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Link |
This issue is duplicated by |
Workflow | MariaDB v3 [ 122368 ] | MariaDB v4 [ 159354 ] |
Link | This issue relates to MENT-1410 [ MENT-1410 ] |
Zendesk Related Tickets | 135202 147109 133910 |
Okay, so we notice this too:
2021-06-02 20:58:51 0 [Note] WSREP: Signalling provider to continue.
2021-06-02 20:58:51 0 [Note] WSREP: SST received: 13af80aa-836a-11eb-8c23-32f57cb601d4:54087553
2021-06-02 20:58:51 2 [Note] WSREP: Receiving IST: 1683 writesets, seqnos 54087553-54089236
2021-06-02 20:58:51 0 [Note] WSREP: Receiving IST... 0.0% ( 0/1683 events) complete.
2021-06-02 20:58:51 0 [Note] Reading of all Master_info entries succeeded
2021-06-02 20:58:51 0 [Note] Added new Master_info '' to hash table
2021-06-02 20:58:51 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.3.29-MariaDB-1:10.3.29+maria~bionic-log' socket: '/run/mysqld/MY_DEPLOYMENT-mysqld.sock' port: 3306 mariadb.org binary distribution
2021-06-02 20:58:51 0 [ERROR] WSREP: Trx 54088471 tries to abort slave trx 54088527. This could be caused by:
1) unsupported configuration options combination, please check documentation.
2) a bug in the code.
3) a database corruption.
Node consistency compromized, need to abort. Restart the node to resync with cluster.
It went into a crashloop, with the same: Trx 54088471 tries to abort slave trx 54088527.
First core dump looks like:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f30f06fa9fb in __GI_abort () at abort.c:100
#2 0x000055d85b55d914 in wsrep_abort_slave_trx (bf_seqno=<optimized out>, victim_seqno=<optimized out>) at ./storage/innobase/handler/ha_innodb.cc:18893
#3 0x000055d85b56283c in bg_wsrep_kill_trx (void_arg=0x7f1088023830) at ./storage/innobase/handler/ha_innodb.cc:19058
#4 0x000055d85b1e5a83 in handle_manager (arg=arg@entry=0x0) at ./sql/sql_manager.cc:112
#5 0x000055d85b8ea67a in pfs_spawn_thread (arg=0x55d8c61e5228) at ./storage/perfschema/pfs.cc:1869
#6 0x00007f30f0ab26db in start_thread (arg=0x7f1153399700) at pthread_create.c:463
case QUERY_IDLE:
{
WSREP_DEBUG("kill IDLE for " TRX_ID_FMT, victim_trx->id);
if (wsrep_thd_exec_mode(thd) == REPL_RECV) {
WSREP_DEBUG("kill BF IDLE, seqno: %lld",
(long long)wsrep_thd_trx_seqno(thd));
wsrep_abort_slave_trx(arg->bf_seqno, // <-- CRASH
wsrep_thd_trx_seqno(thd));
goto ret_unlock;
}
/* This will lock thd from proceeding after net_read() */
wsrep_thd_set_conflict_state(thd, ABORTING);
(gdb) print *arg
$2 = {thd_id = 67, trx_id = 7493842400, bf_seqno = 54088471, signal = 1}
(gdb) print *(unsigned*)((void*)thd + 0x58e0)
$4 = 54088527
(gdb) print thd->wsrep_trx_meta
$22 = {gtid = {uuid = {data = "\023\257\200\252\203j\021\353\214#2\365|\266", <incomplete sequence \324>}, seqno = 54088527}, depends_on = 54088417}
(gdb) print thd->proc_info
$23 = 0x7f0fb0006570 "Applied write set 54088527"
(gdb) info locals
rcode = <optimized out>
arg = 0x7f1088023830
thd = 0x7f0fb0000c08
victim_trx = 0x7f242978d1f8
awake = false
(gdb) up
#4 0x000055d85b1e5a83 in handle_manager (arg=arg@entry=0x0) at ./sql/sql_manager.cc:112
(gdb) info locals
next = 0x0
cb = 0x7f1088023868
error = 0
abstime = {tv_sec = 94387747591648, tv_nsec = 139848169358893}
reset_flush_time = true
Went back to 10.3.25 because of
MDEV-25190(among others).Let me know if I can get you more debug info from the dump.