Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 10.3.28, 10.2(EOL), 10.4(EOL), 10.5
- Environment: CentOS 7.9.2009
mysqld would have been started with the following arguments:
--datadir=/data/mariadb --socket=/var/lib/mysql/mysql.sock --user=mysql --symbolic-links=0 --max_allowed_packet=16M --thread_cache_size=8 --max_connections=1500 --slow_query_log=1 --log_error=/data/mariadb/mariadb.log --innodb_buffer_pool_size=49152M --innodb_log_file_size=2048M --innodb_log_buffer_size=16M --innodb_print_all_deadlocks=on --log-warnings=2 --plugin_load_add=auth_socket --default_storage_engine=InnoDB --innodb_file_per_table=1 --innodb_flush_log_at_trx_commit=0 --innodb_doublewrite=1 --log_slave_updates=1 --log_bin=bin-log --server_id=6666 --binlog_format=ROW --innodb_autoinc_lock_mode=2 --expire_logs_days=1 --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --wsrep_cluster_address=gcomm://[redacted] --wsrep_node_address=[redacted] --wsrep_sst_method=mariabackup --wsrep_sst_auth=[redacted] --wsrep_provider_options=evs.keepalive_period = PT1S; evs.inactive_check_period = PT1S; evs.suspect_timeout = PT5S; evs.inactive_timeout = PT15S; evs.install_timeout = PT15S; gcache.size=2G --wsrep_on=ON --wsrep_log_conflicts=1
Description
About 29 hours after updating a previously very stable 3 server MariaDB Galera cluster from 10.3.27 to 10.3.28, one of the nodes crashed with the following message:
2021-03-10 18:22:48 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735
2021-03-10 18:22:48 0 [ERROR] WSREP: cancel commit bad exit: 7 33792346039
210310 18:22:48 [ERROR] mysqld got signal 6 ;
Attached is a bit more log (we coincidentally had been running with conflict logging enabled), but I don't think there's much more I can provide right now. I'm still filing this since, before the update to 10.3.28, the cluster had been running very stably for months, and this could somehow be related to MDEV-25111, which we also encountered for the first time right after updating to 10.3.28.
Attachments
- gdb.txt (207 kB)
Issue Links
- blocks
  - MDEV-25368 Galera cluster hangs on Freeing items (Closed)
- causes
  - MDEV-24294 MariaDB - Cluster freezes if node hangs (Closed)
  - MDEV-25992 Galera 3 Server crash with signal 6 after RBR event apply failed (Closed)
  - MDEV-26099 MariaDB 10.5.10/10.5.11 Galera assertion crash (Closed)
- is caused by
  - MDEV-23328 Server hang due to Galera lock conflict resolution (Closed)
- is duplicated by
  - MDEV-28604 MariaDB crashing very often (Closed)
- relates to
  - MDEV-24915 Galera conflict resolution is unnecessarily complex (Closed)
  - MDEV-25410 Assertion `state_ == s_exec' failed - mysqld got signal 6 (Closed)
  - MDEV-25609 Signal 11 on wsrep_mysqld.cc:2620 (Closed)
  - MDEV-24397 MariaDB Galera Server Crashes on Large DELETE (Closed)
  - MDEV-25111 Long semaphore wait (> 800 secs), server stops responding (Closed)
Activity
same issue here -
2021-03-23 11:11:32 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
at galera/src/replicator_smm.cpp:abort_trx():727
2021-03-23 11:11:32 0 [ERROR] WSREP: cancel commit bad exit: 7 21859814393
210323 11:11:32 [ERROR] mysqld got signal 6 ;
It happened while running an ALTER TABLE that adds partitions to a table, with intensive concurrent INSERTs.
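For illustration, a minimal sketch of that kind of concurrent workload, with hypothetical table and partition names (not the reporter's actual schema):

-- Setup (hypothetical schema):
CREATE TABLE metrics (
  id BIGINT NOT NULL AUTO_INCREMENT,
  ts DATETIME NOT NULL,
  val DOUBLE,
  PRIMARY KEY (id, ts)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(ts)) (
  PARTITION p202102 VALUES LESS THAN (TO_DAYS('2021-03-01')),
  PARTITION p202103 VALUES LESS THAN (TO_DAYS('2021-04-01'))
);

-- Session 1: intensive inserts, repeated in a loop
INSERT INTO metrics (ts, val) VALUES ('2021-03-15 12:00:00', RAND());

-- Session 2, running concurrently: extend the partition layout
ALTER TABLE metrics ADD PARTITION
  (PARTITION p202104 VALUES LESS THAN (TO_DAYS('2021-05-01')));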
I just did an upgrade from 10.2 to 10.3, happened to land on release 10.3.28, and immediately the nodes started crashing. It is a 3-node Galera cluster; in total the nodes have crashed over 10 times in the 15 hours since the upgrade finished.
I'm afraid it's urgent for us to downgrade to 10.3.27, so I have no more diagnostic info to give you.
OS: CentOS 7
The log is the same as the previous reporters'.
As far as I know we do not have any ALTER TABLE usage.
Very heavy write load and a very large DB, 330 GB.
I can reproduce a similar crash with the mtr test suite; the smoking gun seems to be the new bg_wsrep_kill_trx() background killer execution. The debug log messages before the assert are:
2021-04-21 11:40:37 2 [Note] WSREP: Winning thread:
THD: 2, mode: applier, state: executing, conflict: no conflict, seqno: 1500
SQL: INSERT INTO c VALUES (1, 0)^E<E5>^?`^S^B
2021-04-21 11:40:37 2 [Note] WSREP: Victim thread:
THD: 259, mode: local, state: executing, conflict: cert failure, seqno: -1
SQL: COMMIT
2021-04-21 11:40:37 0 [Note] WSREP: BF kill (1, seqno: 1500), victim: (259) trx: 3743
2021-04-21 11:40:37 0 [Note] WSREP: Aborting query: COMMIT conf 7 trx: -1
2021-04-21 11:40:37 0 [Note] WSREP: victim 3743 in state 7
2021-04-21 11:40:37 259 [Note] WSREP: cleanup transaction for LOCAL_STATE: COMMIT
2021-04-21 11:40:37 259 [Note] WSREP: COMMIT failed, MDL released: 259
2021-04-21 11:40:37 259 [Note] WSREP: cert failure, thd: 259 is_AC: 0, retry: 0 - 1 SQL: COMMIT
2021-04-21 11:40:37 2 [Note] WSREP: Set WSREPXid for InnoDB: fc2a06c7-a27c-11eb-ad0f-fb22db561fab:1502
2021-04-21 11:40:37 2 [Note] WSREP: Set WSREPXid for InnoDB: fc2a06c7-a27c-11eb-ad0f-fb22db561fab:1503
2021-04-21 11:40:37 2 [Note] WSREP: Set WSREPXid for InnoDB: fc2a06c7-a27c-11eb-ad0f-fb22db561fab:1504
2021-04-21 11:40:37 2 [Note] WSREP: Set WSREPXid for InnoDB: fc2a06c7-a27c-11eb-ad0f-fb22db561fab:1505
2021-04-21 11:40:37 259 [Note] WSREP: wsrep_sync_wait: thd->variables.wsrep_sync_wait = 15, mask = 1
2021-04-21 11:40:37 260 [Note] WSREP: refresh_provider_options: dbug=d,apply_monitor_slave_enter_sync
2021-04-21 11:40:37 260 [Note] WSREP: refresh_provider_options: dbug=
2021-04-21 11:40:37 260 [Note] WSREP: refresh_provider_options: dbug=d,local_monitor_enter_sync
2021-04-21 11:40:37 260 [Note] WSREP: refresh_provider_options: signal=apply_monitor_slave_enter_sync
2021-04-21 11:40:37 2 [Note] WSREP: cluster conflict due to high priority abort for threads:
2021-04-21 11:40:37 2 [Note] WSREP: Winning thread:
THD: 2, mode: applier, state: executing, conflict: no conflict, seqno: 1509
SQL: UPDATE c SET f2 = 1 WHERE f1 = 1^E<E5>^?`^S^B
2021-04-21 11:40:37 2 [Note] WSREP: Victim thread:
THD: 259, mode: local, state: committing, conflict: no conflict, seqno: 1510
SQL: COMMIT
2021-04-21 11:40:37 0 [Note] WSREP: BF kill (1, seqno: 1509), victim: (259) trx: 3765
2021-04-21 11:40:37 0 [Note] WSREP: Aborting query: COMMIT conf 0 trx: 3765
2021-04-21 11:40:37 259 [Note] WSREP: commit failed for reason: 3 conf 0
2021-04-21 11:40:37 0 [Note] WSREP: kill query for: 259
2021-04-21 11:40:37 0 [Note] WSREP: kill trx QUERY_COMMITTING for 3765
2021-04-21 11:40:37 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
This is probably a regression of MDEV-23328, which brought in the background killer thread. Galera suite mtr tests can reproduce this issue, e.g. galera.MW-369.
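To make the log above easier to follow, here is a hedged, simplified SQL-level sketch of the conflict window it shows: a replicated (applier) write BF-aborts a local transaction that is already committing, and the log suggests the victim is already rolled back by the time the background killer asks the provider to cancel its commit. Table and column names mirror the log; the two-node split is illustrative and this is not the MW-369 test file itself.

-- Setup (any node):
CREATE TABLE c (f1 INT PRIMARY KEY, f2 INT) ENGINE=InnoDB;
INSERT INTO c VALUES (1, 0);

-- Node 1, local client session:
START TRANSACTION;
UPDATE c SET f2 = 2 WHERE f1 = 1;   -- local change, not yet certified

-- Node 2, meanwhile (arrives on node 1 as a high-priority applier event):
UPDATE c SET f2 = 1 WHERE f1 = 1;   -- certified first, so it must win

-- Node 1, local session again:
COMMIT;   -- the applier brute-force aborts this COMMIT; per the log above, the
          -- background killer then finds the victim already ROLLED_BACK, a state
          -- the Galera provider treats as fatal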
stack trace of crashing thread:
#0 __pthread_kill (threadid=<optimized out>, signo=6)
at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1 0x000055b6444b5731 in my_write_core (sig=6)
at /home/seppo/work/wsrep/mariadb-server/mysys/stacktrace.c:386
#2 0x000055b643c1e72f in handle_fatal_signal (sig=6)
at /home/seppo/work/wsrep/mariadb-server/sql/signal_handler.cc:355
#3 <signal handler called>
#4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5 0x00007f7e5fc8a859 in __GI_abort () at abort.c:79
#6 0x000055b643e7833e in bg_wsrep_kill_trx (void_arg=0x7f7e3c0c9360)
at /home/seppo/work/wsrep/mariadb-server/storage/innobase/handler/ha_innodb.cc:18978
#7 0x000055b6438ec1c4 in handle_manager (arg=0x0)
at /home/seppo/work/wsrep/mariadb-server/sql/sql_manager.cc:112
#8 0x000055b644443337 in pfs_spawn_thread (arg=0x55b6479e4b20)
at /home/seppo/work/wsrep/mariadb-server/storage/perfschema/pfs.cc:1869
#9 0x00007f7e5fe62609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#10 0x00007f7e5fd87293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
serg, this task has been discussed in a call with seppo. This seems to be a regression related to changes done for "kill query", and it is hard for Galera to deal with a kill query as such. So the question came up whether kill query can be supported at all for Galera. What is your opinion: should we disallow kill query for Galera?
How can KILL be disallowed for galera? Simply if a server is part of a cluster, then KILL is not allowed? That seems to be a bit too drastic, don't you think?
Unfortunately, PR https://github.com/MariaDB/server/pull/1843 is not correct based on pquery QA executed by ramesh. There are at least two bugs found in this version.
- Bug 1:
(rr) where
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x0000145a0820e859 in __GI_abort () at abort.c:79
#2 0x0000145a0820e729 in __assert_fail_base (fmt=0x145a083a4588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=0x55c620df5f79 "!is_owned()",
file=0x55c620df5f88 "/test/mtest/10.3_dbg/storage/innobase/include/sync0policy.ic", line=64,
function=<optimized out>) at assert.c:92
#3 0x0000145a0821ff36 in __GI___assert_fail (assertion=assertion@entry=0x55c620df5f79 "!is_owned()",
file=file@entry=0x55c620df5f88 "/test/mtest/10.3_dbg/storage/innobase/include/sync0policy.ic",
line=line@entry=64,
function=function@entry=0x55c620df5fc8 "void MutexDebug<Mutex>::enter(const Mutex*, const char*, unsigned int) [with Mutex = TTASEventMutex<GenericPolicy>]") at assert.c:101
#4 0x000055c6205aa5c1 in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x55c6214cf510 <lock_sys+80>,
mutex=0x55c6214cf500 <lock_sys+64>,
name=name@entry=0x55c620de7c58 "/test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc",
line=line@entry=5186) at /test/mtest/10.3_dbg/storage/innobase/include/sync0types.h:979
#5 0x000055c620599e94 in GenericPolicy<TTASEventMutex<GenericPolicy> >::enter (line=5186,
filename=0x55c620de7c58 "/test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc", mutex=...,
this=<optimized out>) at /test/mtest/10.3_dbg/storage/innobase/include/sync0policy.h:339
#6 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (line=5186,
name=0x55c620de7c58 "/test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc", n_delay=4, n_spins=30,
this=<optimized out>) at /test/mtest/10.3_dbg/storage/innobase/include/ib0mutex.h:590
#7 innobase_kill_query (thd=<optimized out>) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:5186
#8 0x000055c620358b27 in kill_handlerton (thd=thd@entry=0x7d1080001fe0, plugin=<optimized out>,
level=level@entry=0x2a1c4ce28d8c) at /test/mtest/10.3_dbg/sql/handler.cc:841
#9 0x000055c6200dc552 in plugin_foreach_with_mask (thd=thd@entry=0x7d1080001fe0,
func=func@entry=0x55c620358aae <kill_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1,
state_mask=state_mask@entry=8, arg=arg@entry=0x2a1c4ce28d8c) at /test/mtest/10.3_dbg/sql/sql_plugin.cc:2435
#10 0x000055c620359f34 in ha_kill_query (thd=thd@entry=0x7d1080001fe0, level=<optimized out>)
at /test/mtest/10.3_dbg/sql/handler.cc:849
#11 0x000055c62006d27e in THD::awake_no_mutex (this=this@entry=0x7d1080001fe0,
state_to_set=state_to_set@entry=KILL_QUERY) at /test/mtest/10.3_dbg/include/mysql/psi/mysql_thread.h:717
#12 0x000055c6202a3022 in THD::awake (state_to_set=KILL_QUERY, this=0x7d1080001fe0)
at /test/mtest/10.3_dbg/sql/sql_class.h:3332
#13 wsrep_thd_awake (thd=thd@entry=0x7d1080001fe0, signal=signal@entry=1 '\001', kill_signal=kill_signal@entry=4)
at /test/mtest/10.3_dbg/sql/wsrep_mysqld.cc:2750
#14 0x000055c620596fd7 in wsrep_innobase_kill_one_trx (bf_thd=bf_thd@entry=0x145a18001250,
bf_trx=bf_trx@entry=0x49602d628380, victim_trx=victim_trx@entry=0x49602d60d0d0, signal=signal@entry=1,
kill_signal=kill_signal@entry=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:19071
#15 0x000055c62059cea0 in wsrep_abort_transaction (bf_thd=0x145a18001250, victim_thd=<optimized out>,
signal=1 '\001', kill_signal=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:19170
#16 0x000055c62036956d in ha_abort_transaction (bf_thd=bf_thd@entry=0x145a18001250,
victim_thd=victim_thd@entry=0x7d1080001fe0, signal=signal@entry=1 '\001', kill_signal=kill_signal@entry=4)
at /test/mtest/10.3_dbg/sql/handler.cc:6769
#17 0x000055c6202b0945 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x145a18001250,
victim_thd_ptr=victim_thd_ptr@entry=0x7d1080001fe0, signal=signal@entry=1 '\001',
kill_signal=kill_signal@entry=KILL_QUERY) at /test/mtest/10.3_dbg/sql/wsrep_thd.cc:828
#18 0x000055c6202a0ccf in wsrep_grant_mdl_exception (requestor_ctx=requestor_ctx@entry=0x145a18001370,
ticket=ticket@entry=0x7d10800083d0, key=key@entry=0x7d1080068630)
at /test/mtest/10.3_dbg/sql/wsrep_mysqld.cc:2049
#19 0x000055c620206939 in MDL_lock::can_grant_lock (this=this@entry=0x7d1080068630, type_arg=MDL_EXCLUSIVE,
requestor_ctx=requestor_ctx@entry=0x145a18001370, ignore_lock_priority=ignore_lock_priority@entry=false)
at /test/mtest/10.3_dbg/sql/mdl.cc:1594
#20 0x000055c620207f66 in MDL_context::try_acquire_lock_impl (this=this@entry=0x145a18001370,
mdl_request=mdl_request@entry=0x145a18012188, out_ticket=out_ticket@entry=0x2a1c4ce292d8)
at /test/mtest/10.3_dbg/sql/mdl.cc:1925
#21 0x000055c620208179 in MDL_context::acquire_lock (this=this@entry=0x145a18001370, mdl_request=0x145a18012188,
lock_wait_timeout=lock_wait_timeout@entry=0) at /test/mtest/10.3_dbg/sql/mdl.cc:2069
#22 0x000055c620208ce8 in MDL_context::acquire_locks (this=this@entry=0x145a18001370,
mdl_requests=mdl_requests@entry=0x2a1c4ce29430, lock_wait_timeout=0) at /test/mtest/10.3_dbg/sql/mdl.cc:2268
#23 0x000055c62004a15b in lock_table_names (thd=thd@entry=0x145a18001250, options=...,
tables_start=0x145a18011d20, tables_end=<optimized out>, lock_wait_timeout=<optimized out>,
flags=flags@entry=0) at /test/mtest/10.3_dbg/sql/sql_base.cc:3923
#24 0x000055c62004cd3e in open_tables (thd=thd@entry=0x145a18001250, options=...,
start=start@entry=0x2a1c4ce297c8, counter=counter@entry=0x2a1c4ce297dc, flags=flags@entry=0,
prelocking_strategy=prelocking_strategy@entry=0x2a1c4ce29878) at /test/mtest/10.3_dbg/sql/sql_lex.h:1585
#25 0x000055c62004e279 in open_and_lock_tables (thd=thd@entry=0x145a18001250, options=..., tables=<optimized out>,
tables@entry=0x145a18011d20, derived=derived@entry=false, flags=flags@entry=0,
prelocking_strategy=prelocking_strategy@entry=0x2a1c4ce29878) at /test/mtest/10.3_dbg/sql/sql_base.cc:5109
#26 0x000055c6201886cc in open_and_lock_tables (flags=0, derived=false, tables=0x145a18011d20, options=...,
thd=0x145a18001250) at /test/mtest/10.3_dbg/sql/sql_base.h:495
#27 mysql_create_table (thd=thd@entry=0x145a18001250, create_table=create_table@entry=0x145a18011d20,
create_info=create_info@entry=0x2a1c4ce29a00, alter_info=alter_info@entry=0x2a1c4ce29940)
at /test/mtest/10.3_dbg/sql/sql_table.cc:5282
#28 0x000055c62018a0c5 in Sql_cmd_create_table_like::execute (this=0x145a18011d00, thd=0x145a18001250)
at /test/mtest/10.3_dbg/sql/sql_table.cc:11378
#29 0x000055c6200ce17c in mysql_execute_command (thd=thd@entry=0x145a18001250)
at /test/mtest/10.3_dbg/sql/sql_parse.cc:6075
#30 0x000055c6200d017f in mysql_parse (thd=thd@entry=0x145a18001250,
rawbuf=rawbuf@entry=0x145a18011ae8 "CREATE TABLE t1(c1 INTEGER NULL, c2 BINARY(25) NOT NULL, c3 SMALLINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 INTEGER NOT NULL UNIQUE KEY,c6 FIXED(10,8) NOT NULL DEFAULT 3.141592)",
length=length@entry=186, parser_state=parser_state@entry=0x2a1c4ce2b530,
is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
at /test/mtest/10.3_dbg/sql/sql_parse.cc:7870
#31 0x000055c6200d183e in wsrep_mysql_parse (thd=thd@entry=0x145a18001250,
rawbuf=0x145a18011ae8 "CREATE TABLE t1(c1 INTEGER NULL, c2 BINARY(25) NOT NULL, c3 SMALLINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 INTEGER NOT NULL UNIQUE KEY,c6 FIXED(10,8) NOT NULL DEFAULT 3.141592)", length=186,
parser_state=parser_state@entry=0x2a1c4ce2b530, is_com_multi=is_com_multi@entry=false,
is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_parse.cc:7662
#32 0x000055c6200d2959 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x145a18001250,
packet=packet@entry=0x145a18009301 "CREATE TABLE t1(c1 INTEGER NULL, c2 BINARY(25) NOT NULL, c3 SMALLINT(4) NULL, c4 BINARY(15) NOT NULL PRIMARY KEY, c5 INTEGER NOT NULL UNIQUE KEY,c6 FIXED(10,8) NOT NULL DEFAULT 3.141592);",
packet_length=packet_length@entry=187, is_com_multi=is_com_multi@entry=false,
is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_class.h:1152
#33 0x000055c6200d5ace in do_command (thd=0x145a18001250) at /test/mtest/10.3_dbg/sql/sql_parse.cc:1398
#34 0x000055c6201fdd54 in do_handle_one_connection (connect=connect@entry=0x55c6231056e0)
at /test/mtest/10.3_dbg/sql/sql_connect.cc:1403
#35 0x000055c6201fdf8b in handle_one_connection (arg=0x55c6231056e0)
at /test/mtest/10.3_dbg/sql/sql_connect.cc:1308
#36 0x000055c621e06609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#37 0x0000145a0830b293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
- Bug 2:
(rr) p file
$2 = 0x562c3f102d68 "/test/mtest/10.3_dbg/sql/wsrep_mysqld.cc"
(rr) p line
$3 = 2664
(rr) p mp->file
$4 = 0x562c3f0aea20 "/test/mtest/10.3_dbg/sql/sql_class.cc"
(rr) p mp->line
$5 = 1980
(rr) where
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00001fd34b9dd859 in __GI_abort () at abort.c:79
#2 0x0000562c3f0158be in safe_mutex_lock (mp=mp@entry=0x1d60100028b8, my_flags=my_flags@entry=0,
file=file@entry=0x562c3f102d68 "/test/mtest/10.3_dbg/sql/wsrep_mysqld.cc", line=line@entry=2664)
at /test/mtest/10.3_dbg/mysys/thr_mutex.c:265
#3 0x0000562c3e70c51f in inline_mysql_mutex_lock (src_line=2664,
src_file=0x562c3f102d68 "/test/mtest/10.3_dbg/sql/wsrep_mysqld.cc", that=0x1d60100028b8)
at /test/mtest/10.3_dbg/include/mysql/psi/mysql_thread.h:717
#4 wsrep_thd_LOCK (thd=thd@entry=0x1d6010000d90) at /test/mtest/10.3_dbg/sql/wsrep_mysqld.cc:2664
#5 0x0000562c3ea029c1 in wsrep_innobase_kill_one_trx (bf_thd=bf_thd@entry=0x1455c8000d90,
bf_trx=bf_trx@entry=0x30e64a067468, victim_trx=victim_trx@entry=0x30e64a064240, signal=signal@entry=0,
kill_signal=kill_signal@entry=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:18937
#6 0x0000562c3ea08ea0 in wsrep_abort_transaction (bf_thd=0x1455c8000d90, victim_thd=<optimized out>,
signal=0 '\000', kill_signal=4) at /test/mtest/10.3_dbg/storage/innobase/handler/ha_innodb.cc:19170
#7 0x0000562c3e7d556d in ha_abort_transaction (bf_thd=bf_thd@entry=0x1455c8000d90,
victim_thd=victim_thd@entry=0x1d6010000d90, signal=signal@entry=0 '\000', kill_signal=kill_signal@entry=4)
at /test/mtest/10.3_dbg/sql/handler.cc:6769
#8 0x0000562c3e71c945 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x1455c8000d90,
victim_thd_ptr=victim_thd_ptr@entry=0x1d6010000d90, signal=signal@entry=0 '\000',
kill_signal=kill_signal@entry=KILL_QUERY) at /test/mtest/10.3_dbg/sql/wsrep_thd.cc:828
#9 0x0000562c3e4ce83c in THD::notify_shared_lock (this=0x1455c8000d90, ctx_in_use=<optimized out>,
needs_thr_lock_abort=<optimized out>) at /test/mtest/10.3_dbg/sql/sql_class.cc:2002
#10 0x0000562c3e674325 in MDL_lock::notify_conflicting_locks (ctx=0x1455c8000eb0, this=0xde604090870)
at /test/mtest/10.3_dbg/sql/mdl.h:848
#11 MDL_context::acquire_lock (this=this@entry=0x1455c8000eb0, mdl_request=0x1455c8011c18,
lock_wait_timeout=lock_wait_timeout@entry=86400) at /test/mtest/10.3_dbg/sql/mdl.cc:2114
#12 0x0000562c3e674ce8 in MDL_context::acquire_locks (this=this@entry=0x1455c8000eb0,
mdl_requests=mdl_requests@entry=0x77463330c5b0, lock_wait_timeout=86400)
at /test/mtest/10.3_dbg/sql/mdl.cc:2268
#13 0x0000562c3e4b615b in lock_table_names (thd=thd@entry=0x1455c8000d90, options=...,
tables_start=0x1455c80117b0, tables_end=<optimized out>, lock_wait_timeout=<optimized out>,
flags=flags@entry=0) at /test/mtest/10.3_dbg/sql/sql_base.cc:3923
#14 0x0000562c3e4b8d3e in open_tables (thd=thd@entry=0x1455c8000d90, options=...,
start=start@entry=0x77463330c948, counter=counter@entry=0x77463330c95c, flags=flags@entry=0,
prelocking_strategy=prelocking_strategy@entry=0x77463330ca90) at /test/mtest/10.3_dbg/sql/sql_lex.h:1585
#15 0x0000562c3e4ba279 in open_and_lock_tables (thd=thd@entry=0x1455c8000d90, options=..., tables=<optimized out>,
tables@entry=0x1455c80117b0, derived=derived@entry=false, flags=flags@entry=0,
prelocking_strategy=prelocking_strategy@entry=0x77463330ca90) at /test/mtest/10.3_dbg/sql/sql_base.cc:5109
#16 0x0000562c3e4ba418 in open_and_lock_tables (prelocking_strategy=0x77463330ca90, flags=0, derived=false,
tables=0x1455c80117b0, thd=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_base.h:263
#17 open_n_lock_single_table (thd=thd@entry=0x1455c8000d90, table_l=table_l@entry=0x1455c80117b0,
lock_type=lock_type@entry=TL_READ_NO_INSERT, flags=flags@entry=0,
prelocking_strategy=prelocking_strategy@entry=0x77463330ca90) at /test/mtest/10.3_dbg/sql/sql_base.cc:4952
#18 0x0000562c3e602e23 in open_n_lock_single_table (flags=0, lock_type=TL_READ_NO_INSERT, table_l=0x1455c80117b0,
thd=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_base.h:482
#19 mysql_create_or_drop_trigger (thd=thd@entry=0x1455c8000d90, tables=<optimized out>, create=create@entry=true)
at /test/mtest/10.3_dbg/sql/sql_trigger.cc:558
#20 0x0000562c3e539b22 in mysql_execute_command (thd=thd@entry=0x1455c8000d90)
at /test/mtest/10.3_dbg/sql/sql_parse.cc:5913
#21 0x0000562c3e53c17f in mysql_parse (thd=thd@entry=0x1455c8000d90,
rawbuf=rawbuf@entry=0x1455c8011628 "CREATE TRIGGER t3_trg_err_1 BEFORE INSERT ON t1 FOR EACH ROW SET @_var = NEW.col", length=length@entry=80, parser_state=parser_state@entry=0x77463330e530,
is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
at /test/mtest/10.3_dbg/sql/sql_parse.cc:7870
#22 0x0000562c3e53d83e in wsrep_mysql_parse (thd=thd@entry=0x1455c8000d90,
rawbuf=0x1455c8011628 "CREATE TRIGGER t3_trg_err_1 BEFORE INSERT ON t1 FOR EACH ROW SET @_var = NEW.col",
length=80, parser_state=parser_state@entry=0x77463330e530, is_com_multi=is_com_multi@entry=false,
is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_parse.cc:7662
#23 0x0000562c3e53e959 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x1455c8000d90,
packet=packet@entry=0x1455c8008e41 "CREATE TRIGGER t3_trg_err_1 BEFORE INSERT ON t1 FOR EACH ROW SET @_var = NEW.col;", packet_length=packet_length@entry=81, is_com_multi=is_com_multi@entry=false,
is_next_command=is_next_command@entry=false) at /test/mtest/10.3_dbg/sql/sql_class.h:1152
#24 0x0000562c3e541ace in do_command (thd=0x1455c8000d90) at /test/mtest/10.3_dbg/sql/sql_parse.cc:1398
#25 0x0000562c3e669d54 in do_handle_one_connection (connect=connect@entry=0x562c42177370)
at /test/mtest/10.3_dbg/sql/sql_connect.cc:1403
#26 0x0000562c3e669f8b in handle_one_connection (arg=0x562c42177370)
at /test/mtest/10.3_dbg/sql/sql_connect.cc:1308
#27 0x0000562c403d1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#28 0x00001fd34bada293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Of these, Bug 2 seems to be easy to fix. However, Bug 1 looks complicated.
jplindst, both of these crashes happen when resolving MDL conflicts. The PR does not change MDL conflict resolution, other than passing the kill signal as a new argument, and from the stack traces above it seems that the MDL conflict resolution protecting the high-priority transaction is passing the correct KILL_QUERY signal.
The test carried out here, therefore, is probably surfacing a new issue, which you would also encounter with the 10.3 HEAD version.
As you seem to have rr traces there, please dig out the BF victim thread's status. Only the BF thread trace is shown here, which is only half of the picture.
For Bug 2 I agree this could be repeatable with 10.3 HEAD, and there the victim has the following information:
(rr) p thd->wsrep_query_state
$1 = QUERY_EXEC
(rr) p thd->wsrep_conflict_state
$2 = NO_CONFLICT
(rr) p thd->wsrep_exec_mode
$3 = LOCAL_STATE
(rr) p victim_trx->state
$4 = TRX_STATE_ACTIVE
(rr) p bf_trx
$5 = (const trx_t * const) 0x30e64a067468
(rr) p bf_trx->state
$6 = TRX_STATE_NOT_STARTED
(rr) p bf_thd
$7 = (THD * const) 0x1455c8000d90
(rr) p bf_thd->wsrep_query_state
$8 = QUERY_EXEC
(rr) p bf_thd->wsrep_conflict_state
$9 = NO_CONFLICT
(rr) p bf_thd->wsrep_exec_mode
$10 = TOTAL_ORDER
For Bug 1 I do not agree, as 10.3 HEAD does not contain the wsrep_innobase_kill_one_trx function. In this case the victim has the following state:
(rr) p trx->id
$1 = 280
(rr) p trx->state
$2 = TRX_STATE_ACTIVE
(rr) p thd->wsrep_query_state
value has been optimized out
(rr) p trx->mysql_thd->wsrep_query_state
$3 = QUERY_EXEC
(rr) p trx->mysql_thd->wsrep_exec_mode
$4 = LOCAL_STATE
(rr) p trx->mysql_thd->wsrep_conflict_state
$5 = NO_CONFLICT
(rr) p trx->lock.wait_lock
$7 = (ib_lock_t *) 0x0
One of the nodes in my 3-node 10.3.28 cluster crashed in this way yesterday, with the same messages logged in the syslog:
Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld[29212]: 2021-06-06 21:23:44 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld[29212]: #011 at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735
Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld[29212]: 2021-06-06 21:23:44 0 [ERROR] WSREP: cancel commit bad exit: 7 370256912
Jun 6 21:23:45 ta-eng-gal1mdb5 mysqld[29212]: 210606 21:23:44 [ERROR] mysqld got signal 6 ;
We have been running 10.3.28 for just over 2 weeks and this is the first time we have seen this crash; we upgraded from 10.3.9 (which had been very stable for the last 3 years).
It looks like the PR is already merged and the review is done. However, I am assigning this for review just to get the state of this issue correct in Jira.
Hello,
Just to comment that this also affects 10.3.29 (running on Debian 10), which we recently upgraded to on 2 Galera clusters. Nodes keep crashing at random times with the exact error message noted in this issue.
- 10.2 : https://github.com/MariaDB/server/commit/9a9f68c862f8045340c1dbc380a15b1eed68c38e
- 10.3 : https://github.com/MariaDB/server/commit/08edfb7edd8a6aa6c8bab6429f551eb39d03b0f5
- 10.4 : https://github.com/MariaDB/server/commit/7d26c225150e81fa7092d81aecf79ee381d8d77f
- 10.5 : https://github.com/MariaDB/server/commit/63e0087b86d6c38e6023d414bad569144334f0b4
- 10.6: Not needed
I submitted some review comments for the 10.3 version (2 commits), identifying possible problems. I noticed that the 10.4 and 10.5 versions include some changes to row_upd_sec_index_entry() that are not mentioned in the commit message. It is not clear how they are related to these changes.
Given that most changes are outside InnoDB and this is replacing MDEV-23328 with another implementation, I think that the final review must be conducted by serg, who authored that change. Ideally, the test galera.galera_bf_kill_debug would not be removed.
Shall I review the PR? At the moment the last comment is yours “Unfortunately, this is not correct.” and the status is “Changes requested”
Also, was it tested on 10.6? In 10.6, InnoDB mutexes are covered by the server's deadlock detector. You might want to put this fix into 10.6 and run all Galera tests on a debug build, to be sure it didn't introduce any deadlocks.
serg, please review. I have applied the review comments and other fixes on the above commits. These fixes were tested by Ramesh using pquery. There is no need to apply this to 10.6, as the background killing is already removed there.
- branch : bb-10.2-galera
- commit : 60bcd225cc39f6e9072ba30e3c79c4dbf4277766
- https://github.com/MariaDB/server/commit/60bcd225cc39f6e9072ba30e3c79c4dbf4277766
- http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-galera
- Issues:
- test case: rpl.rpl_parallel_conflicts
- safe_mutex: Trying to unlock mutex LOCK_thd_data that wasn't locked at /home/jan/mysql/10.2/sql/slave.cc, line 323
- Last used at /home/jan/mysql/10.2/sql/sql_class.h, line: 4144
- test case : galera.MW-286 (BAD)
Thread 1 (Thread 0x7f92c8afe640 (LWP 1675562)):
#0 __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
#1 0x00005647f29ba3fb in my_write_core (sig=6) at /home/jan/mysql/10.2/mysys/stacktrace.c:382
#2 0x00005647f2260fe2 in handle_fatal_signal (sig=6) at /home/jan/mysql/10.2/sql/signal_handler.cc:355
#3 <signal handler called>
#4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#5 0x00007f92d8c8d864 in __GI_abort () at abort.c:79
#6 0x00007f92d8c8d749 in __assert_fail_base (fmt=0x7f92d8e16f78 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5647f2b7ef0d "!is_owned()", file=0x5647f2b7eed0 "/home/jan/mysql/10.2/storage/innobase/include/sync0policy.ic", line=64, function=<optimized out>) at assert.c:92
#7 0x00007f92d8c9f3d6 in __GI___assert_fail (assertion=0x5647f2b7ef0d "!is_owned()", file=0x5647f2b7eed0 "/home/jan/mysql/10.2/storage/innobase/include/sync0policy.ic", line=64, function=0x5647f2b7ee58 "void MutexDebug<Mutex>::enter(const Mutex*, const char*, unsigned int) [with Mutex = TTASEventMutex<GenericPolicy>]") at assert.c:101
#8 0x00005647f24947dc in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x7f92c800f060, mutex=0x7f92c800f050, name=0x5647f2b6dfc8 "/home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc", line=5263) at /home/jan/mysql/10.2/storage/innobase/include/sync0policy.ic:64
#9 0x00005647f24930f4 in GenericPolicy<TTASEventMutex<GenericPolicy> >::enter (this=0x7f92c800f060, mutex=..., filename=0x5647f2b6dfc8 "/home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc", line=5263) at /home/jan/mysql/10.2/storage/innobase/include/sync0policy.h:346
#10 0x00005647f249120f in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x7f92c800f050, n_spins=30, n_delay=4, name=0x5647f2b6dfc8 "/home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc", line=5263) at /home/jan/mysql/10.2/storage/innobase/include/ib0mutex.h:635
#11 0x00005647f24691cb in innobase_kill_query (thd=0x7f9264000d90) at /home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc:5263
#12 0x00005647f2263830 in kill_handlerton (thd=0x7f9264000d90, plugin=0x7f92b403ac50, level=0x7f92c8afbe74) at /home/jan/mysql/10.2/sql/handler.cc:843
#13 0x00005647f1ff6bb3 in plugin_foreach_with_mask (thd=0x7f9264000d90, func=0x5647f22637aa <kill_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x7f92c8afbe74) at /home/jan/mysql/10.2/sql/sql_plugin.cc:2411
#14 0x00005647f226388e in ha_kill_query (thd=0x7f9264000d90, level=THD_ABORT_SOFTLY) at /home/jan/mysql/10.2/sql/handler.cc:850
#15 0x00005647f1f904f2 in THD::awake (this=0x7f9264000d90, state_to_set=KILL_QUERY) at /home/jan/mysql/10.2/sql/sql_class.cc:1733
#16 0x00005647f21c34b6 in wsrep_thd_awake (thd=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/sql/wsrep_mysqld.cc:2764
#17 0x00005647f248534b in wsrep_innobase_kill_one_trx (bf_thd=0x7f92b4001fe0, bf_trx=0x7f92abc00120, victim_trx=0x7f92abc01228, signal=1) at /home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc:19722
#18 0x00005647f2485795 in wsrep_abort_transaction (hton=0x5647f479e730, bf_thd=0x7f92b4001fe0, victim_thd=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/storage/innobase/handler/ha_innodb.cc:19797
#19 0x00005647f22716fb in ha_abort_transaction (bf_thd=0x7f92b4001fe0, victim_thd=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/sql/handler.cc:6326
#20 0x00005647f21d2ddc in wsrep_abort_thd (bf_thd_ptr=0x7f92b4001fe0, victim_thd_ptr=0x7f9264000d90, signal=1 '\001') at /home/jan/mysql/10.2/sql/wsrep_thd.cc:837
#21 0x00005647f21c1679 in wsrep_grant_mdl_exception (requestor_ctx=0x7f92b4002100, ticket=0x7f926418abc0, key=0x7f9264158f80) at /home/jan/mysql/10.2/sql/wsrep_mysqld.cc:2057
#22 0x00005647f213d493 in MDL_lock::can_grant_lock (this=0x7f9264158f80, type_arg=MDL_SHARED_WRITE, requestor_ctx=0x7f92b4002100, ignore_lock_priority=false) at /home/jan/mysql/10.2/sql/mdl.cc:1592
#23 0x00005647f213dd34 in MDL_context::try_acquire_lock_impl (this=0x7f92b4002100, mdl_request=0x7f92b48d50c8, out_ticket=0x7f92c8afc308) at /home/jan/mysql/10.2/sql/mdl.cc:1923
#24 0x00005647f213e0c0 in MDL_context::acquire_lock (this=0x7f92b4002100, mdl_request=0x7f92b48d50c8, lock_wait_timeout=86400) at /home/jan/mysql/10.2/sql/mdl.cc:2067
#25 0x00005647f1f664c5 in open_table_get_mdl_lock (thd=0x7f92b4001fe0, ot_ctx=0x7f92c8afc9d0, mdl_request=0x7f92b48d50c8, flags=0, mdl_ticket=0x7f92c8afc630) at /home/jan/mysql/10.2/sql/sql_base.cc:1461
#26 0x00005647f1f66e90 in open_table (thd=0x7f92b4001fe0, table_list=0x7f92b48d4c70, ot_ctx=0x7f92c8afc9d0) at /home/jan/mysql/10.2/sql/sql_base.cc:1760
#27 0x00005647f1f6a46c in open_and_process_table (thd=0x7f92b4001fe0, tables=0x7f92b48d4c70, counter=0x7f92c8afca64, flags=0, prelocking_strategy=0x7f92c8afcae8, has_prelocking_list=false, ot_ctx=0x7f92c8afc9d0) at /home/jan/mysql/10.2/sql/sql_base.cc:3589
#28 0x00005647f1f6b370 in open_tables (thd=0x7f92b4001fe0, options=..., start=0x7f92c8afca48, counter=0x7f92c8afca64, flags=0, prelocking_strategy=0x7f92c8afcae8) at /home/jan/mysql/10.2/sql/sql_base.cc:4062
#29 0x00005647f1f6c9c0 in open_and_lock_tables (thd=0x7f92b4001fe0, options=..., tables=0x7f92b48d4c70, derived=false, flags=0, prelocking_strategy=0x7f92c8afcae8) at /home/jan/mysql/10.2/sql/sql_base.cc:4861
#30 0x00005647f1f3251b in open_and_lock_tables (thd=0x7f92b4001fe0, tables=0x7f92b48d4c70, derived=false, flags=0) at /home/jan/mysql/10.2/sql/sql_base.h:508
#31 0x00005647f239a995 in Rows_log_event::do_apply_event (this=0x7f92b4028840, rgi=0x7f92b4021740) at /home/jan/mysql/10.2/sql/log_event.cc:10854
#32 0x00005647f1f22f47 in Log_event::apply_event (this=0x7f92b4028840, rgi=0x7f92b4021740) at /home/jan/mysql/10.2/sql/log_event.h:1462
#33 0x00005647f21cf5bf in wsrep_apply_events (thd=0x7f92b4001fe0, events_buf=0x7f92cbd57150, buf_len=493589) at /home/jan/mysql/10.2/sql/wsrep_applier.cc:164
#34 0x00005647f21cfb8b in wsrep_apply_cb (ctx=0x7f92b4001fe0, buf=0x7f92cbd57150, buf_len=502024, flags=1, meta=0x7f92c8afd350) at /home/jan/mysql/10.2/sql/wsrep_applier.cc:267
#35 0x00007f92d3cc99cc in galera::TrxHandle::apply (this=0x7f92b403e060, recv_ctx=0x7f92b4001fe0, apply_cb=0x5647f21cf87f <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, meta=...) at /home/jan/mysql/mariadb-galeralib/galera/src/trx_handle.cpp:316
#36 0x00007f92d3cd08e3 in apply_trx_ws (recv_ctx=0x7f92b4001fe0, apply_cb=0x5647f21cf87f <wsrep_apply_cb(void*, void const*, size_t, uint32_t, wsrep_trx_meta_t const*)>, commit_cb=0x5647f21cfeb4 <wsrep_commit_cb(void*, uint32_t, wsrep_trx_meta_t const*, wsrep_bool_t*, bool)>, trx=..., meta=...) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:34
#37 0x00007f92d3cd3b15 in galera::ReplicatorSMM::apply_trx (this=0x5647f4736f50, recv_ctx=0x7f92b4001fe0, trx=0x7f92b403e060) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:455
#38 0x00007f92d3cd82a6 in galera::ReplicatorSMM::process_trx (this=0x5647f4736f50, recv_ctx=0x7f92b4001fe0, trx=0x7f92b403e060) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:1259
#39 0x00007f92d3d13781 in galera::GcsActionSource::dispatch (this=0x5647f47376d8, recv_ctx=0x7f92b4001fe0, act=..., exit_loop=@0x7f92c8afda7b: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:115
#40 0x00007f92d3d13fb9 in galera::GcsActionSource::process (this=0x5647f47376d8, recv_ctx=0x7f92b4001fe0, exit_loop=@0x7f92c8afda7b: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:180
#41 0x00007f92d3cd3141 in galera::ReplicatorSMM::async_recv (this=0x5647f4736f50, recv_ctx=0x7f92b4001fe0) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:363
#42 0x00007f92d3ca86b3 in galera_recv (gh=0x5647f47040f0, recv_ctx=0x7f92b4001fe0) at /home/jan/mysql/mariadb-galeralib/galera/src/wsrep_provider.cpp:244
#43 0x00005647f21d1df8 in wsrep_replication_process (thd=0x7f92b4001fe0) at /home/jan/mysql/10.2/sql/wsrep_thd.cc:486
#44 0x00005647f21c1ba4 in start_wsrep_THD (arg=0x5647f476aaf0) at /home/jan/mysql/10.2/sql/wsrep_mysqld.cc:2169
#45 0x00005647f295c085 in pfs_spawn_thread (arg=0x5647f47595f0) at /home/jan/mysql/10.2/storage/perfschema/pfs.cc:1869
#46 0x00007f92d91de450 in start_thread (arg=0x7f92c8afe640) at pthread_create.c:473
#47 0x00007f92d8d7ed53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
- Problem found with pquery:
(rr) where
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x000073fe62f61859 in __GI_abort () at abort.c:79
#2 0x0000562f2a5ffabe in safe_mutex_lock (mp=mp@entry=0x146fb4002c28, my_flags=my_flags@entry=0,
file=file@entry=0x562f2a6d7ec0 "/test/mtest/10.2_dbg/sql/wsrep_mysqld.cc", line=line@entry=2689) at /test/mtest/10.2_dbg/mysys/thr_mutex.c:264
#3 0x0000562f29e70f47 in inline_mysql_mutex_lock (src_line=2689, src_file=0x562f2a6d7ec0 "/test/mtest/10.2_dbg/sql/wsrep_mysqld.cc", that=0x146fb4002c28)
at /test/mtest/10.2_dbg/include/mysql/psi/mysql_thread.h:688
#4 wsrep_thd_LOCK (thd=thd@entry=0x146fb4001230) at /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2689
#5 0x0000562f2a0e351b in wsrep_innobase_kill_one_trx (bf_thd=bf_thd@entry=0x146fb8001260, bf_trx=bf_trx@entry=0x1d1f69a22120,
victim_trx=victim_trx@entry=0x1d1f69a3cae8, signal=signal@entry=0) at /test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19607
#6 0x0000562f2a0f3b96 in wsrep_abort_transaction (hton=<optimized out>, bf_thd=0x146fb8001260, victim_thd=<optimized out>, signal=0 '\000')
at /test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19797
#7 0x0000562f29efaafb in ha_abort_transaction (bf_thd=bf_thd@entry=0x146fb8001260, victim_thd=victim_thd@entry=0x146fb4001230, signal=signal@entry=0 '\000')
at /test/mtest/10.2_dbg/sql/handler.cc:6326
#8 0x0000562f29e80b87 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x146fb8001260, victim_thd_ptr=victim_thd_ptr@entry=0x146fb4001230,
signal=signal@entry=0 '\000') at /test/mtest/10.2_dbg/sql/wsrep_thd.cc:837
#9 0x0000562f29c9b55a in THD::notify_shared_lock (this=0x146fb8001260, ctx_in_use=<optimized out>, needs_thr_lock_abort=<optimized out>)
at /test/mtest/10.2_dbg/sql/sql_class.cc:1897
#10 0x0000562f29dffd6b in MDL_lock::notify_conflicting_locks (ctx=0x146fb8001380, this=0x3cd3540acf30) at /test/mtest/10.2_dbg/sql/mdl.h:847
#11 MDL_context::acquire_lock (this=this@entry=0x146fb8001380, mdl_request=0x146fb8011c30, lock_wait_timeout=lock_wait_timeout@entry=86400)
at /test/mtest/10.2_dbg/sql/mdl.cc:2104
#12 0x0000562f29e00682 in MDL_context::acquire_locks (this=this@entry=0x146fb8001380, mdl_requests=mdl_requests@entry=0x4d1f42c0ea50,
lock_wait_timeout=86400) at /test/mtest/10.2_dbg/sql/mdl.cc:2258
#13 0x0000562f29c83b05 in lock_table_names (thd=thd@entry=0x146fb8001260, options=..., tables_start=tables_start@entry=0x146fb80117d8,
tables_end=tables_end@entry=0x0, lock_wait_timeout=<optimized out>, flags=flags@entry=0) at /test/mtest/10.2_dbg/sql/sql_base.cc:3806
#14 0x0000562f29d84b51 in lock_table_names (flags=0, lock_wait_timeout=<optimized out>, table_list_end=0x0, table_list=0x146fb80117d8, thd=0x146fb8001260)
at /test/mtest/10.2_dbg/sql/sql_base.h:238
#15 mysql_rm_table (thd=thd@entry=0x146fb8001260, tables=tables@entry=0x146fb80117d8, if_exists=<optimized out>, drop_temporary=<optimized out>)
at /test/mtest/10.2_dbg/sql/sql_table.cc:2042
#16 0x0000562f29ce3c6c in mysql_execute_command (thd=thd@entry=0x146fb8001260) at /test/mtest/10.2_dbg/sql/structs.h:530
#17 0x0000562f29ce9809 in mysql_parse (thd=thd@entry=0x146fb8001260, rawbuf=rawbuf@entry=0x146fb8011718 "DROP TABLE t1", length=length@entry=13,
parser_state=parser_state@entry=0x4d1f42c10540, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
at /test/mtest/10.2_dbg/sql/sql_parse.cc:7793
#18 0x0000562f29ceae12 in wsrep_mysql_parse (thd=thd@entry=0x146fb8001260, rawbuf=0x146fb8011718 "DROP TABLE t1", length=13,
parser_state=parser_state@entry=0x4d1f42c10540, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
at /test/mtest/10.2_dbg/sql/sql_parse.cc:7585
#19 0x0000562f29cec36b in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x146fb8001260,
packet=packet@entry=0x146fb8008f31 "DROP TABLE t1;", packet_length=packet_length@entry=14, is_com_multi=is_com_multi@entry=false,
is_next_command=is_next_command@entry=false) at /test/mtest/10.2_dbg/sql/sql_class.h:1109
#20 0x0000562f29cef14a in do_command (thd=0x146fb8001260) at /test/mtest/10.2_dbg/sql/sql_parse.cc:1381
#21 0x0000562f29df51b1 in do_handle_one_connection (connect=connect@entry=0x562f2dce8540) at /test/mtest/10.2_dbg/sql/sql_connect.cc:1336
#22 0x0000562f29df53dc in handle_one_connection (arg=0x562f2dce8540) at /test/mtest/10.2_dbg/sql/sql_connect.cc:1241
#23 0x0000448464591609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#24 0x000073fe6305e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
- test case: rpl.rpl_parallel_conflicts
As Debian 11 ships MariaDB 10.5, it's impossible to get Galera up and running there. Is any ETA available?
The problem in 10.4 is as follows:
- branch: bb-10.4-MDEV-25114
- test case: galera.MW-286
- stack trace:
Thread 1 (Thread 0x7f5fd43e0640 (LWP 924119)):
#0 __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:54
#1 0x000055bd674e2f2e in my_write_core (sig=6) at /home/jan/mysql/10.4/mysys/stacktrace.c:386
#2 0x000055bd66c26a9d in handle_fatal_signal (sig=6) at /home/jan/mysql/10.4/sql/signal_handler.cc:356
#3 <signal handler called>
#4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#5 0x00007f5fdbe81864 in __GI_abort () at abort.c:79
#6 0x00007f5fdbe81749 in __assert_fail_base (fmt=0x7f5fdc00af78 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55bd677df9ee "!is_owned()", file=0x55bd677dfa00 "/home/jan/mysql/10.4/storage/innobase/include/sync0policy.h", line=121, function=<optimized out>) at assert.c:92
#7 0x00007f5fdbe933d6 in __GI___assert_fail (assertion=0x55bd677df9ee "!is_owned()", file=0x55bd677dfa00 "/home/jan/mysql/10.4/storage/innobase/include/sync0policy.h", line=121, function=0x55bd677dfa40 "void MutexDebug<Mutex>::enter(const Mutex&, const char*, unsigned int) [with Mutex = TTASEventMutex<GenericPolicy>]") at assert.c:101
#8 0x000055bd66eb265a in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x55bd680a09d0 <lock_sys+80>, mutex=..., filename=0x55bd677cfbb0 "/home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc", line=5195) at /home/jan/mysql/10.4/storage/innobase/include/sync0policy.h:121
#9 0x000055bd66eb0599 in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x55bd680a09c0 <lock_sys+64>, n_spins=30, n_delay=4, name=0x55bd677cfbb0 "/home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc", line=5195) at /home/jan/mysql/10.4/storage/innobase/include/ib0mutex.h:592
#10 0x000055bd66e843dc in innobase_kill_query (thd=0x7f5fa0002120) at /home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc:5195
#11 0x000055bd66c2b196 in kill_handlerton (thd=0x7f5fa0002120, plugin=0x7f5fb0034500, level=0x7f5fd43dd974) at /home/jan/mysql/10.4/sql/handler.cc:865
#12 0x000055bd668d4eff in plugin_foreach_with_mask (thd=0x7f5fa0002120, func=0x55bd66c2b056 <kill_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x7f5fd43dd974) at /home/jan/mysql/10.4/sql/sql_plugin.cc:2474
#13 0x000055bd66c2b1f8 in ha_kill_query (thd=0x7f5fa0002120, level=THD_ABORT_SOFTLY) at /home/jan/mysql/10.4/sql/handler.cc:872
#14 0x000055bd6683e273 in THD::awake_no_mutex (this=0x7f5fa0002120, state_to_set=KILL_QUERY) at /home/jan/mysql/10.4/sql/sql_class.cc:1899
#15 0x000055bd66b4dcc9 in wsrep_thd_bf_abort (bf_thd=0x7f5fb0000d90, victim_thd=0x7f5fa0002120, signal=1 '\001') at /home/jan/mysql/10.4/sql/service_wsrep.cc:252
#16 0x000055bd66ea1888 in wsrep_innobase_kill_one_trx (bf_thd=0x7f5fb0000d90, victim_trx=0x7f5fc4bd5218, signal=1 '\001') at /home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc:18725
#17 0x000055bd66ea1c48 in wsrep_abort_transaction (bf_thd=0x7f5fb0000d90, victim_thd=0x7f5fa0002120, signal=1 '\001') at /home/jan/mysql/10.4/storage/innobase/handler/ha_innodb.cc:18773
#18 0x000055bd66c3df05 in ha_abort_transaction (bf_thd=0x7f5fb0000d90, victim_thd=0x7f5fa0002120, signal=1 '\001') at /home/jan/mysql/10.4/sql/handler.cc:7129
#19 0x000055bd66b30040 in wsrep_handle_mdl_conflict (requestor_ctx=0x7f5fb0000eb0, ticket=0x7f5fa0143f20, key=0x7f5fa00944d0) at /home/jan/mysql/10.4/sql/wsrep_mysqld.cc:2386
#20 0x000055bd66a510ed in MDL_lock::can_grant_lock (this=0x7f5fa00944d0, type_arg=MDL_SHARED_WRITE, requestor_ctx=0x7f5fb0000eb0, ignore_lock_priority=false) at /home/jan/mysql/10.4/sql/mdl.cc:1798
#21 0x000055bd66a51a5e in MDL_context::try_acquire_lock_impl (this=0x7f5fb0000eb0, mdl_request=0x7f5fb002c6b0, out_ticket=0x7f5fd43dddd8) at /home/jan/mysql/10.4/sql/mdl.cc:2129
#22 0x000055bd66a51e6e in MDL_context::acquire_lock (this=0x7f5fb0000eb0, mdl_request=0x7f5fb002c6b0, lock_wait_timeout=86400) at /home/jan/mysql/10.4/sql/mdl.cc:2276
#23 0x000055bd6680e7cb in open_table_get_mdl_lock (thd=0x7f5fb0000d90, ot_ctx=0x7f5fd43de4a0, mdl_request=0x7f5fb002c6b0, flags=0, mdl_ticket=0x7f5fd43de100) at /home/jan/mysql/10.4/sql/sql_base.cc:1647
#24 0x000055bd6680f15b in open_table (thd=0x7f5fb0000d90, table_list=0x7f5fb002c260, ot_ctx=0x7f5fd43de4a0) at /home/jan/mysql/10.4/sql/sql_base.cc:1915
#25 0x000055bd66813523 in open_and_process_table (thd=0x7f5fb0000d90, tables=0x7f5fb002c260, counter=0x7f5fd43de534, flags=0, prelocking_strategy=0x7f5fd43de5b8, has_prelocking_list=false, ot_ctx=0x7f5fd43de4a0) at /home/jan/mysql/10.4/sql/sql_base.cc:3890
#26 0x000055bd668147f7 in open_tables (thd=0x7f5fb0000d90, options=..., start=0x7f5fd43de518, counter=0x7f5fd43de534, flags=0, prelocking_strategy=0x7f5fd43de5b8) at /home/jan/mysql/10.4/sql/sql_base.cc:4371
#27 0x000055bd6681676d in open_and_lock_tables (thd=0x7f5fb0000d90, options=..., tables=0x7f5fb002c260, derived=false, flags=0, prelocking_strategy=0x7f5fd43de5b8) at /home/jan/mysql/10.4/sql/sql_base.cc:5317
#28 0x000055bd667ce32f in open_and_lock_tables (thd=0x7f5fb0000d90, tables=0x7f5fb002c260, derived=false, flags=0) at /home/jan/mysql/10.4/sql/sql_base.h:503
#29 0x000055bd66d96de7 in Rows_log_event::do_apply_event (this=0x7f5fb002cb90, rgi=0x7f5fb0022700) at /home/jan/mysql/10.4/sql/log_event.cc:11409
#30 0x000055bd667b6b41 in Log_event::apply_event (this=0x7f5fb002cb90, rgi=0x7f5fb0022700) at /home/jan/mysql/10.4/sql/log_event.h:1492
#31 0x000055bd66b40a1b in wsrep_apply_events (thd=0x7f5fb0000d90, rli=0x7f5fb0016fd0, events_buf=0x7f5fc7fffbf8, buf_len=0) at /home/jan/mysql/10.4/sql/wsrep_applier.cc:200
#32 0x000055bd66b2183e in Wsrep_applier_service::apply_write_set (this=0x7f5fd43dfca0, ws_meta=..., data=...) at /home/jan/mysql/10.4/sql/wsrep_high_priority_service.cc:550
#33 0x000055bd675613b9 in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /home/jan/mysql/10.4/wsrep-lib/src/server_state.cpp:327
#34 0x000055bd6756532c in wsrep::server_state::on_apply (this=0x55bd694aafe0, high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /home/jan/mysql/10.4/wsrep-lib/src/server_state.cpp:1137
#35 0x000055bd675803e5 in wsrep::high_priority_service::apply (this=0x7f5fd43dfca0, ws_handle=..., ws_meta=..., data=...) at /home/jan/mysql/10.4/wsrep-lib/include/wsrep/high_priority_service.hpp:47
#36 0x000055bd6757d343 in (anonymous namespace)::apply_cb (ctx=0x7f5fd43dfca0, wsh=0x7f5fd43deec0, flags=65, buf=0x7f5fd43deed0, meta=0x7f5fd43df190, exit_loop=0x7f5fd43df14f) at /home/jan/mysql/10.4/wsrep-lib/src/wsrep_provider_v26.cpp:504
#37 0x00007f5fd6dd6bf5 in galera::TrxHandleSlave::apply (this=this@entry=0x7f5fb0030700, recv_ctx=recv_ctx@entry=0x7f5fd43dfca0, apply_cb=0x55bd6757d110 <(anonymous namespace)::apply_cb(void*, wsrep_ws_handle_t const*, uint32_t, wsrep_buf_t const*, wsrep_trx_meta_t const*, wsrep_bool_t*)>, meta=..., exit_loop=exit_loop@entry=@0x7f5fd43df14f: false) at /home/jan/mysql/mariadb-galeralib/galera/src/trx_handle.cpp:391
#38 0x00007f5fd6de6216 in galera::ReplicatorSMM::apply_trx (this=0x55bd694df1a0, recv_ctx=0x7f5fd43dfca0, ts=...) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:504
#39 0x00007f5fd6de9fd0 in galera::ReplicatorSMM::process_trx (this=0x55bd694df1a0, recv_ctx=0x7f5fd43dfca0, ts_ptr=...) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:2123
#40 0x00007f5fd6e1dd20 in galera::GcsActionSource::process_writeset (this=0x55bd69511270, recv_ctx=0x7f5fd43dfca0, act=..., exit_loop=@0x7f5fd43df87f: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:62
#41 0x00007f5fd6e1e9d2 in galera::GcsActionSource::process (this=0x55bd69511270, recv_ctx=0x7f5fd43dfca0, exit_loop=@0x7f5fd43df87f: false) at /home/jan/mysql/mariadb-galeralib/galera/src/gcs_action_source.cpp:183
#42 0x00007f5fd6dea4c0 in galera::ReplicatorSMM::async_recv (this=0x55bd694df1a0, recv_ctx=0x7f5fd43dfca0) at /home/jan/mysql/mariadb-galeralib/galera/src/replicator_smm.cpp:390
#43 0x00007f5fd6dc31f1 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /home/jan/mysql/mariadb-galeralib/galera/src/wsrep_provider.cpp:236
#44 0x000055bd6757e510 in wsrep::wsrep_provider_v26::run_applier (this=0x55bd6939c410, applier_ctx=0x7f5fd43dfca0) at /home/jan/mysql/10.4/wsrep-lib/src/wsrep_provider_v26.cpp:796
#45 0x000055bd66b41359 in wsrep_replication_process (thd=0x7f5fb0000d90, arg=0x55bd69517260) at /home/jan/mysql/10.4/sql/wsrep_thd.cc:57
#46 0x000055bd66b31d01 in start_wsrep_THD (arg=0x55bd69517260) at /home/jan/mysql/10.4/sql/wsrep_mysqld.cc:2915
#47 0x000055bd6746e4bf in pfs_spawn_thread (arg=0x55bd69501450) at /home/jan/mysql/10.4/storage/perfschema/pfs.cc:1869
#48 0x00007f5fdc3db450 in start_thread (arg=0x7f5fd43e0640) at pthread_create.c:473
#49 0x00007f5fdbf72d53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
- saving '/home/jan/mysql/10.4/mysql-test/var/log/galera.MW-286-innodb/' to '/home/jan/mysql/10.4/mysql-test/var/log/galera.MW-286-innodb/'
serg, marko: this is now ready for review.
Latest versions:
- 10.2: https://github.com/MariaDB/server/commit/8f075c6d3dde8a7eec3ce901a0699d1478e5c065
- 10.3: https://github.com/MariaDB/server/commit/1eb067c5a29cc7f7245e8e7056e687b74494476e
- 10.4: https://github.com/MariaDB/server/commit/f79227a61f80e6590d967c110b0f9cf6f3d8bffe
- 10.5: https://github.com/MariaDB/server/commit/d683b6ece75a07d8cfaf9301532184cde9e3a1f7
- 10.6: https://github.com/MariaDB/server/commit/14b125c8b54439360e2277e725c56af0ee52d029
Next step is to do pquery QA runs on these.
jplindst, the pquery run looks good. It did not hit any new crashes related to this issue.
pquery run test case:
- multi-threaded
- multi master
- SQL file prepared with KILL CONNECTION/QUERY statements
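For the record, an illustrative fragment of the kind of statement mix such a pquery input file contains; the statements and thread ids below are hypothetical, and the actual file used in this run is not attached here.

CREATE TABLE IF NOT EXISTS t1 (id INT PRIMARY KEY, v INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1, 0) ON DUPLICATE KEY UPDATE v = v + 1;
UPDATE t1 SET v = v + 1 WHERE id = 1;
KILL QUERY 10;        -- ids are placeholders; the harness issues KILLs against live threads
KILL CONNECTION 11;
DROP TABLE IF EXISTS t1;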
I posted a few comments to the 10.2 version of this:
- The claimed revert of MDEV-23328 includes some strange changes, including introducing a potential race condition to wsrep_abort_transaction().
- The proposed fix appears to introduce a race condition in THD::notify_shared_lock(). I am not very familiar with this code, but I suspect that as soon as THD::LOCK_thd_data on the to-be-aborted session is released, that session object may be concurrently aborted by some other thread, and could be assigned to another transaction before wsrep_abort_thd() is invoked. I think that this (as well as the above-mentioned change to wsrep_abort_transaction()) needs to be tested with an extra sleep or yield added between the mutex release and the call.
- If I look at the total across the 2 commits, I see that the function wsrep_innobase_kill_one_trx() has been moved somewhere else in the file, which makes it harder to read the actual changes. Please try to generate a minimal diff, so that I can review deeper.
I see that there is a 10.6 version as well, not affecting InnoDB, and seemingly not affecting anything outside Galera.
The latest revision for 10.2 looks mostly OK to me. My only concern is a potential race condition in wsrep_close_client_connections(), but because I am not familiar with that code, I may be mistaken there.
serg, can you review the latest versions here:
- 10.2 : https://github.com/MariaDB/server/commit/4b164f176e6ec34864afdaa3dc7be23c1ce09c79
- 10.3 : https://github.com/MariaDB/server/commit/8b29e8dec808f9f17d5f72c3209322dc72fbe4d3
- 10.4 : https://github.com/MariaDB/server/commit/5e9c98bc0595ea5e8ce297adf9ef31bfcabb638d
- 10.5 : https://github.com/MariaDB/server/commit/da9a37bf4181d92699af40fdd82ab5b7ae1bf307
- 10.6 : https://github.com/MariaDB/server/commit/2b9b10e05bbf67d8e6ee67337182b114e88656e2
The suggested way to fix this issue is not correct. I will fall back to a KILL-as-TOI solution.
serg, can you please review the following:
- 10.2
- branch: bb-10.2-KILL-as-TOI-galera
- commit: https://github.com/MariaDB/server/commit/6d0c1f3ae12593470f2a556c33e586bc08c677e8
- 10.3
- branch: bb-10.3-KILL-as-TOI-galera
- commit: https://github.com/MariaDB/server/commit/ed85dc379fe686ee2bd8707826864e3d7c62877c
Final versions:
- branch: bb-10.2-KILL-as-TOI-galera commit : https://github.com/MariaDB/server/commit/f3e5e3d897c90dcf04586e57c6bec5ef6fabb8e5
- branch: bb-10.3-KILL-as-TOI-galera commit : https://github.com/MariaDB/server/commit/002192b92cf3c9e206d711ca36cdf6694a457a8f
- branch: bb-10.4-KILL-as-TOI-galera commit : https://github.com/MariaDB/server/commit/778d88ea2d9ec9b2083a6eaadea5d31f9571ff14
- branch: bb-10.5-KILL-as-TOI-galera commit : https://github.com/MariaDB/server/commit/2c8f52ea53d7532e395fa92c3b0e9c5dfb619403
OMG, we have been waiting for this fix, thank you! After September's 10.3.24 -> 10.3.31 upgrade, our three-node dev cluster (10.3.31) has had the mariadb service crashing on each node every one or two hours. We wanted to revert, but we have too much data to restore. A festival of restarting MariaDB nodes! Thanks again for your great efforts! Fingers crossed!
This wasn't an isolated occurrence. Another server crashed a couple of hours later:
2021-03-10 21:01:10 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735
2021-03-10 21:01:10 0 [ERROR] WSREP: cancel commit bad exit: 7 34182537224
210310 21:01:10 [ERROR] mysqld got signal 6 ;