[MDEV-28472] BF lock wait long for trx - Assertion `mode_ == m_local || transaction_.is_streaming()' failed Created: 2022-05-04  Updated: 2023-11-23  Resolved: 2023-08-07

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.6.7
Fix Version/s: 10.6.15

Type: Bug Priority: Major
Reporter: Daniel Bray Assignee: Seppo Jaakola
Resolution: Fixed Votes: 5
Labels: None
Environment:

CentOS Stream 8 (latest patches)
VMware VSAN 7.x
4 node Galera cluster
ProxySQL splitting 1 writer with 3 backup readers


Attachments: File alter_v1.yy     File node1.err     File node2.err    
Issue Links:
Relates
relates to MDEV-29293 MariaDB stuck on starting commit stat... Closed

 Description   

We've been running this environment for awhile now. This is the first time we've seen this particular issue. This database/table has been unaltered in years, and has millions of writes to it daily.

2022-05-04 2:09:15 16 [Note] InnoDB: WSREP: BF lock wait long for trx:0xe3d26236f query: insert into SystemEvents (Message, Facility, FromHost, Priority, DeviceReportedTime, ReceivedAt, InfoUnitID, SysLogTag)

mariadbd: /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.6.7/wsrep-lib/include/wsrep/client_state.hpp:668: int wsrep::client_state::bf_abort(wsrep::seqno): Assertion `mode_ == m_local || transaction_.is_streaming()' failed.

Server version: 10.6.7-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=47
max_threads=65537
thread_count=51

Once the above happens, a simple restart doesn't work. We have to purge the data from the node, and resync everything back over to get everything to start up.

Here is the relevant config information:
[mariadb]
tmpdir=/var/tmp/
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_status_output=off
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=4G
innodb_log_file_size=4G
innodb_io_capacity=8000
innodb_lock_wait_timeout=1
innodb_monitor_enable=all
thread_handling = pool-of-threads
thread_stack = 192000
thread_cache_size = 4
thread_pool_size = 8
thread_pool_oversubscribe = 3
max_connections=500
max_allowed_packet=512M
performance_schema=ON
performance_schema_session_connect_attrs_size = 1024
userstat = 1
slow_query_log
slow_query_log_file=/var/log/mariadb/slow_query.log
log_error=/var/log/mariadb/mariadb.log

[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_sst_method=rsync
wsrep_slave_threads=16
wsrep_provider_options="gmcast.segment=0; gcache.size=4G; gcache.page_size=4G;"



 Comments   
Comment by Ramesh Sivaraman [ 2022-06-15 ]

jplindst Reproduced similar assertion using RQG grammar.

git clone https://github.com/Percona-Lab/sysbench-tpcc
./tpcc.lua --mysql-socket=/test/GAL_MD150622-mariadb-10.6.9-linux-x86_64-dbg/node1/node1_socket.sock --mysql-user=root --mysql-db=test --time=300 --threads=64 --report-interval=1 --tables=1 --scale=10 --db-driver=mysql prepare
 
perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=11063:socket=/test/GAL_MD150622-mariadb-10.6.9-linux-x86_64-dbg/node1/node1_socket.sock:user=root:database=test --grammar=conf/mariadb/alter_v1.yy --threads=4
perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=11331:socket=/test/GAL_MD150622-mariadb-10.6.9-linux-x86_64-dbg/node2/node2_socket.sock:user=root:database=test --grammar=conf/mariadb/alter_v1.yy --threads=4

10.6.9 6c82ab4f726b83f09646c22afddd4c102e60607b (Debug)

Core was generated by `/test/GAL_MD150622-mariadb-10.6.9-linux-x86_64-dbg/bin/mysqld --defaults-file=/'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
[Current thread is 1 (Thread 0x154174ed2700 (LWP 236302))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x000055fd9d5c7e76 in my_write_core (sig=sig@entry=6) at /test/10.6_dbg/mysys/stacktrace.c:424
#2  0x000055fd9ce522c9 in handle_fatal_signal (sig=6) at /test/10.6_dbg/sql/signal_handler.cc:345
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00001541fe4a0859 in __GI_abort () at abort.c:79
#6  0x00001541fe4a0729 in __assert_fail_base (fmt=0x1541fe636588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55fd9d96fc78 "mode_ == m_local || transaction_.is_streaming()", file=0x55fd9d756dd8 "/test/10.6_dbg/wsrep-lib/include/wsrep/client_state.hpp", line=679, function=<optimized out>) at assert.c:92
#7  0x00001541fe4b2006 in __GI___assert_fail (assertion=assertion@entry=0x55fd9d96fc78 "mode_ == m_local || transaction_.is_streaming()", file=file@entry=0x55fd9d756dd8 "/test/10.6_dbg/wsrep-lib/include/wsrep/client_state.hpp", line=line@entry=679, function=function@entry=0x55fd9d96fc38 "int wsrep::client_state::total_order_bf_abort(wsrep::seqno)") at assert.c:101
#8  0x000055fd9d166bce in wsrep::client_state::total_order_bf_abort (bf_seqno=<optimized out>, this=0x154180007550) at /test/10.6_dbg/wsrep-lib/include/wsrep/client_state.hpp:680
#9  wsrep_bf_abort (bf_thd=bf_thd@entry=0x154128000db8, victim_thd=victim_thd@entry=0x154180000ff8) at /test/10.6_dbg/sql/wsrep_thd.cc:384
#10 0x000055fd9d16e957 in wsrep_thd_bf_abort (bf_thd=bf_thd@entry=0x154128000db8, victim_thd=victim_thd@entry=0x154180000ff8, signal=signal@entry=1 '\001') at /test/10.6_dbg/sql/service_wsrep.cc:222
#11 0x000055fd9d192038 in lock_wait_wsrep_kill (bf_trx=bf_trx@entry=0x1541fc204c80, thd_id=<optimized out>, trx_id=<optimized out>) at /test/10.6_dbg/storage/innobase/handler/ha_innodb.cc:18701
#12 0x000055fd9d231a49 in lock_wait_wsrep (trx=trx@entry=0x1541fc204c80) at /test/10.6_dbg/storage/innobase/lock/lock0lock.cc:1000
#13 0x000055fd9d237016 in lock_wait (thr=thr@entry=0x15412801b660) at /test/10.6_dbg/storage/innobase/lock/lock0lock.cc:1744
#14 0x000055fd9d3183b1 in row_mysql_handle_errors (new_err=new_err@entry=0x154174ecce1c, trx=trx@entry=0x1541fc204c80, thr=thr@entry=0x15412801b660, savept=savept@entry=0x0) at /test/10.6_dbg/storage/innobase/row/row0mysql.cc:686
#15 0x000055fd9d239888 in lock_table_for_trx (table=0x1541d803a090, trx=trx@entry=0x1541fc204c80, mode=mode@entry=LOCK_X, no_wait=no_wait@entry=false) at /test/10.6_dbg/storage/innobase/lock/lock0lock.cc:3655
#16 0x000055fd9d19a253 in ha_innobase::rename_table (this=<optimized out>, from=0x154174ecd530 "./test/item1", to=0x154174ecd730 "./test/#sql-backup-39578-12") at /test/10.6_dbg/storage/innobase/handler/ha_innodb.cc:14112
#17 0x000055fd9ce5ecf4 in handler::ha_rename_table (this=this@entry=0x154128016000, from=from@entry=0x154174ecd530 "./test/item1", to=to@entry=0x154174ecd730 "./test/#sql-backup-39578-12") at /test/10.6_dbg/sql/handler.cc:5326
#18 0x000055fd9cc4e07d in mysql_rename_table (base=base@entry=0x55fd9f27b888, old_db=old_db@entry=0x154174ecf740, old_name=old_name@entry=0x154174ecf750, new_db=new_db@entry=0x154174ecf740, new_name=new_name@entry=0x154174ecded0, id=id@entry=0x154174ecf7e0, flags=2) at /test/10.6_dbg/sql/sql_table.cc:4988
#19 0x000055fd9cc6357c in mysql_alter_table (thd=thd@entry=0x154128000db8, new_db=new_db@entry=0x1541280057b8, new_name=new_name@entry=0x154128005bd0, create_info=create_info@entry=0x154174ed0510, table_list=<optimized out>, table_list@entry=0x154128014030, alter_info=alter_info@entry=0x154174ed0420, order_num=0, order=0x0, ignore=false, if_exists=false) at /test/10.6_dbg/sql/sql_table.cc:10602
#20 0x000055fd9cce498e in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x154128000db8) at /test/10.6_dbg/sql/structs.h:568
#21 0x000055fd9cb9af30 in mysql_execute_command (thd=thd@entry=0x154128000db8, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/10.6_dbg/sql/sql_parse.cc:5996
#22 0x000055fd9cb835db in mysql_parse (thd=thd@entry=0x154128000db8, rawbuf=rawbuf@entry=0x154128013f00 "ALTER TABLE `item1` FORCE, ALGORITHM=COPY /* QNO 4 CON_ID 18 */", length=length@entry=63, parser_state=parser_state@entry=0x154174ed1470) at /test/10.6_dbg/sql/sql_parse.cc:8029
#23 0x000055fd9cb82fe4 in wsrep_mysql_parse (thd=thd@entry=0x154128000db8, rawbuf=0x154128013f00 "ALTER TABLE `item1` FORCE, ALGORITHM=COPY /* QNO 4 CON_ID 18 */", length=63, parser_state=parser_state@entry=0x154174ed1470) at /test/10.6_dbg/sql/sql_parse.cc:7843
#24 0x000055fd9cb90d99 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x154128000db8, packet=packet@entry=0x15412800b569 "ALTER TABLE `item1` FORCE, ALGORITHM=COPY /* QNO 4 CON_ID 18 */ ", packet_length=packet_length@entry=64, blocking=blocking@entry=true) at /test/10.6_dbg/sql/sql_class.h:1358
#25 0x000055fd9cb93495 in do_command (thd=0x154128000db8, blocking=blocking@entry=true) at /test/10.6_dbg/sql/sql_parse.cc:1409
#26 0x000055fd9ccdd85b in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55fd9f5405f8, put_in_cache=put_in_cache@entry=true) at /test/10.6_dbg/sql/sql_connect.cc:1418
#27 0x000055fd9ccddd67 in handle_one_connection (arg=0x55fd9f5405f8) at /test/10.6_dbg/sql/sql_connect.cc:1312
#28 0x00001541fe9b1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x00001541fe59d163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Jean Gau [ 2022-10-26 ]

Hello,

I've got the same behaviour on our cluster with MariaDB 10.6.10 on CentOS 7 server :

221024 13:05:01 [ERROR] mysqld got signal 6 ;
221024 15:06:17 [ERROR] mysqld got signal 6 ;
221024 18:50:38 [ERROR] mysqld got signal 6 ;
221024 19:07:37 [ERROR] mysqld got signal 6 ;
221025 14:25:56 [ERROR] mysqld got signal 6 ;
221025 14:29:49 [ERROR] mysqld got signal 6 ;
221025 14:31:17 [ERROR] mysqld got signal 6 ;
221025 14:33:48 [ERROR] mysqld got signal 6 ;
221026 10:07:22 [ERROR] mysqld got signal 6 ;

It's even more frequent since the update from 10.6.7 to 10.6.10.

What does "Fix Version 10.6 [ 24028 ]" means?

Comment by Luke Cousins [ 2023-04-11 ]

I believe we might be experiencing the same issue. It only happens when a node has been running for a while. If we restart it's rare for this to happen. How can I get further information for you next time this happens?

This is what is in syslog:

Apr 11 14:34:50 breww-db-03 mariadbd[1134]: mariadbd: /home/buildbot/buildbot/build/mariadb-10.11.2/wsrep-lib/include/wsrep/client_state.hpp:678: int wsrep::client_state::total_order_bf_abort(wsrep::seqno): Assertion `mode_ == m_local || transaction_.is_streaming()' failed.
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: 230411 14:34:50 [ERROR] mysqld got signal 6 ;
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: This could be because you hit a bug. It is also possible that this binary
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: or one of the libraries it was linked against is corrupt, improperly built,
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: or misconfigured. This error can also be caused by malfunctioning hardware.
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: We will try our best to scrape up some info that will hopefully help
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: diagnose the problem, but since we have already crashed,
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: something is definitely wrong and this may fail.
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: Server version: 10.11.2-MariaDB-1:10.11.2+maria~ubu2004 source revision: cafba8761af55ae16cc69c9b53a341340a845b36
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: key_buffer_size=134217728
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: read_buffer_size=131072
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: max_used_connections=4
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: max_threads=1002
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: thread_count=21
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: It is possible that mysqld could use up to
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2337895 K  bytes of memory
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: Hope that's ok; if not, decrease some variables in the equation.
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: Thread pointer: 0x7f740c000c58
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: Attempting backtrace. You can use the following information to find out
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: where mysqld died. If you see no messages after this, something went
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: terribly wrong...
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: stack_bottom = 0x7f74985d5d88 thread_stack 0x49000
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(my_print_stacktrace)[0x558cced33582]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(handle_fatal_signal)[0x558cce80c895]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(__restore_rt)[0x7f777fb55420]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(gsignal)[0x7f777f65900b]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(abort)[0x7f777f638859]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: /lib/x86_64-linux-gnu/libc.so.6(+0x22729)[0x7f777f638729]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(__assert_fail)[0x7f777f649fd6]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(wsrep_bf_abort(THD*, THD*))[0x558cceac05b4]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(wsrep_thd_bf_abort)[0x558cceac8bed]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x558cceae90a2]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x558cce421214]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x558cce421aef]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag))[0x558cceba0038]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x558cceb3591f]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x558cceaed80c]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(mysql_rename_table(handlerton*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, st_mysql_const_unsigned_lex_string*, unsigned int))[0x558cce66b3af]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, Table_specification_st*, TABLE_LIST*, Recreate_info*, Alter_info*, unsigned int, st_order*, bool, bool))[0x558cce67820c]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(Sql_cmd_alter_table::execute(THD*))[0x558cce6e56a8]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(mysql_execute_command(THD*, bool))[0x558cce5bc4fe]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x558cce5ac007]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(Query_log_event::do_apply_event(rpl_group_info*, char const*, unsigned int))[0x558cce94a921]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(Log_event::apply_event(rpl_group_info*))[0x558cce93d5e8]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(wsrep_apply_events(THD*, Relay_log_info*, void const*, unsigned long))[0x558cceabf37a]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(Wsrep_high_priority_service::apply_toi(wsrep::ws_meta const&, wsrep::const_buffer const&, wsrep::mutable_buffer&))[0x558cceaa3b6e]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(wsrep::server_state::on_apply(wsrep::high_priority_service&, wsrep::ws_handle const&, wsrep::ws_meta const&, wsrep::const_buffer const&))[0x558ccedd3fe4]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(wsrep::wsrep_provider_v26::options[abi:cxx11]() const)[0x558ccede857c]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: src/trx_handle.cpp:396(galera::TrxHandleSlave::apply(void*, wsrep_cb_status (*)(void*, wsrep_ws_handle const*, unsigned int, wsrep_buf const*, wsrep_trx_meta const*, bool*), wsrep_trx_meta const&, bool&))[0x7f777f17cceb]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: src/replicator_smm.cpp:516(galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandleSlave&))[0x7f777f1914e6]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: src/replicator_smm.cpp:2156(galera::ReplicatorSMM::process_trx(void*, boost::shared_ptr<galera::TrxHandleSlave> const&))[0x7f777f193c48]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: src/gcs_action_source.cpp:63(galera::GcsActionSource::process_writeset(void*, gcs_action const&, bool&))[0x7f777f1c7e7b]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: src/gcs_action_source.cpp:110(galera::GcsActionSource::dispatch(void*, gcs_action const&, bool&))[0x7f777f1c80a3]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: src/gcs_action_source.cpp:29(Release::~Release())[0x7f777f1c8682]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: src/replicator_smm.cpp:402(galera::ReplicatorSMM::async_recv(void*))[0x7f777f194ef0]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: src/wsrep_provider.cpp:291(galera_recv)[0x7f777f1684a1]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(wsrep::wsrep_provider_v26::run_applier(wsrep::high_priority_service*))[0x558ccede8bc2]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(wsrep_fire_rollbacker)[0x558cceac12e1]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(start_wsrep_THD(void*))[0x558cceaaef9b]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x558ccea37b06]
Apr 11 14:34:50 breww-db-03 mariadbd[1134]: ??:0(start_thread)[0x7f777fb49609]
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: ??:0(clone)[0x7f777f735133]
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Trying to get some variables.
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Some pointers may be invalid and cause the dump to abort.
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Query (0x7f73fc066c74): ALTER TABLE `accounts_organisationaccount` MODIFY `locale_code` varchar(5) NOT NULL
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Connection ID (thread ID): 13
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Status: NOT_KILLED
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=off,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: information that should help you find out what is causing the crash.
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Writing a core file...
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Working directory at /var/lib/mysql
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Resource Limits:
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Limit                     Soft Limit           Hard Limit           Units
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max cpu time              unlimited            unlimited            seconds
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max file size             unlimited            unlimited            bytes
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max data size             unlimited            unlimited            bytes
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max stack size            8388608              unlimited            bytes
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max core file size        0                    unlimited            bytes
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max resident set          unlimited            unlimited            bytes
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max processes             63672                63672                processes
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max open files            32768                32768                files
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max locked memory         65536                65536                bytes
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max address space         unlimited            unlimited            bytes
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max file locks            unlimited            unlimited            locks
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max pending signals       63672                63672                signals
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max msgqueue size         819200               819200               bytes
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max nice priority         0                    0
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max realtime priority     0                    0
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Max realtime timeout      unlimited            unlimited            us
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
Apr 11 14:34:51 breww-db-03 mariadbd[1134]: Kernel version: Linux version 5.4.0-146-generic (buildd@lcy02-amd64-026) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #163-Ubuntu SMP Fri Mar 17 18:26:02 UTC 2023
Apr 11 14:35:05 breww-db-03 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Apr 11 14:35:05 breww-db-03 systemd[1]: mariadb.service: Failed with result 'signal'.
Apr 11 14:35:10 breww-db-03 systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 1.
Apr 11 14:35:10 breww-db-03 systemd[1]: Stopped MariaDB 10.11.2 database server.

Comment by Seppo Jaakola [ 2023-04-20 ]

Using proxysql, which emits KILL CONNECTION statements, could be related

Comment by Seppo Jaakola [ 2023-04-20 ]

Linked MDEV-29293 to this issue. Looking at the description of original issue report, appears that ProxySQL has been used for load balancing the cluster. ProxySQL is using KILL CONNECTION commands, at will, and they could relate to the underlying issue.
The later methods to reproduce this issue use RQG grammar, which does not use KILL, but causes (maybe different) issue with ALTER TABLE processing. However, the proposed fix for MDEV-29293 has a fix for KILL processing in cluster and has also refactored the method of aborting victims of MDL lock conflict. I suggest testing this RQG test against the PR of MDEV-29293.

Comment by Ramesh Sivaraman [ 2023-04-21 ]

seppo RQG test using alter_v1.yy grammar looks good on 10.6-MDEV-29293 branch.

Comment by Daniel Bray [ 2023-06-13 ]

I've since updated to:
MariaDB 10.6.12
ProxySQL 2.2.2-1

And unfortunately, the issue was seen (while I was on vacation) last week:
mariadbd: /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.6.12/wsrep-lib/include/wsrep/client_state.hpp:667: int wsrep::client_state::bf_abort(wsrep::seqno): Assertion `mode_ == m_local || transaction_.is_streaming()' failed.
230609 0:55:00 [ERROR] mysqld got signal 6 ;

Comment by Jan Lindström [ 2023-08-07 ]

Fixed on MDEV-29293.

Generated at Thu Feb 08 10:01:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.