[MDEV-32779] galera_concurrent_ctas: assertion in the galera::ReplicatorSMM::finish_cert()
Created: 2023-11-11
Updated: 2023-12-28

Status: Open
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.5.22
Fix Version/s: 10.4, 10.5

Type: Bug
Priority: Critical
Reporter: Julius Goryavsky
Assignee: Seppo Jaakola
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-30172 Galera test case cleanup Stalled
Relates
relates to MDEV-24842 Galera test failure on galera_concurr... Open
relates to MDEV-25939 Galera test failure on galera.galera_... Open
relates to MDEV-33129 Crash in wsrep::wsrep_provider_v26::r... Open

Description:

The galera_concurrent_ctas test failed with an assertion in galera::ReplicatorSMM::finish_cert():

galera.galera_concurrent_ctas 'innodb'   [ fail ]
        Test ended at 2023-11-10 20:07:52
 
CURRENT_TEST: galera.galera_concurrent_ctas
ERROR 1050 (42S01) at line 1: Table 't1' already exists
ERROR 1050 (42S01) at line 1: Table 't1' already exists
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
 
 
Server [mysqld.1 - pid: 2091623, winpid: 2091623, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2023-11-10 20:07:44 2 [ERROR] Slave SQL: Error 'Table 't1' already exists' on query. Default database: 'test'. Query: 'CREATE TABLE `t1` (
  `SLEEP(0.1)` int(1) NOT NULL
)', Internal MariaDB error code: 1050
2023-11-10 20:07:44 2 [Warning] WSREP: Event 1 Query apply failed: 1, seqno 72
2023-11-10 20:07:45 0 [Note] WSREP: Member 0(panda) initiates vote on 5b615829-7ffc-11ee-b5ce-ce76384a43e5:72,8d891cf9f0867089:  Table 't1' already exists, Error_code: 1050;
2023-11-10 20:07:45 0 [Note] WSREP: Votes over 5b615829-7ffc-11ee-b5ce-ce76384a43e5:72:
   8d891cf9f0867089:   1/2
Waiting for more votes.
2023-11-10 20:07:45 0 [Note] WSREP: Member 1(panda) responds to vote on 5b615829-7ffc-11ee-b5ce-ce76384a43e5:72,0000000000000000: Success
2023-11-10 20:07:45 0 [Note] WSREP: Votes over 5b615829-7ffc-11ee-b5ce-ce76384a43e5:72:
   0000000000000000:   1/2
   8d891cf9f0867089:   1/2
Winner: 0000000000000000
2023-11-10 20:07:45 2 [ERROR] WSREP: Inconsistency detected: Inconsistent by consensus on 5b615829-7ffc-11ee-b5ce-ce76384a43e5:72
	 at /home/panda/galera/galera/src/replicator_smm.cpp:process_apply_error():1342
	 at /home/panda/galera/galera/src/replicator_smm.cpp:handle_apply_error():1369
2023-11-10 20:07:45 2 [Note] WSREP: Closing send monitor...
2023-11-10 20:07:45 2 [Note] WSREP: Closed send monitor.
2023-11-10 20:07:45 2 [Note] WSREP: gcomm: terminating thread
2023-11-10 20:07:45 2 [Note] WSREP: gcomm: joining thread
2023-11-10 20:07:45 2 [Note] WSREP: gcomm: closing backend
2023-11-10 20:07:46 2 [Note] WSREP: view(view_id(NON_PRIM,5285bfee-a233,5) memb {
	5285bfee-a233,0
} joined {
} left {
} partitioned {
	533ad0ea-8cfd,0
})
2023-11-10 20:07:46 2 [Note] WSREP: PC protocol downgrade 1 -> 0
2023-11-10 20:07:46 2 [Note] WSREP: view((empty))
2023-11-10 20:07:46 2 [Note] WSREP: gcomm: closed
2023-11-10 20:07:46 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2023-11-10 20:07:46 0 [Note] WSREP: Flow-control interval: [16, 16]
2023-11-10 20:07:46 0 [Note] WSREP: Received NON-PRIMARY.
2023-11-10 20:07:46 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 74)
2023-11-10 20:07:46 0 [Note] WSREP: New SELF-LEAVE.
2023-11-10 20:07:46 0 [Note] WSREP: Flow-control interval: [0, 0]
2023-11-10 20:07:46 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2023-11-10 20:07:46 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 74)
2023-11-10 20:07:46 0 [Note] WSREP: RECV thread exiting 0: Success
2023-11-10 20:07:46 2 [Note] WSREP: recv_thread() joined.
2023-11-10 20:07:46 2 [Note] WSREP: Closing replication queue.
2023-11-10 20:07:46 2 [Note] WSREP: Closing slave action queue.
2023-11-10 20:07:46 2 [ERROR] WSREP: Failed to apply write set: gtid: 5b615829-7ffc-11ee-b5ce-ce76384a43e5:72 server_id: 533ad0ea-7ffc-11ee-8cfd-6f966113d31b client_id: 16 trx_id: 169 flags: 3 (start_transaction | commit)
mariadbd: /home/panda/galera/galera/src/replicator_smm.cpp:3293: wsrep_status_t galera::ReplicatorSMM::finish_cert(galera::TrxHandleMaster*, const galera::TrxHandleSlavePtr&): Assertion `ts->global_seqno() == cert_.position() + 1' failed.
231110 20:07:46 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.5.23-MariaDB-debug-log source revision: 5deb8be746e988788ed6e5d91a935a02769bdd93
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=4
max_threads=153
thread_count=7
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63769 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f14d00307f8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f1528211bb8 thread_stack 0x49000
mysys/stacktrace.c:212(my_print_stacktrace)[0x5603d89fd361]
sql/signal_handler.cc:241(handle_fatal_signal)[0x5603d80700d9]
libc_sigaction.c:0(__restore_rt)[0x7f1536354520]
nptl/pthread_kill.c:44(__pthread_kill_implementation)[0x7f15363a89fc]
posix/raise.c:27(__GI_raise)[0x7f1536354476]
stdlib/abort.c:81(__GI_abort)[0x7f153633a7f3]
intl/loadmsgcat.c:1177(_nl_load_domain)[0x7f153633a71b]
/lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7f153634be96]
src/replicator_smm.cpp:3296(galera::ReplicatorSMM::finish_cert(galera::TrxHandleMaster*, boost::shared_ptr<galera::TrxHandleSlave> const&))[0x7f1531b3cecf]
src/replicator_smm.cpp:3376(galera::ReplicatorSMM::cert(galera::TrxHandleMaster*, boost::shared_ptr<galera::TrxHandleSlave> const&))[0x7f1531b46141]
src/replicator_smm.cpp:3399(galera::ReplicatorSMM::cert_and_catch(galera::TrxHandleMaster*, boost::shared_ptr<galera::TrxHandleSlave> const&))[0x7f1531b3d30a]
src/replicator_smm.cpp:1056(galera::ReplicatorSMM::replay_trx(galera::TrxHandleMaster&, galera::TrxHandleLock&, void*))[0x7f1531b2fa0b]
src/wsrep_provider.cpp:339(galera_replay_trx)[0x7f1531b01f67]
src/wsrep_provider_v26.cpp:984(wsrep::wsrep_provider_v26::replay(wsrep::ws_handle const&, wsrep::high_priority_service*))[0x5603d8bfc3c5]
sql/wsrep_client_service.cc:297(Wsrep_client_service::replay())[0x5603d847792a]
src/transaction.cpp:2047(wsrep::transaction::replay(std::unique_lock<wsrep::mutex>&))[0x5603d8bf4492]
src/transaction.cpp:882(wsrep::transaction::after_statement(std::unique_lock<wsrep::mutex>&))[0x5603d8befa32]
src/client_state.cpp:266(wsrep::client_state::after_statement())[0x5603d8bcb84d]
sql/wsrep_trans_observer.h:470(wsrep_after_statement(THD*))[0x5603d7ccd13a]
sql/sql_parse.cc:7952(wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x5603d7ce61f8]
sql/sql_parse.cc:1877(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x5603d7cd1c81]
sql/sql_parse.cc:1375(do_command(THD*))[0x5603d7cd0521]
sql/sql_connect.cc:1416(do_handle_one_connection(CONNECT*, bool))[0x5603d7e99ce6]
sql/sql_connect.cc:1320(handle_one_connection)[0x5603d7e99a5b]
perfschema/pfs.cc:2203(pfs_spawn_thread)[0x5603d83f0ea0]
nptl/pthread_create.c:442(start_thread)[0x7f15363a6ac3]
x86_64/clone3.S:83(__clone3)[0x7f1536438a40]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 1
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway. 
Query: 
 
Writing a core file...
Working directory at /home/panda/mariadb-10.5/build/mysql-test/var/mysqld.1/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             63457                63457                processes 
Max open files            1024                 1024                 files     
Max locked memory         2094301184           2094301184           bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       63457                63457                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
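
The vote lines in the log above show the sequence that precedes the crash: member 0 initiated a vote on seqno 72 with an error digest (8d891cf9f0867089), member 1 responded with the success digest (0000000000000000), and the success vote was declared the winner, after which the node reported "Inconsistent by consensus". A minimal Python sketch for pulling that tally out of similar logs during triage is given here. The regular expression, the function names and the tie-break rule (success wins a tie) are assumptions inferred from this single log, not taken from the Galera source.

```python
import re
from collections import Counter

# All-zero digest, which the log above reports as "Success".
SUCCESS = "0000000000000000"

# Matches "Member N(name) initiates vote on GTID,DIGEST:" and
# "Member N(name) responds to vote on GTID,DIGEST:" (format assumed from this log).
VOTE_RE = re.compile(
    r"Member (\d+)\(\w+\) (?:initiates|responds to) vote on "
    r"([0-9a-f-]+:\d+),([0-9a-f]{16}):"
)

# Two vote lines copied from the server log in this report.
LOG = [
    "2023-11-10 20:07:45 0 [Note] WSREP: Member 0(panda) initiates vote on "
    "5b615829-7ffc-11ee-b5ce-ce76384a43e5:72,8d891cf9f0867089:  "
    "Table 't1' already exists, Error_code: 1050;",
    "2023-11-10 20:07:45 0 [Note] WSREP: Member 1(panda) responds to vote on "
    "5b615829-7ffc-11ee-b5ce-ce76384a43e5:72,0000000000000000: Success",
]


def collect_votes(lines):
    """Return {gtid: {member_index: digest}} for every vote line found."""
    votes = {}
    for line in lines:
        match = VOTE_RE.search(line)
        if match:
            member, gtid, digest = match.groups()
            votes.setdefault(gtid, {})[int(member)] = digest
    return votes


def winner(member_votes):
    """Pick the digest with the most votes.

    Assumption: on a tie that includes the success digest, success wins,
    which is what the log in this report shows. Other ties return None.
    """
    counts = Counter(member_votes.values())
    best = max(counts.values())
    leaders = [digest for digest, count in counts.items() if count == best]
    if len(leaders) == 1:
        return leaders[0]
    return SUCCESS if SUCCESS in leaders else None


def dissenters(member_votes):
    """Members whose vote differs from the winning digest."""
    won = winner(member_votes)
    return sorted(m for m, digest in member_votes.items() if digest != won)
```

For the two vote lines above, `collect_votes(LOG)` yields one entry for `5b615829-7ffc-11ee-b5ce-ce76384a43e5:72`; `winner` on that entry returns the all-zero digest and `dissenters` returns `[0]`, which matches the "Winner: 0000000000000000" line in the log.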

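The failed assertion, `ts->global_seqno() == cert_.position() + 1`, states that a write set entering finish_cert() must carry the seqno immediately following the current certification position. The toy Python model below only illustrates that invariant. The class and method names are hypothetical, the seqno values are made up, and it makes no claim about why the position and the replayed write set diverged in this run.

```python
class ToyCertPosition:
    """Hypothetical stand-in for a certification position counter."""

    def __init__(self, position):
        self.position = position

    def finish_cert(self, global_seqno):
        # Same shape as the assertion in the report: the incoming write set
        # must be exactly one past the current position.
        if global_seqno != self.position + 1:
            raise AssertionError(
                f"global_seqno {global_seqno} != position {self.position} + 1"
            )
        self.position = global_seqno


def try_finish(cert, global_seqno):
    """Return True if the invariant held, False if it was violated."""
    try:
        cert.finish_cert(global_seqno)
        return True
    except AssertionError:
        return False
```

With a position of 10 (an illustrative value), `try_finish(cert, 11)` succeeds and advances the position to 11, while a following `try_finish(cert, 13)` fails because seqno 12 was skipped.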
