[MDEV-24991] mariadb-10.5.9/wsrep-lib/include/wsrep/client_state.hpp:508: int wsrep::client_state::ordered_commit(): Assertion `owning_thread_id_ == wsrep::this_thread::get_id()' failed Created: 2021-02-26  Updated: 2021-05-31  Resolved: 2021-05-31

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.5.9
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Allen Lee (Inactive)
Assignee: Teemu Ollakka
Resolution: Incomplete
Votes: 6
Labels: need_feedback
Environment:

CentOS Linux release 7.9.2009 (Core), 1TB RAM with Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GH


Attachments: server.cnf, show_variables.txt
Issue Links:
Duplicate
duplicates MDEV-24954 10.5.9 crashes on int wsrep::client_s... Closed

 Description   

The customer hit '[ERROR] mysqld got signal 6' three times during a sysbench test.
This is a 3-node Galera cluster with the binlog enabled to feed a replication slave in another DC.

  • Stacktrace :

    mariadbd: /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.9/wsrep-lib/include/wsrep/client_state.hpp:508: int wsrep::client_state::ordered_commit(): Assertion `owning_thread_id_ == wsrep::this_thread::get_id()' failed.
    210225  8:39:21 [ERROR] mysqld got signal 6 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
     
    To report this bug, see https://mariadb.com/kb/en/reporting-bugs
     
    We will try our best to scrape up some info that will hopefully help
    diagnose the problem, but since we have already crashed, 
    something is definitely wrong and this may fail.
     
    Server version: 10.5.9-MariaDB-log
    key_buffer_size=134217728
    read_buffer_size=1048576
    max_used_connections=0
    max_threads=9652
    thread_count=102
    It is possible that mysqld could use up to 
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 30027362 K  bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.
     
    Thread pointer: 0x7eb5000009b8
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 0x7eb5eda96cb0 thread_stack 0x40000
    ??:0(my_print_stacktrace)[0x56101be667fe]
    ??:0(handle_fatal_signal)[0x56101b86aa37]
    sigaction.c:0(__restore_rt)[0x7f6bb9091630]
    :0(__GI_raise)[0x7f6bb70ea3d7]
    :0(__GI_abort)[0x7f6bb70ebac8]
    :0(__assert_fail_base)[0x7f6bb70e31a6]
    :0(__GI___assert_fail)[0x7f6bb70e3252]
    ??:0(wsrep_commit_ordered)[0x56101bb3699d]
    ??:0(std::pair<std::_Rb_tree_iterator<unsigned int>, bool> std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_insert_unique<unsigned int const&>(unsigned int const&))[0x56101bcace3e]
    ??:0(std::pair<std::_Rb_tree_iterator<unsigned int>, bool> std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_insert_unique<unsigned int const&>(unsigned int const&))[0x56101bcaaf9e]
    ??:0(std::pair<std::_Rb_tree_iterator<unsigned int>, bool> std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_insert_unique<unsigned int const&>(unsigned int const&))[0x56101bcab2e4]
    ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x56101bb4f3a0]
    ??:0(TC_LOG::run_commit_ordered(THD*, bool))[0x56101b95f96d]
    ??:0(MYSQL_BIN_LOG::trx_group_commit_leader(MYSQL_BIN_LOG::group_commit_entry*))[0x56101b964eaa]
    ??:0(MYSQL_BIN_LOG::write_transaction_to_binlog_events(MYSQL_BIN_LOG::group_commit_entry*))[0x56101b96529c]
    ??:0(MYSQL_BIN_LOG::write_transaction_to_binlog(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x56101b965770]
    ??:0(MYSQL_BIN_LOG::write_transaction_to_binlog(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x56101b965981]
    ??:0(MYSQL_BIN_LOG::log_and_order(THD*, unsigned long long, bool, bool, bool))[0x56101b967053]
    ??:0(ha_commit_trans(THD*, bool))[0x56101b87a88c]
    ??:0(trans_commit(THD*))[0x56101b76b6ae]
    ??:0(Wsrep_high_priority_service::commit(wsrep::ws_handle const&, wsrep::ws_meta const&))[0x56101bb17368]
    ??:0(wsrep::server_state::start_streaming_applier(wsrep::id const&, wsrep::transaction_id const&, wsrep::high_priority_service*))[0x56101bee7b9f]
    ??:0(wsrep::server_state::on_apply(wsrep::high_priority_service&, wsrep::ws_handle const&, wsrep::ws_meta const&, wsrep::const_buffer const&))[0x56101bee86b5]
    ??:0(wsrep::wsrep_provider_v26::last_committed_gtid() const)[0x56101bef90a8]
    src/trx_handle.cpp:387(galera::TrxHandleSlave::apply(void*, wsrep_cb_status (*)(void*, wsrep_ws_handle const*, unsigned int, wsrep_buf const*, wsrep_trx_meta const*, bool*), wsrep_trx_meta const&, bool&))[0x7f6ba7c124c0]
    src/replicator_smm.cpp:504(galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandleSlave&))[0x7f6ba7c1e080]
    src/replicator_smm.cpp:2145(galera::ReplicatorSMM::process_trx(void*, boost::shared_ptr<galera::TrxHandleSlave> const&))[0x7f6ba7c24929]
    src/gcs_action_source.cpp:63(galera::GcsActionSource::process_writeset(void*, gcs_action const&, bool&))[0x7f6ba7c4f099]
    src/gcs_action_source.cpp:110(galera::GcsActionSource::dispatch(void*, gcs_action const&, bool&))[0x7f6ba7c4fe57]
    src/gcs_action_source.cpp:29(~Release)[0x7f6ba7c50384]
    src/replicator_smm.cpp:390(galera::ReplicatorSMM::async_recv(void*))[0x7f6ba7c24e7b]
    src/wsrep_provider.cpp:263(galera_recv)[0x7f6ba7c032b8]
    ??:0(wsrep::wsrep_provider_v26::run_applier(wsrep::high_priority_service*))[0x56101bef97de]
    ??:0(wsrep_fire_rollbacker)[0x56101bb2ed78]
    ??:0(start_wsrep_THD(void*))[0x56101bb22243]
    ??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x56101bab811d]
    pthread_create.c:0(start_thread)[0x7f6bb9089ea5]
    ??:0(__clone)[0x7f6bb71b29fd]
     
    Trying to get some variables.
    Some pointers may be invalid and cause the dump to abort.
    Query (0x0): (null)
    Connection ID (thread ID): 55
    Status: NOT_KILLED
     
    Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
     
    The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
    information that should help you find out what is causing the crash.
     
    We think the query pointer is invalid, but we will try to print it anyway. 
    Query: 
     
    Writing a core file...
    Working directory at /var/lib/mysql/datadir
    Resource Limits:
    Limit                     Soft Limit           Hard Limit           Units     
    Max cpu time              unlimited            unlimited            seconds   
    Max file size             unlimited            unlimited            bytes     
    Max data size             unlimited            unlimited            bytes     
    Max stack size            8388608              unlimited            bytes     
    Max core file size        0                    unlimited            bytes     
    Max resident set          unlimited            unlimited            bytes     
    Max processes             4122688              4122688              processes 
    Max open files            65536                65536                files     
    Max locked memory         65536                65536                bytes     
    Max address space         unlimited            unlimited            bytes     
    Max file locks            unlimited            unlimited            locks     
    Max pending signals       4122688              4122688              signals   
    Max msgqueue size         819200               819200               bytes     
    Max nice priority         0                    0                    
    Max realtime priority     0                    0                    
    Max realtime timeout      unlimited            unlimited            us        
    Core pattern: |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h
     
    2021-02-25  8:40:20 0 [Note] WSREP: Loading provider /usr/lib64/galera/libgalera_smm.so initial position: 23945203-7647-11eb-bbf0-d3328adba549:91465900
    2021-02-25  8:40:20 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
    2021-02-25  8:40:20 0 [Note] WSREP: wsrep_load(): Galera 4.7(ree4f10f) by Codership Oy <info@codership.com> loaded successfully.
    2021-02-25  8:40:20 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
    2021-02-25  8:40:20 0 [Note] WSREP: Found saved state: 23945203-7647-11eb-bbf0-d3328adba549:-1, safe_to_bootstrap: 0
    2021-02-25  8:40:20 0 [Note] WSREP: GCache DEBUG: opened preamble:
    Version: 2
    UUID: 23945203-7647-11eb-bbf0-d3328adba549
    Seqno: -1 - -1
    Offset: -1
    Synced: 0
    



 Comments   
Comment by Teemu Ollakka [ 2021-03-23 ]

The configuration file shows that log_bin is enabled but log_slave_updates is not, which might be the reason for the crash.

Could the customer check whether the crash is still reproducible with log_slave_updates enabled?
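
The condition described here (log_bin set, log_slave_updates not set) can be checked mechanically before a node is put under load. The following is a minimal Python sketch and not part of the original report: the function names, the list of option-file sections it reads, and the set of values it treats as "enabled" are assumptions made for illustration. It ignores !include directives and options such as skip-log-bin, so treat the result as a hint only.

```python
# Hypothetical helper: flags the "log_bin without log_slave_updates" setup
# in a my.cnf-style option file. Simplified on purpose; see assumptions above.

# Assumption: option-file sections that the server is expected to read.
SERVER_SECTIONS = {"mysqld", "mariadbd", "mariadb", "server", "galera"}

# Assumption: values that switch a boolean option on ("" = bare option name).
ENABLED_VALUES = {"", "1", "on", "true"}


def parse_server_options(cnf_text):
    """Collect options from the server sections of an option file."""
    options = {}
    current = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and !include / !includedir directives.
        if not line or line[0] in "#;!":
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            continue
        if current not in SERVER_SECTIONS:
            continue
        name, _, value = line.partition("=")
        # Dashes and underscores are interchangeable in option names.
        options[name.strip().replace("-", "_").lower()] = value.strip()
    return options


def binlog_without_slave_updates(cnf_text):
    """Return True if log_bin is present but log_slave_updates is not enabled."""
    opts = parse_server_options(cnf_text)
    log_bin = "log_bin" in opts  # presence alone is treated as enabled
    slave_updates = opts.get("log_slave_updates", "0").lower() in ENABLED_VALUES
    return log_bin and not slave_updates


if __name__ == "__main__":
    sample = "[mysqld]\nlog_bin=mysql-bin\nwsrep_on=ON\n"
    print(binlog_without_slave_updates(sample))
```

Running the check against the text of server.cnf on each node would show whether a node matches the setup discussed in this report.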

Comment by Nathan Neulinger [ 2021-04-09 ]

I just saw this on one of my clusters under the same conditions (log_bin without log_slave_updates) and will update the configuration to enable log_slave_updates. No specific action triggered it; it was a random failure out of the blue, likely due to activity.
