[MDEV-28736] Attempts to join crashed node to the cluster through IST ends with another crash after wsrep_recover Created: 2022-05-17  Updated: 2022-07-27  Resolved: 2022-07-04

Status: Closed
Project: MariaDB Server
Component/s: Galera, Platform FreeBSD
Affects Version/s: 10.6.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Yakov Kushnirsky Assignee: Yakov Kushnirsky
Resolution: Incomplete Votes: 2
Labels: None
Environment:

FreeBSD 13; using custom build TODO-3345


Issue Links:
Relates

 Description   

The customer reports that an attempt to join a crashed node to the cluster through IST ends with another crash after wsrep_recover. The wsrep thread did not wait for the recovery thread to
finish its work; instead it initiated the IST procedure and crashed again. Backtraces are below.
The full error log is in the case. The platform is FreeBSD.
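
To make the suspected ordering problem concrete, here is a minimal, purely illustrative sketch. It is not MariaDB or Galera code: the function name start_node, the flag wait_for_recovery, and the event strings are all hypothetical, and the sketch only models "recovery must finish before IST starts" with two threads. It does not claim to reproduce the crash or identify its root cause.

```python
import threading


def start_node(wait_for_recovery: bool) -> list:
    """Return the order in which two hypothetical startup phases ran.

    Hypothetical model only: "recovery" stands in for the recovery
    thread and "IST started" for the wsrep thread beginning IST.
    """
    events = []
    recovery_done = threading.Event()
    release_recovery = threading.Event()

    def recover() -> None:
        # Block until released so the ordering is deterministic.
        release_recovery.wait()
        events.append("recovery finished")
        recovery_done.set()

    worker = threading.Thread(target=recover)
    worker.start()

    if wait_for_recovery:
        # Expected behaviour: let recovery run and wait for it to finish.
        release_recovery.set()
        recovery_done.wait()

    # Reported behaviour when wait_for_recovery is False:
    # IST begins while recovery is still pending.
    events.append("IST started")

    release_recovery.set()
    worker.join()
    return events


if __name__ == "__main__":
    print(start_node(wait_for_recovery=True))
    print(start_node(wait_for_recovery=False))
```

With wait_for_recovery=True the recovery phase is recorded before IST starts; with False the order is reversed, which is the sequence the description above suspects.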

2022-05-15  1:55:21 96 [Note] WSREP: Starting applier thread 96
2022-05-15  1:55:21 97 [Note] WSREP: Starting applier thread 97
2022-05-15  1:55:21 98 [Note] WSREP: Starting applier thread 98
2022-05-15  1:55:21 101 [Note] WSREP: Starting applier thread 101
2022-05-15  1:55:21 99 [Note] WSREP: Starting applier thread 99
2022-05-15  1:55:21 0 [Note] /usr/local/libexec/mariadbd: ready for connections.
Version: '10.6.8-4-MariaDB-enterprise-log'  socket: '/tmp/mysql.sock'  port: 3306  MariaDB Enterprise Server
2022-05-15  1:55:21 103 [Note] WSREP: Starting applier thread 103
0x13ef71e <my_print_stacktrace+0x2e> at /usr/local/libexec/mariadbd
mysys/my_addr_resolve.c:299(my_addr_resolve)[0xce6460]
0x801935580 <pthread_sigmask+0x540> at /lib/libthr.so.3
thread/thr_sig.c:0(handle_signal)[0x801934b3f]
0x7ffffffff2d3 <__gxx_personality_v0+0x7ffffeb6ae03> at ???
0xd9ec24 <thd_get_thread_id+0x4> at /usr/local/libexec/mariadbd
mysys/my_addr_resolve.c:299(my_addr_resolve)[0x12b9590]
0x12b8f9b <_Z9lock_waitP9que_thr_t+0x5b> at /usr/local/libexec/mariadbd
0x132f9b1 <_Z23row_mysql_handle_errorsP7dberr_tP5trx_tP9que_thr_tP12trx_savept_t+0x61> at /usr/local/libexec/mariadbd
sql/mysqld.cc:1848(unireg_abort)[0x13497a6]
maria/ma_check.c:3055(writekeys)[0x11a7891]
maria/ma_check.c:3159(_ma_flush_table_files_before_swap)[0xbc80e2]
0xbd3d29 <_ZN7handler17rnd_pos_by_recordEPh+0x59> at /usr/local/libexec/mariadbd
perfschema/pfs_stat.h:76(PFS_single_stat::aggregate(PFS_single_stat const*))[0xc97b40]
0xc98b70 <_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0x190> at /usr/local/libexec/mariadbd
perfschema/table_events_statements.cc:239(table_events_statements_common::make_row_part_1(PFS_events_statements*, sql_digest_storage*))[0xc92f88]
0xd351b8 <_ZN9Log_event11apply_eventEP14rpl_group_info+0x68> at /usr/local/libexec/mariadbd
0x1189d12 <_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x392> at /usr/local/libexec/mariadbd
sql/sys_vars.inl:508(Sys_var_charptr_base)[0x116ba87]
0x147df0c <_ZN5wsrep12server_state8on_applyERNS_21high_priority_serviceERKNS_9ws_handleERKNS_7ws_metaERKNS_12const_bufferE+0x3cc> at /usr/local/libexec/mariadbd
0x1489eef <_ZN12_GLOBAL__N_18apply_cbEPvPK15wsrep_ws_handlejPK9wsrep_bufPK14wsrep_trx_metaPb+0xaf> at /usr/local/libexec/mariadbd
0x80391b690 <wsrep_ps_free_node_stat+0x9510> at /rw_part/usr-local/lib/mysql/libgalera_enterprise_smm.so
src/trx_handle.cpp:392(galera::TrxHandleSlave::apply(void*, wsrep_cb_status (*)(void*, wsrep_ws_handle const*, unsigned int, wsrep_buf const*, wsrep_trx_meta const*, bool*), wsrep_trx_meta const&, bool&))[0x80392143a]
0x803969b00 <wsrep_ps_free_node_stat+0x57980> at /rw_part/usr-local/lib/mysql/libgalera_enterprise_smm.so
0x803969441 <wsrep_ps_free_node_stat+0x572c1> at /rw_part/usr-local/lib/mysql/libgalera_enterprise_smm.so
0x803920847 <wsrep_ps_free_node_stat+0xe6c7> at /rw_part/usr-local/lib/mysql/libgalera_enterprise_smm.so
src/replicator_smm.cpp:538(galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandleSlave&))[0x80390b001]
0x148a63b <_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0xb> at /usr/local/libexec/mariadbd
mysys/my_addr_resolve.c:299(my_addr_resolve)[0x118a20d]
0x117a083 <_Z15start_wsrep_THDPv+0x2e3> at /usr/local/libexec/mariadbd
0x110c177 <pfs_spawn_thread+0xd7> at /usr/local/libexec/mariadbd
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x1d02204e5b): UPDATE nl_game_providers.game_bettings SET  ext_bet_status = 'Won' WHERE provider_id = '2' AND recno = '13076845542'
 
Connection ID (thread ID): 37
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Core pattern: %N.core

