  MariaDB Server / MDEV-29265

Assertion `mode_ == m_local || transaction_.is_streaming()' during SST

Details

    Description

      In a 2-node Galera cluster, one node crashed. The logs from both nodes are below. First, the log from the crashed node:

      mariadbd: ./wsrep-lib/include/wsrep/client_state.hpp:668: int wsrep::client_state::bf_abort(wsrep::seqno): Assertion `mode_ == m_local || transaction_.is_streaming()' failed.
      220807 12:59:06 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.6.7-MariaDB-2ubuntu1.1-log
      key_buffer_size=268435456
      read_buffer_size=131072
      max_used_connections=50
      max_threads=5002
      thread_count=64
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 11276717 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f2280000c68
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f3bc00e2ca8 thread_stack 0x49000
      /usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x55a8e7586702]
      /usr/sbin/mariadbd(handle_fatal_signal+0x478)[0x55a8e70c14d8]
      /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f3be3990520]
      /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f3be39e4a7c]
      ??:0(__sigaction)[0x7f3be3990476]
      ??:0(abort)[0x7f3be39767f3]
      /lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7f3be397671b]
      ??:0(__assert_fail)[0x7f3be3987e96]
      /usr/sbin/mariadbd(_Z14wsrep_bf_abortP3THDS0_+0x607)[0x55a8e736b4c7]
      /usr/sbin/mariadbd(wsrep_thd_bf_abort+0x1d)[0x55a8e73715ad]
      ??:0(wsrep_bf_abort(THD*, THD*))[0x55a8e738e8a2]
      ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55a8e6d4c827]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a8e6d4c979]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a8e741d6c5]
      ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x55a8e741ea0e]
      ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x55a8e74202d1]
      ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x55a8e744f771]
      ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x55a8e745014f]
      ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x55a8e7430d30]
      ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x55a8e7392b74]
      ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55a8e70cedaa]
      ??:0(handler::ha_update_row(unsigned char const*, unsigned char const*))[0x55a8e71e41ba]
      ??:0(Update_rows_log_event::do_exec_row(rpl_group_info*))[0x55a8e71d7b87]
      ??:0(Rows_log_event::do_apply_event(rpl_group_info*))[0x55a8e7369470]
      ??:0(wsrep_apply_events(THD*, Relay_log_info*, void const*, unsigned long))[0x55a8e7350fe0]
      ??:0(Wsrep_high_priority_service::remove_fragments(wsrep::ws_meta const&))[0x55a8e7351e26]
      ??:0(Wsrep_applier_service::apply_write_set(wsrep::ws_meta const&, wsrep::const_buffer const&, wsrep::mutable_buffer&))[0x55a8e75fbefb]
      ??:0(wsrep::server_state::start_streaming_applier(wsrep::id const&, wsrep::transaction_id const&, wsrep::high_priority_service*))[0x55a8e760e72e]
      /usr/lib/galera/libgalera_smm.so(+0x53b14)[0x7f3bd24b4b14]
      /usr/lib/galera/libgalera_smm.so(+0x5bb55)[0x7f3bd24bcb55]
      /usr/lib/galera/libgalera_smm.so(+0x67ba8)[0x7f3bd24c8ba8]
      /usr/lib/galera/libgalera_smm.so(+0x87595)[0x7f3bd24e8595]
      src/trx_handle.cpp:396(galera::TrxHandleSlave::apply(void*, wsrep_cb_status (*)(void*, wsrep_ws_handle const*, unsigned int, wsrep_buf const*, wsrep_trx_meta const*, bool*), wsrep_trx_meta const&, bool&))[0x7f3bd24c0cd0]
      /usr/lib/galera/libgalera_smm.so(+0x463c1)[0x7f3bd24a73c1]
      /usr/sbin/mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x55a8e760edd2]
      /usr/sbin/mariadbd(+0xc55577)[0x55a8e736b577]
      ??:0(wsrep::wsrep_provider_v26::run_applier(wsrep::high_priority_service*))[0x55a8e735c5a3]
      ??:0(start_wsrep_THD(void*))[0x55a8e72eb386]
      /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f3be39e2b43]
      /lib/x86_64-linux-gnu/libc.so.6(+0x126a00)[0x7f3be3a74a00]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f3bc614b6cb): UPDATE incidents set    status_last_update='2022-08-07 12:59:05'
                                                      , status_changed_users_id='2526'
                                                      , status_id=3402
                                                  where incidents_id=110443
       
      Connection ID (thread ID): 7
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
       
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units
      Max cpu time              unlimited            unlimited            seconds
      Max file size             unlimited            unlimited            bytes
      Max data size             unlimited            unlimited            bytes
      Max stack size            8388608              unlimited            bytes
      Max core file size        0                    unlimited            bytes
      Max resident set          unlimited            unlimited            bytes
      Max processes             514410               514410               processes
      Max open files            1048576              1048576              files
      Max locked memory         524288               524288               bytes
      Max address space         unlimited            unlimited            bytes
      Max file locks            unlimited            unlimited            locks
      Max pending signals       514410               514410               signals
      Max msgqueue size         819200               819200               bytes
      Max nice priority         0                    0
      Max realtime priority     0                    0
      Max realtime timeout      unlimited            unlimited            us
      Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
      

      Here is the log from the other node:

      2022-08-07 12:59:21 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 368fe320-a19b with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 1813 rttvar: 3317 rto: 204000 lost: 0 last_data_recv: 3008 cwnd: 10 last_queued_since: 4874349 last_delivered_since: 3005087293 send_queue_length: 0 send_queue_bytes: 0 segment: 0 messages: 0
      2022-08-07 12:59:21 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.1:4567
      2022-08-07 12:59:22 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') reconnecting to 368fe320-a19b (tcp://192.168.0.1:4567), attempt 0
      2022-08-07 12:59:23 0 [Note] WSREP: evs::proto(2c964753-8f33, OPERATIONAL, view_id(REG,2c964753-8f33,6)) suspecting node: 368fe320-a19b
      2022-08-07 12:59:23 0 [Note] WSREP: evs::proto(2c964753-8f33, OPERATIONAL, view_id(REG,2c964753-8f33,6)) suspected node without join message, declaring inactive
      2022-08-07 12:59:24 0 [Note] WSREP: view(view_id(NON_PRIM,2c964753-8f33,6) memb {
              2c964753-8f33,0
      } joined {
      } left {
      } partitioned {
              368fe320-a19b,0
      })
      2022-08-07 12:59:24 0 [Note] WSREP: view(view_id(NON_PRIM,2c964753-8f33,7) memb {
              2c964753-8f33,0
      } joined {
      } left {
      } partitioned {
              368fe320-a19b,0
      })
      2022-08-07 12:59:24 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2022-08-07 12:59:24 0 [Note] WSREP: Flow-control interval: [16, 16]
      2022-08-07 12:59:24 0 [Note] WSREP: Received NON-PRIMARY.
      2022-08-07 12:59:24 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 19818073)
      2022-08-07 12:59:24 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2022-08-07 12:59:24 58815930 [Warning] WSREP: Send action {(nil), 664, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 0 [Note] WSREP: Flow-control interval: [16, 16]
      2022-08-07 12:59:24 0 [Note] WSREP: Received NON-PRIMARY.
      2022-08-07 12:59:24 58815935 [Warning] WSREP: Send action {(nil), 1272, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58815943 [Warning] WSREP: Send action {(nil), 664, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58815977 [Warning] WSREP: Send action {(nil), 784, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816029 [Warning] WSREP: Send action {(nil), 792, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816100 [Warning] WSREP: Send action {(nil), 1928, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816107 [Warning] WSREP: Send action {(nil), 840, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816112 [Warning] WSREP: Send action {(nil), 656, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 14 [Note] WSREP: ================================================
      View:
        id: 2c96698a-0fdf-11ed-90d6-7ecc8fa70984:19818073
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(1):
              0: 2c964753-0fdf-11ed-8f33-fb3ea59a5a33, ovh6.1check.com
      =================================================
      2022-08-07 12:59:24 14 [Note] WSREP: Non-primary view
      2022-08-07 12:59:24 58816115 [Warning] WSREP: Send action {(nil), 656, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816199 [Warning] WSREP: Send action {(nil), 1256, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 14 [Note] WSREP: Server status change synced -> connected
      2022-08-07 12:59:24 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2022-08-07 12:59:24 58816203 [Warning] WSREP: Send action {(nil), 496, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2022-08-07 12:59:24 14 [Note] WSREP: ================================================
      View:
        id: 2c96698a-0fdf-11ed-90d6-7ecc8fa70984:19818073
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(1):
              0: 2c964753-0fdf-11ed-8f33-fb3ea59a5a33, ovh6.1check.com
      =================================================
      2022-08-07 12:59:24 14 [Note] WSREP: Non-primary view
      2022-08-07 12:59:24 14 [Note] WSREP: Server status change connected -> connected
      2022-08-07 12:59:24 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2022-08-07 12:59:24 58816207 [Warning] WSREP: Send action {(nil), 792, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2022-08-07 12:59:24 58816209 [Warning] WSREP: Send action {(nil), 776, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816290 [Warning] WSREP: Send action {(nil), 656, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816330 [Warning] WSREP: Send action {(nil), 2640, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816365 [Warning] WSREP: Send action {(nil), 664, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816409 [Warning] WSREP: Send action {(nil), 784, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816410 [Warning] WSREP: Send action {(nil), 445056, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816419 [Warning] WSREP: Send action {(nil), 656, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816420 [Warning] WSREP: Send action {(nil), 622056, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816424 [Warning] WSREP: Send action {(nil), 354216, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816425 [Warning] WSREP: Send action {(nil), 587704, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816565 [Warning] WSREP: Send action {(nil), 848, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816569 [Warning] WSREP: Send action {(nil), 888, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816571 [Warning] WSREP: Send action {(nil), 1256, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816574 [Warning] WSREP: Send action {(nil), 2672, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816580 [Warning] WSREP: Send action {(nil), 1264, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816727 [Warning] WSREP: Send action {(nil), 776, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816731 [Warning] WSREP: Send action {(nil), 1384, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816751 [Warning] WSREP: Send action {(nil), 1384, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816755 [Warning] WSREP: Send action {(nil), 1264, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816763 [Warning] WSREP: Send action {(nil), 656, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816769 [Warning] WSREP: Send action {(nil), 856, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816773 [Warning] WSREP: Send action {(nil), 776, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816767 [Warning] WSREP: Send action {(nil), 3549536, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816864 [Warning] WSREP: Send action {(nil), 3472, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816958 [Warning] WSREP: Send action {(nil), 656, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816962 [Warning] WSREP: Send action {(nil), 584, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816966 [Warning] WSREP: Send action {(nil), 784, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816969 [Warning] WSREP: Send action {(nil), 888, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58794613 [Warning] WSREP: Send action {(nil), 664, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58816991 [Warning] WSREP: Send action {(nil), 1272, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817066 [Warning] WSREP: Send action {(nil), 784, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817211 [Warning] WSREP: Send action {(nil), 888, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817224 [Warning] WSREP: Send action {(nil), 664, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817229 [Warning] WSREP: Send action {(nil), 656, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817233 [Warning] WSREP: Send action {(nil), 776, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817331 [Warning] WSREP: Send action {(nil), 776, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817438 [Warning] WSREP: Send action {(nil), 912, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817548 [Warning] WSREP: Send action {(nil), 664, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817645 [Warning] WSREP: Send action {(nil), 912, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817648 [Warning] WSREP: Send action {(nil), 784, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817691 [Warning] WSREP: Send action {(nil), 528, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817779 [Warning] WSREP: Send action {(nil), 555520, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:24 58817886 [Warning] WSREP: Send action {(nil), 848, WRITESET} returned -107 (Transport endpoint is not connected)
      2022-08-07 12:59:25 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 129 rttvar: 64 rto: 204000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000635628 last_delivered_since: 3000635628 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 12:59:29 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 150 rttvar: 75 rto: 204000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000046360 last_delivered_since: 3000046360 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 12:59:33 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 144 rttvar: 72 rto: 204000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000007511 last_delivered_since: 3000007511 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 12:59:37 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 92 rttvar: 46 rto: 204000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000093147 last_delivered_since: 3000093147 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 12:59:41 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 104 rttvar: 52 rto: 204000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000125469 last_delivered_since: 3000125469 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 12:59:45 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 98 rttvar: 49 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000138902 last_delivered_since: 3000138902 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 12:59:49 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 107 rttvar: 53 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000076137 last_delivered_since: 3000076137 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 12:59:53 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 136 rttvar: 68 rto: 200000 lost: 0 last_data_recv: 3004 cwnd: 10 last_queued_since: 3000132517 last_delivered_since: 3000132517 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 12:59:57 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 71 rttvar: 35 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000305696 last_delivered_since: 3000305696 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:01 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 230 rttvar: 115 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000022184 last_delivered_since: 3000022184 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:05 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 102 rttvar: 51 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000129301 last_delivered_since: 3000129301 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:09 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 94 rttvar: 47 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000115407 last_delivered_since: 3000115407 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:13 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 80 rttvar: 40 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000233084 last_delivered_since: 3000233084 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:17 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 99 rttvar: 49 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000139482 last_delivered_since: 3000139482 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:21 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 90 rttvar: 45 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000107178 last_delivered_since: 3000107178 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:25 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 100 rttvar: 50 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000138070 last_delivered_since: 3000138070 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:29 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 114 rttvar: 57 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000134852 last_delivered_since: 3000134852 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:33 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 103 rttvar: 51 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000161454 last_delivered_since: 3000161454 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:37 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 106 rttvar: 53 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000170602 last_delivered_since: 3000170602 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:41 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 108 rttvar: 54 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000223534 last_delivered_since: 3000223534 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:45 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 102 rttvar: 51 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000219764 last_delivered_since: 3000219764 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:49 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 114 rttvar: 57 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000123103 last_delivered_since: 3000123103 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:53 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 98 rttvar: 49 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000112880 last_delivered_since: 3000112880 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:00:57 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 98 rttvar: 49 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000162084 last_delivered_since: 3000162084 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:01:01 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://192.168.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 92 rttvar: 46 rto: 200000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 3000047362 last_delivered_since: 3000047362 send_queue_length: 0 send_queue_bytes: 0
      2022-08-07 13:01:11 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') reconnecting to 368fe320-a19b (tcp://192.168.0.1:4567), attempt 30
      2022-08-07 13:01:12 0 [Note] WSREP: (2c964753-8f33, 'tcp://0.0.0.0:4567') connection established to 368fe320-a19c tcp://192.168.0.1:4567
      2022-08-07 13:01:12 0 [Note] WSREP: remote endpoint tcp://192.168.0.1:4567 changed identity 368fe320-11cc-11ed-a19b-3bac27a309d1 -> 368fe320-11cc-11ed-a19c-3bac27a309d1
      2022-08-07 13:01:13 0 [Note] WSREP: declaring 368fe320-a19c at tcp://192.168.0.1:4567 stable
      2022-08-07 13:01:13 0 [Note] WSREP: re-bootstrapping prim from partitioned components
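
The surviving node's log above is dominated by repeated "Send action ... returned -107" warnings, which makes it hard to see how many write sets were affected and how large they were. The following is a minimal sketch for summarizing them, not an official tool. It assumes the line layout shown in this report (timestamp, thread ID, level, message); the function name `summarize_send_failures` and the sample lines are illustrative.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines in this report:
#   <date> <time> <thread id> [Warning] WSREP: Send action {(nil), <size>, <kind>} returned <code> (...)
SEND_FAILURE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<thread>\d+) "
    r"\[Warning\] WSREP: Send action \{\(nil\), (?P<size>\d+), (?P<kind>\w+)\} "
    r"returned (?P<code>-?\d+)"
)


def summarize_send_failures(lines):
    """Return (count, total_bytes, largest_bytes, codes) for failed send actions."""
    count = 0
    total = 0
    largest = 0
    codes = Counter()
    for line in lines:
        match = SEND_FAILURE.match(line)
        if match is None:
            continue  # not a "Send action" warning
        size = int(match.group("size"))
        count += 1
        total += size
        largest = max(largest, size)
        codes[int(match.group("code"))] += 1
    return count, total, largest, codes


# Three lines copied from the log above; only the last two are send failures.
sample = [
    "2022-08-07 12:59:24 0 [Note] WSREP: Received NON-PRIMARY.",
    "2022-08-07 12:59:24 58815930 [Warning] WSREP: Send action {(nil), 664, WRITESET} "
    "returned -107 (Transport endpoint is not connected)",
    "2022-08-07 12:59:24 58816767 [Warning] WSREP: Send action {(nil), 3549536, WRITESET} "
    "returned -107 (Transport endpoint is not connected)",
]

count, total, largest, codes = summarize_send_failures(sample)
print(count, total, largest, dict(codes))
```

To run it against a real log, pass an open file object instead of `sample`, for example `summarize_send_failures(open(path))`, where `path` is wherever the server's error log lives on that host.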
      

          Activity

            Ours were crashing every 3-4 hours during weekday business hours.
            I disabled Galera and ran a single 10.6.8 server, which was solid for 24 hours. That seems to confirm the problem is related to Galera/replication.

            Ultimately I decided to roll back to a 10.3 backup image, replicate the data changes made since the upgrade into the recovered server, and then add the 10.3 cluster members back in.
            Not a single crash after 36 hours in a 3-node cluster on 10.3. I guess I will have to wait for a fix and try the upgrade again in a few months.

            MrJemson Andrew Robinson added the comment above.

            I feel lucky, as we have not had a crash for 3 days. Could this assertion on is_streaming() be network related? Either way, it is clearly Galera related.

            All the more so because I tried reverting to 10.3 and 10.4, but the mariabackup used to bootstrap replication would not work, and we can't stop our service to dump and restore the DB.

            Still, such instability and the lack of concern are very worrying.

            ccounotte COUNOTTE CEDRIC added the comment above.

            New server crash today, which resulted in the server hanging, while the remaining 2 nodes kept serving queries.

            mariadbd: ./wsrep-lib/include/wsrep/client_state.hpp:668: int wsrep::client_state::bf_abort(wsrep::seqno): Assertion `mode_ == m_local || transaction_.is_streaming()' failed.
            220816 10:45:24 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
             
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
             
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.
             
            Server version: 10.6.7-MariaDB-2ubuntu1.1-log
            key_buffer_size=268435456
            read_buffer_size=131072
            max_used_connections=100
            max_threads=5002
            thread_count=114
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 11276717 K  bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
             
            Thread pointer: 0x7faf48000c68
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7fc5b415aca8 thread_stack 0x49000
            /usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x564af1bc2702]
            ??:0(my_print_stacktrace)[0x564af16fd4d8]
            /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fc5f74e0520]
            ??:0(__sigaction)[0x7fc5f7534a7c]
            ??:0(raise)[0x7fc5f74e0476]
            ??:0(abort)[0x7fc5f74c67f3]
            /lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7fc5f74c671b]
            ??:0(__assert_fail)[0x7fc5f74d7e96]
            /usr/sbin/mariadbd(_Z14wsrep_bf_abortP3THDS0_+0x607)[0x564af19a74c7]
            ??:0(wsrep_bf_abort(THD*, THD*))[0x564af19ad5ad]
            ??:0(wsrep_thd_bf_abort)[0x564af19ca8a2]
            ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x564af1388827]
            ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x564af1388979]
            ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x564af1a596c5]
            ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x564af1a5aa0e]
            ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x564af1a5c2d1]
            ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x564af1a8b771]
            ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x564af1a8c14f]
            ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x564af1a6cd30]
            ??:0(void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&))[0x564af19ceb74]
            ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x564af170adaa]
            ??:0(handler::ha_update_row(unsigned char const*, unsigned char const*))[0x564af18201ba]
            ??:0(Update_rows_log_event::do_exec_row(rpl_group_info*))[0x564af1813b87]
            ??:0(Rows_log_event::do_apply_event(rpl_group_info*))[0x564af19a5470]
            ??:0(wsrep_apply_events(THD*, Relay_log_info*, void const*, unsigned long))[0x564af198cfe0]
            ??:0(Wsrep_high_priority_service::remove_fragments(wsrep::ws_meta const&))[0x564af198de26]
            ??:0(wsrep::server_state::start_streaming_applier(wsrep::id const&, wsrep::transaction_id const&, wsrep::high_priority_service*))[0x564af1c37efb]
            ??:0(wsrep::wsrep_provider_v26::options[abi:cxx11]() const)[0x564af1c4a72e]
            /usr/lib/galera/libgalera_smm.so(+0x53b14)[0x7fc5e6004b14]
            /usr/lib/galera/libgalera_smm.so(+0x5bb55)[0x7fc5e600cb55]
            /usr/lib/galera/libgalera_smm.so(+0x67ba8)[0x7fc5e6018ba8]
            src/trx_handle.cpp:396(galera::TrxHandleSlave::apply(void*, wsrep_cb_status (*)(void*, wsrep_ws_handle const*, unsigned int, wsrep_buf const*, wsrep_trx_meta const*, bool*), wsrep_trx_meta const&, bool&))[0x7fc5e6038595]
            /usr/lib/galera/libgalera_smm.so(+0x5fcd0)[0x7fc5e6010cd0]
            /usr/lib/galera/libgalera_smm.so(+0x463c1)[0x7fc5e5ff73c1]
            /usr/sbin/mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x564af1c4add2]
            ??:0(wsrep::wsrep_provider_v26::run_applier(wsrep::high_priority_service*))[0x564af19a7577]
            ??:0(wsrep_bf_abort(THD*, THD*))[0x564af19985a3]
            ??:0(start_wsrep_THD(void*))[0x564af1927386]
            /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7fc5f7532b43]
            ??:0(pthread_condattr_setpshared)[0x7fc5f75c4a00]
             
            Trying to get some variables.
            Some pointers may be invalid and cause the dump to abort.
            Query (0x7fc5dd4772ab): UPDATE _1check_RFR003007.incidents set status_last_update='2022-08-16 10:45:23'
                                                            , status_changed_users_id='51'
                                                            , status_id=9
                                                        where incidents_id=1062
             
            Connection ID (thread ID): 12
            Status: NOT_KILLED
             
            Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
             
            The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
            information that should help you find out what is causing the crash.
            Writing a core file...
            Working directory at /var/lib/mysql
            Resource Limits:
            Limit                     Soft Limit           Hard Limit           Units
            Max cpu time              unlimited            unlimited            seconds
            Max file size             unlimited            unlimited            bytes
            Max data size             unlimited            unlimited            bytes
            Max stack size            8388608              unlimited            bytes
            Max core file size        unlimited            unlimited            bytes
            Max resident set          unlimited            unlimited            bytes
            Max processes             514397               514397               processes
            Max open files            1048576              1048576              files
            Max locked memory         524288               524288               bytes
            Max address space         unlimited            unlimited            bytes
            Max file locks            unlimited            unlimited            locks
            Max pending signals       514397               514397               signals
            Max msgqueue size         819200               819200               bytes
            Max nice priority         0                    0
            Max realtime priority     0                    0
            Max realtime timeout      unlimited            unlimited            us
            Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
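
To compare this backtrace with the one in the issue description, it can help to split each frame into its parts. The following is a minimal sketch, assuming the frames keep the `location(symbol+offset)[address]` shape seen in the log above; `Frame`, `FRAME_RE` and `parse_frame` are hypothetical names introduced for illustration and are not part of MariaDB or Galera.

```python
import re
from typing import NamedTuple, Optional

# Frame shapes seen in the log above (assumed to be the only ones):
#   /usr/sbin/mariadbd(_Z14wsrep_bf_abortP3THDS0_+0x607)[0x564af19a74c7]
#   ??:0(wsrep_bf_abort(THD*, THD*))[0x564af19ad5ad]
#   /usr/lib/galera/libgalera_smm.so(+0x53b14)[0x7fc5e6004b14]
FRAME_RE = re.compile(
    r"^\s*(?P<location>[^(\s]+)"            # binary path or file:line
    r"\((?P<symbol>.*?)"                    # symbol name, may be empty
    r"(?:\+(?P<offset>0x[0-9a-fA-F]+))?"    # optional +0x... offset
    r"\)\[(?P<address>0x[0-9a-fA-F]+)\]\s*$"
)


class Frame(NamedTuple):
    location: str
    symbol: Optional[str]   # None when the frame has no symbol name
    offset: Optional[int]   # None when the frame has no +0x... part
    address: int


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one backtrace line; return None for non-frame lines."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    offset = m.group("offset")
    return Frame(
        location=m.group("location"),
        symbol=m.group("symbol") or None,
        offset=int(offset, 16) if offset else None,
        address=int(m.group("address"), 16),
    )


if __name__ == "__main__":
    sample = [
        "/usr/sbin/mariadbd(_Z14wsrep_bf_abortP3THDS0_+0x607)[0x564af19a74c7]",
        "??:0(wsrep_bf_abort(THD*, THD*))[0x564af19ad5ad]",
        "/usr/lib/galera/libgalera_smm.so(+0x53b14)[0x7fc5e6004b14]",
        "Trying to get some variables.",  # not a frame, parses to None
    ]
    for line in sample:
        print(parse_frame(line))
```

Addresses differ between crashes and between nodes, so a comparison would more usefully look at the sequence of `symbol` values than at `address`.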
            

            It would be nice if this bug was taken care of asap.

            ccounotte COUNOTTE CEDRIC added the comment above.

            Since I upgraded to 10.6.9, the servers have not crashed; however, things got worse because the whole cluster now gets stuck, as described here: https://jira.mariadb.org/browse/MDEV-29388.

            ccounotte COUNOTTE CEDRIC added the comment above.
            sysprg Julius Goryavsky added a comment - edited

            According to the Codership team, this issue is resolved by the fix for https://jira.mariadb.org/browse/MDEV-34836. Since MDEV-29265 is not directly reproducible in the existing mtr tests, I am closing this issue as solved (fixed) by the fix for MDEV-34836. If the issue recurs, it should be investigated separately on a new basis, since according to the current view the fix for MDEV-34836 should resolve it.


            People

              sysprg Julius Goryavsky
              ccounotte COUNOTTE CEDRIC
              Votes: 5
              Watchers: 10
