  MariaDB Server / MDEV-38553

Server crashes in transaction replay

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Fix Version/s: 10.11
    • Affects Version/s: 10.11
    • Component/s: Galera
    • Labels: None
    • Can result in hang or crash
    • Q1/2026 Galera Development

    Description

      Server crashes in transaction replay during RQG run on the first two nodes of a 3-node cluster.

      Branch 10.11-MDEV-22124
      version_source_revision 929d65d44e7dfbac96eb733783bda5a7ee7fa26e

      2026-01-13 14:27:29 1 [Warning] WSREP: Event 42 Write_rows_v1 apply failed: 121, seqno 3787
      mariadbd: /test/codership-mariadb-server/wsrep-lib/src/transaction.cpp:670: int wsrep::transaction::before_rollback(): Assertion `state() == s_executing || state() == s_preparing || state() == s_prepared || state() == s_must_abort || state() == s_aborting || state() == s_cert_failed || state() == s_must_replay' failed.
      260113 14:27:29 [ERROR] /opt/MDEV-38218-mariadb-10.11.16-linux-x86_64-dbg/bin/mariadbd got signal 6 ;
      Sorry, we probably made a mistake, and this is a bug.
       
      Your assistance in bug reporting will enable us to fix this for the next release.
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs about how to report
      a bug on https://jira.mariadb.org/.
       
      Please include the information from the server start above, to the end of the
      information below.
       
      Server version: 10.11.16-MariaDB-debug-log source revision: 929d65d44e7dfbac96eb733783bda5a7ee7fa26e
       
      WSREP: Suppressing further logging
      WSREP: Shutting down network communications
       
      The information page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/
      contains instructions to obtain a better version of the backtrace below.
      Following these instructions will help MariaDB developers provide a fix quicker.
       
      Attempting backtrace. Include this in the bug report.
      (note: Retrieving this information may fail)
       
      Thread pointer: 0x708144038278
      stack_bottom = 0x7081dc4c0000 thread_stack 0x49000
      2026-01-13 14:27:29 0 [Note] /opt/MDEV-38218-mariadb-10.11.16-linux-x86_64-dbg/bin/mariadbd (initiated by: unknown): Normal shutdown
      2026-01-13 14:27:29 0 [Note] WSREP: Shutdown replication
      2026-01-13 14:27:29 0 [Note] WSREP: Server status change synced -> disconnecting
      2026-01-13 14:27:29 0 [Note] WSREP: Closing send monitor...
      2026-01-13 14:27:29 0 [Note] WSREP: Closed send monitor.
      2026-01-13 14:27:29 0 [Note] WSREP: gcomm: terminating thread
      2026-01-13 14:27:29 0 [Note] WSREP: gcomm: joining thread
      2026-01-13 14:27:29 0 [Note] WSREP: gcomm: closing backend
      2026-01-13 14:27:30 0 [Note] WSREP: (44d5b159-994e, 'tcp://0.0.0.0:19006') turning message relay requesting on, nonlive peers: tcp://127.0.0.1:19009 
      2026-01-13 14:27:31 0 [Note] WSREP: (44d5b159-994e, 'tcp://0.0.0.0:19006') reconnecting to 44d965e6-be0b (tcp://127.0.0.1:19009), attempt 0
      2026-01-13 14:27:31 0 [Note] WSREP: Failed to establish connection: Operation aborted.
      mysys/stacktrace.c:215(my_print_stacktrace)[0x5cce23ef2b68]
      sql/signal_handler.cc:230(handle_fatal_signal)[0x5cce2349e432]
      libc_sigaction.c:0(__restore_rt)[0x708211045330]
      nptl/pthread_kill.c:44(__pthread_kill_implementation)[0x70821109eb2c]
      posix/raise.c:27(__GI_raise)[0x70821104527e]
      stdlib/abort.c:81(__GI_abort)[0x7082110288ff]
      intl/loadmsgcat.c:1177(_nl_load_domain)[0x70821102881b]
      /lib/x86_64-linux-gnu/libc.so.6(+0x3b517)[0x70821103b517]
      2026-01-13 14:27:35 0 [Note] WSREP: Failed to establish connection: Operation aborted.
      src/transaction.cpp:679(wsrep::transaction::before_rollback())[0x5cce24001993]
      src/client_state.cpp:411(wsrep::client_state::before_rollback())[0x5cce23fd6041]
      sql/wsrep_trans_observer.h:449(wsrep_before_rollback(THD*, bool))[0x5cce234a0e12]
      sql/handler.cc:2332(ha_rollback_trans(THD*, bool))[0x5cce234a66cf]
      sql/transaction.cc:391(trans_rollback(THD*))[0x5cce232abec5]
      sql/wsrep_high_priority_service.cc:388(Wsrep_high_priority_service::rollback(wsrep::ws_handle const&, wsrep::ws_meta const&))[0x5cce238ca5a5]
      src/server_state.cpp:343(apply_write_set(wsrep::server_state&, wsrep::high_priority_service&, wsrep::ws_handle const&, wsrep::ws_meta const&, wsrep::const_buffer const&))[0x5cce23fe742c]
      src/server_state.cpp:1133(wsrep::server_state::on_apply(wsrep::high_priority_service&, wsrep::ws_handle const&, wsrep::ws_meta const&, wsrep::const_buffer const&))[0x5cce23feb691]
      wsrep/high_priority_service.hpp:48(wsrep::high_priority_service::apply(wsrep::ws_handle const&, wsrep::ws_meta const&, wsrep::const_buffer const&))[0x5cce24015c6d]
      src/wsrep_provider_v26.cpp:510((anonymous namespace)::apply_cb(void*, wsrep_ws_handle const*, unsigned int, wsrep_buf const*, wsrep_trx_meta const*, bool*))[0x5cce2401232b]
      src/trx_handle.cpp:396(galera::TrxHandleSlave::apply(void*, wsrep_cb_status (*)(void*, wsrep_ws_handle const*, unsigned int, wsrep_buf const*, wsrep_trx_meta const*, bool*), wsrep_trx_meta const&, bool&))[0x70820c475f52]
      src/trx_handle.hpp:826(galera::TrxHandleMaster::lock())[0x70820c49ea9a]
      src/trx_handle.hpp:1126(galera::TrxHandleLock::~TrxHandleLock())[0x70820c4641e1]
      2026-01-13 14:27:39 0 [Note] WSREP: Failed to establish connection: Operation aborted.
      2026-01-13 14:27:39 0 [Note] WSREP: evs::proto(44d5b159-994e, LEAVING, view_id(REG,44d5b159-994e,4)) suspecting node: 44d965e6-be0b
      2026-01-13 14:27:39 0 [Note] WSREP: evs::proto(44d5b159-994e, LEAVING, view_id(REG,44d5b159-994e,4)) suspected node without join message, declaring inactive
      2026-01-13 14:27:39 0 [Note] WSREP: view(view_id(NON_PRIM,44d5b159-994e,4) memb {
      	44d5b159-994e,0
      } joined {
      } left {
      } partitioned {
      	44d965e6-be0b,0
      })
      2026-01-13 14:27:39 0 [Note] WSREP: PC protocol downgrade 1 -> 0
      2026-01-13 14:27:39 0 [Note] WSREP: view((empty))
      2026-01-13 14:27:39 0 [Note] WSREP: gcomm: closed
      2026-01-13 14:27:39 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2026-01-13 14:27:39 0 [Note] WSREP: Flow-control interval: [16, 16]
      2026-01-13 14:27:39 0 [Note] WSREP: Received NON-PRIMARY.
      2026-01-13 14:27:39 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 3802)
      2026-01-13 14:27:39 0 [Note] WSREP: New SELF-LEAVE.
      2026-01-13 14:27:39 0 [Note] WSREP: Flow-control interval: [0, 0]
      2026-01-13 14:27:39 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2026-01-13 14:27:39 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 3802)
      2026-01-13 14:27:39 0 [Note] WSREP: RECV thread exiting 0: Success
      2026-01-13 14:27:39 0 [Note] WSREP: recv_thread() joined.
      2026-01-13 14:27:39 0 [Note] WSREP: Closing send queue.
      2026-01-13 14:27:39 0 [Note] WSREP: Closing receive queue.
      src/wsrep_provider_v26.cpp:1036(wsrep::wsrep_provider_v26::replay(wsrep::ws_handle const&, wsrep::high_priority_service*))[0x5cce24014651]
      sql/wsrep_client_service.cc:304(Wsrep_client_service::replay())[0x5cce238c4049]
      src/transaction.cpp:2069(wsrep::transaction::replay(std::unique_lock<wsrep::mutex>&))[0x5cce2400726c]
      src/transaction.cpp:893(wsrep::transaction::after_statement(std::unique_lock<wsrep::mutex>&))[0x5cce24002496]
      src/client_state.cpp:265(wsrep::client_state::after_statement())[0x5cce23fd5364]
      sql/wsrep_trans_observer.h:482(wsrep_after_statement(THD*))[0x5cce23086159]
      sql/sql_parse.cc:8070(wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5cce2309f749]
      sql/sql_parse.cc:1908(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x5cce2308adcb]
      sql/sql_parse.cc:1434(do_command(THD*, bool))[0x5cce230897e2]
      sql/sql_connect.cc:1475(do_handle_one_connection(CONNECT*, bool))[0x5cce2328ca37]
      sql/sql_connect.cc:1389(handle_one_connection)[0x5cce2328c7ac]
      perfschema/pfs.cc:2203(pfs_spawn_thread)[0x5cce2383d6cc]
      nptl/pthread_create.c:447(start_thread)[0x70821109caa4]
      x86_64/clone3.S:80(clone3)[0x708211129c6c]
       
      Connection ID (thread ID): 1
      Status: NOT_KILLED
      Query (0x708207808432): INSERT IGNORE INTO `oltp23` ( `id`, `k`) VALUES ( NULL, 588578816 )  /* QNO 29 CON_ID 34 */
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=off,cset_narrowing=off
       
      Writing a core file...
      Working directory at /opt/MDEV-38218-mariadb-10.11.16-linux-x86_64-dbg/mysql-test/var/mysqld.2/data
      Resource Limits (excludes unlimited resources):
      Limit                     Soft Limit           Hard Limit           Units     
      Max stack size            8388608              unlimited            bytes     
      Max processes             256930               256930               processes 
      Max open files            32198                32198                files     
      Max locked memory         8427692032           8427692032           bytes     
      Max pending signals       256930               256930               signals   
      Max msgqueue size         819200               819200               bytes     
      Max nice priority         0                    0                    
      Max realtime priority     0                    0                    
      Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -F%F -- %E
       
      Kernel version: Linux version 6.14.0-1020-gcp (buildd@lcy02-amd64-085) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #21~24.04.1-Ubuntu SMP Fri Oct 17 00:56:30 UTC 2025
       
      # 2026-01-13T14:27:46 [225603] datadir is /opt/MDEV-38218-mariadb-10.11.16-linux-x86_64-dbg/mysql-test/var/mysqld.2/data/
      # 2026-01-13T14:27:46 [225603] binary is /opt/MDEV-38218-mariadb-10.11.16-linux-x86_64-dbg/bin/mysqld
      # 2026-01-13T14:27:46 [225603] bindir is /opt/MDEV-38218-mariadb-10.11.16-linux-x86_64-dbg/bin
      # 2026-01-13T14:27:46 [225603] WARNING: Core file not found!
      "/home/susil_behera/randgen" is not a core dump: file format not recognized
      backtrace.gdb:19: Error in sourced command file:
      No stack.
      # 2026-01-13T14:27:46 [225603] 
      "/home/susil_behera/randgen" is not a core dump: file format not recognized
      # 2026-01-13T14:27:47 [225603] 
      # 2026-01-13T14:27:47 [225603] Test completed with failure status STATUS_SERVER_CRASHED (101)
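
When triaging the attached error logs, it can help to pull out only the lines that mark the failure: the applier warning, the assertion, and the fatal signal. The following is a minimal sketch, assuming the logs are plain-text files such as the attached `mysqld.1.err`, `mysqld.2.err`, and `mysqld.3.err`. The function name `crash_markers` is hypothetical and not part of MariaDB, mtr, or RQG.

```shell
# Hypothetical helper: print the crash-relevant lines from one or more
# MariaDB error logs, each prefixed with its file name and line number.
crash_markers() {
    # -H: always print the file name, -n: print line numbers,
    # -E: extended regex covering the three marker kinds seen in this report
    #     (applier failure, failed assertion, fatal signal).
    grep -HnE 'apply failed|Assertion .* failed|got signal [0-9]+' "$@"
}
```

For example, `crash_markers mysqld.1.err mysqld.2.err mysqld.3.err` would show which node logs contain the assertion and at which line, which makes it easier to compare the two crashed nodes.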
      

      Repro steps:

      export WSREP_PROVIDER=<FULL PATH OF libgalera_smm.so>
      cd mysql-test
      mtr galera_3nodes.galera_wsrep_schema --mysqld=--log-bin --start-and-exit
      cd randgen
      perl gendata.pl --dsn=dbi:mysql:host=127.0.0.1:port=19000:user=root:database=test --spec=conf/mariadb/oltp.zz
      perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=19001:user=root:database=test --grammar=conf/mariadb/oltp_and_ddl.yy --threads=32 --duration=1300 --queries=100000000 &
      perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=19001:user=root:database=test --grammar=conf/mariadb/oltp_and_ddl.yy --threads=32 --duration=1300 --queries=100000000 &
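
The two `gentest.pl` runs above are started in the background with `&`, so the invoking shell returns immediately and does not report their exit status. The following is a minimal sketch of a wrapper that starts several commands in parallel, waits for all of them, and returns non-zero if any of them failed. `run_parallel` is a hypothetical helper, not part of RQG or mtr, and each argument is assumed to be a complete command line passed as one string.

```shell
# Hypothetical helper: run every argument as a separate background command,
# wait for all of them, and return 1 if at least one exited non-zero.
run_parallel() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &          # start the command in the background
        pids="$pids $!"         # remember its process ID
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1     # collect each exit status individually
    done
    return "$rc"
}
```

With the repro steps, each of the two `perl gentest.pl ...` command lines would be passed as one quoted argument, so the wrapper returns only after both load generators have finished.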
      

      Logs: please find attached.

      Attachments

        1. gdb_bt_from_1st_core_dump.txt
          210 kB
          Susil Behera
        2. gdb_bt_from_2nd_core_dump.txt
          229 kB
          Susil Behera
        3. mysqld.1.err
          59 kB
          Susil Behera
        4. mysqld.2.err
          84 kB
          Susil Behera
        5. mysqld.3.err
          143 kB
          Susil Behera

          People

            Assignee: Lampio Pekka
            Reporter: Susil Behera (susil.behera)
            Votes: 0
            Watchers: 1
