[MDEV-30862] Assertion `mode_ == m_high_priority' failed in void wsrep::client_state::after_applying() Created: 2023-03-16  Updated: 2023-04-18  Resolved: 2023-04-18

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4, 10.5, 10.6, 10.8, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 11.1.1, 10.11.3, 10.4.29, 10.5.20, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Critical
Reporter: Ramesh Sivaraman Assignee: Julius Goryavsky
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-25455 Assertion `thd->wsrep_trx().active()'... Open

 Description   

Node2 crashes when executing the following test case. Test case crashes differently on galapq server(VM box) which is similar to MDEV-25455.

SET autocommit=0;
SET SESSION wsrep_trx_fragment_size=1;
CREATE TABLE t2 SELECT seq FROM seq_1_to_50;

Leads to:

11.0.2 97ff62b99b7bc709074540b918142afd75ba6284 (Optimized)

#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x0000561c19d67ddf in my_write_core (sig=sig@entry=6) at /test/11.0_opt/mysys/stacktrace.c:424
#2  0x0000561c198ccfb0 in handle_fatal_signal (sig=6) at /test/11.0_opt/sql/signal_handler.cc:357
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00001477bf2aa859 in __GI_abort () at abort.c:79
#6  0x00001477bf2aa729 in __assert_fail_base (fmt=0x1477bf440588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561c1a0ae572 "mode_ == m_high_priority", 
    file=0x561c19f27898 "/test/11.0_opt/wsrep-lib/include/wsrep/client_state.hpp", line=303, function=<optimized out>) at assert.c:92
#7  0x00001477bf2bbfd6 in __GI___assert_fail (assertion=assertion@entry=0x561c1a0ae572 "mode_ == m_high_priority", file=file@entry=0x561c19f27898 "/test/11.0_opt/wsrep-lib/include/wsrep/client_state.hpp", 
    line=line@entry=303, function=function@entry=0x561c1a0ae3c8 "void wsrep::client_state::after_applying()") at assert.c:101
#8  0x0000561c19b08883 in wsrep::client_state::after_applying (this=0x147738023368) at /test/11.0_opt/sql/sql_class.h:2332
#9  Wsrep_high_priority_service::log_dummy_write_set (err=..., ws_meta=..., ws_handle=..., this=0x147738024590) at /test/11.0_opt/sql/wsrep_high_priority_service.cc:519
#10 Wsrep_high_priority_service::log_dummy_write_set (this=0x147738024590, ws_handle=..., ws_meta=..., err=...) at /test/11.0_opt/sql/wsrep_high_priority_service.cc:465
#11 0x0000561c19eedd3e in apply_fragment (server_state=..., high_priority_service=..., streaming_applier=0x147738024590, ws_handle=..., ws_meta=..., data=...) at /test/11.0_opt/wsrep-lib/src/server_state.cpp:123
#12 0x0000561c19ef057f in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /test/11.0_opt/wsrep-lib/src/server_state.cpp:360
#13 0x0000561c19ef11c1 in wsrep::server_state::on_apply (this=<optimized out>, high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /test/11.0_opt/wsrep-lib/src/server_state.cpp:1130
#14 0x0000561c19f073bc in wsrep::high_priority_service::apply (data=..., ws_meta=..., ws_handle=..., this=0x14777cd9bcf0) at /test/11.0_opt/wsrep-lib/include/wsrep/high_priority_service.hpp:47
#15 (anonymous namespace)::apply_cb (ctx=ctx@entry=0x14777cd9bcf0, wsh=wsh@entry=0x14777cd9ad20, flags=<optimized out>, flags@entry=64, buf=buf@entry=0x14777cd9ad30, meta=meta@entry=0x14777cd9afe0, 
    exit_loop=exit_loop@entry=0x14777cd9af9f) at /test/11.0_opt/wsrep-lib/src/wsrep_provider_v26.cpp:507
#16 0x00001477bed0ff45 in galera::TrxHandleSlave::apply (this=this@entry=0x14773801cb90, recv_ctx=recv_ctx@entry=0x14777cd9bcf0, 
    apply_cb=0x561c19f07290 <(anonymous namespace)::apply_cb(void*, wsrep_ws_handle_t const*, uint32_t, wsrep_buf_t const*, wsrep_trx_meta_t const*, wsrep_bool_t*)>, meta=..., 
    exit_loop=exit_loop@entry=@0x14777cd9af9f: false) at /test/galera_4x/galera/src/trx_handle.cpp:396
#17 0x00001477bed266fe in galera::ReplicatorSMM::apply_trx (this=0x561c1c599030, recv_ctx=0x14777cd9bcf0, ts=...) at /test/galera_4x/galera/src/replicator_smm.cpp:516
#18 0x00001477bed2a918 in galera::ReplicatorSMM::process_trx (this=0x561c1c599030, recv_ctx=0x14777cd9bcf0, ts_ptr=...) at /test/galera_4x/galera/src/replicator_smm.cpp:2138
#19 0x00001477bed5c339 in galera::GcsActionSource::process_writeset (this=0x561c1c591b70, recv_ctx=0x14777cd9bcf0, act=..., exit_loop=@0x14777cd9b91f: false)
    at /test/galera_4x/galera/src/gcs_action_source.cpp:62
#20 0x00001477bed5cd73 in galera::GcsActionSource::dispatch (this=0x561c1c591b70, recv_ctx=0x14777cd9bcf0, act=..., exit_loop=<optimized out>) at /test/galera_4x/galera/src/gcs_action_source.cpp:110
#21 0x00001477bed5d332 in galera::GcsActionSource::process (this=0x561c1c591b70, recv_ctx=0x14777cd9bcf0, exit_loop=@0x14777cd9b91f: false) at /test/galera_4x/galera/src/gcs_action_source.cpp:186
#22 0x00001477bed27da0 in galera::ReplicatorSMM::async_recv (this=0x561c1c599030, recv_ctx=0x14777cd9bcf0) at /test/galera_4x/galera/src/replicator_smm.cpp:402
#23 0x00001477becfc681 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /test/galera_4x/galera/src/wsrep_provider.cpp:264
#24 0x0000561c19f07a02 in wsrep::wsrep_provider_v26::run_applier (this=<optimized out>, applier_ctx=<optimized out>) at /test/11.0_opt/wsrep-lib/src/wsrep_provider_v26.cpp:858
#25 0x0000561c19b25cb1 in wsrep_replication_process (thd=0x147738000c58, arg=<optimized out>) at /test/11.0_opt/wsrep-lib/include/wsrep/server_state.hpp:320
#26 0x0000561c19b14228 in start_wsrep_THD (arg=0x561c1c896f80) at /test/11.0_opt/sql/wsrep_mysqld.h:541
#27 0x00001477bf7bb609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#28 0x00001477bf3a7133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

11.0.2 97ff62b99b7bc709074540b918142afd75ba6284 (Debug)

#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x00005634a5c559d3 in my_write_core (sig=sig@entry=11) at /test/11.0_dbg/mysys/stacktrace.c:424
#2  0x00005634a5550cf9 in handle_fatal_signal (sig=11) at /test/11.0_dbg/sql/signal_handler.cc:357
#3  <signal handler called>
#4  std::__atomic_base<wait_for_commit*>::load (__m=std::memory_order_relaxed, this=0x8f8f8f8f8f8f9087) at /usr/include/c++/9/bits/atomic_base.h:734
#5  std::atomic<wait_for_commit*>::load (__m=std::memory_order_relaxed, this=0x8f8f8f8f8f8f9087) at /usr/include/c++/9/atomic:519
#6  wait_for_commit::unregister_wait_for_prior_commit (this=0x8f8f8f8f8f8f8f8f) at /test/11.0_dbg/sql/sql_class.h:2355
#7  wsrep_unregister_from_group_commit (thd=0x14f714020e88) at /test/11.0_dbg/sql/wsrep_binlog.cc:390
#8  0x00005634a57fec83 in Wsrep_high_priority_service::log_dummy_write_set (this=0x14f71402aa70, ws_handle=..., ws_meta=..., err=...) at /test/11.0_dbg/sql/wsrep_high_priority_service.cc:513
#9  0x00005634a5dc3ff8 in apply_fragment (server_state=..., high_priority_service=..., streaming_applier=streaming_applier@entry=0x14f71402aa70, ws_handle=..., ws_meta=..., data=...)
    at /test/11.0_dbg/wsrep-lib/src/server_state.cpp:123
#10 0x00005634a5dcc3c8 in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /test/11.0_dbg/wsrep-lib/src/server_state.cpp:360
#11 0x00005634a5dcd1a3 in wsrep::server_state::on_apply (this=<optimized out>, high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /test/11.0_dbg/wsrep-lib/src/server_state.cpp:1130
#12 0x00005634a5ddfd3a in wsrep::high_priority_service::apply (data=..., ws_meta=..., ws_handle=..., this=0x14f73f296c00) at /test/11.0_dbg/wsrep-lib/include/wsrep/high_priority_service.hpp:47
#13 (anonymous namespace)::apply_cb (ctx=ctx@entry=0x14f73f296c00, wsh=wsh@entry=0x14f73f295c30, flags=flags@entry=64, buf=buf@entry=0x14f73f295c40, meta=meta@entry=0x14f73f295ef0, 
    exit_loop=exit_loop@entry=0x14f73f295eaf) at /test/11.0_dbg/wsrep-lib/src/wsrep_provider_v26.cpp:507
#14 0x000014f7b1c94f45 in galera::TrxHandleSlave::apply (this=this@entry=0x14f714020ae0, recv_ctx=recv_ctx@entry=0x14f73f296c00, 
    apply_cb=0x5634a5ddfbb7 <(anonymous namespace)::apply_cb(void*, wsrep_ws_handle_t const*, uint32_t, wsrep_buf_t const*, wsrep_trx_meta_t const*, wsrep_bool_t*)>, meta=..., 
    exit_loop=exit_loop@entry=@0x14f73f295eaf: false) at /test/galera_4x/galera/src/trx_handle.cpp:396
#15 0x000014f7b1cab6fe in galera::ReplicatorSMM::apply_trx (this=0x5634a799e0b0, recv_ctx=0x14f73f296c00, ts=...) at /test/galera_4x/galera/src/replicator_smm.cpp:516
#16 0x000014f7b1caf918 in galera::ReplicatorSMM::process_trx (this=0x5634a799e0b0, recv_ctx=0x14f73f296c00, ts_ptr=...) at /test/galera_4x/galera/src/replicator_smm.cpp:2138
#17 0x000014f7b1ce1339 in galera::GcsActionSource::process_writeset (this=0x5634a7996bd0, recv_ctx=0x14f73f296c00, act=..., exit_loop=@0x14f73f29682f: false)
    at /test/galera_4x/galera/src/gcs_action_source.cpp:62
#18 0x000014f7b1ce1d73 in galera::GcsActionSource::dispatch (this=0x5634a7996bd0, recv_ctx=0x14f73f296c00, act=..., exit_loop=<optimized out>) at /test/galera_4x/galera/src/gcs_action_source.cpp:110
#19 0x000014f7b1ce2332 in galera::GcsActionSource::process (this=0x5634a7996bd0, recv_ctx=0x14f73f296c00, exit_loop=@0x14f73f29682f: false) at /test/galera_4x/galera/src/gcs_action_source.cpp:186
#20 0x000014f7b1cacda0 in galera::ReplicatorSMM::async_recv (this=0x5634a799e0b0, recv_ctx=0x14f73f296c00) at /test/galera_4x/galera/src/replicator_smm.cpp:402
#21 0x000014f7b1c81681 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /test/galera_4x/galera/src/wsrep_provider.cpp:264
#22 0x00005634a5de0a12 in wsrep::wsrep_provider_v26::run_applier (this=<optimized out>, applier_ctx=<optimized out>) at /test/11.0_dbg/wsrep-lib/src/wsrep_provider_v26.cpp:858
#23 0x00005634a581c593 in wsrep_replication_process (thd=0x14f714000f88, arg=<optimized out>) at /test/11.0_dbg/sql/wsrep_server_state.h:56
#24 0x00005634a5809434 in start_wsrep_THD (arg=0x5634a7de42e0) at /test/11.0_dbg/sql/wsrep_mysqld.h:541
#25 0x000014f7b2740609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#26 0x000014f7b232c133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95



 Comments   
Comment by Ramesh Sivaraman [ 2023-03-17 ]

Node2 sometimes disconnects from the cluster when executing the test case instead of node crash.

node2:root@localhost> show status like '%wsrep%stat%';
+---------------------------+--------------------------------------+
| Variable_name             | Value                                |
+---------------------------+--------------------------------------+
| wsrep_local_state_uuid    | 6f24370b-c4bc-11ed-a2f9-5631bab5a160 |
| wsrep_local_state         | 5                                    |
| wsrep_local_state_comment | Inconsistent                         |
| wsrep_cluster_state_uuid  | 6f24370b-c4bc-11ed-a2f9-5631bab5a160 |
| wsrep_cluster_status      | Disconnected                         |
+---------------------------+--------------------------------------+
5 rows in set (0.001 sec)
 
node2:root@localhost> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.6.13-MariaDB |
+-----------------+
1 row in set (0.000 sec)
 
node2:root@localhost> 

10.6.13 f169dfb41adcb637732507ed56d3038003170a15 (Optimized)

#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x00005573256c1aff in my_write_core (sig=sig@entry=11) at /test/10.6_opt/mysys/stacktrace.c:424
#2  0x0000557325202620 in handle_fatal_signal (sig=11) at /test/10.6_opt/sql/signal_handler.cc:357
#3  <signal handler called>
#4  0x0000557325479dff in wsrep_get_apply_format (thd=thd@entry=0x1503f001db28) at /test/10.6_opt/sql/wsrep_applier.cc:82
#5  0x0000557325479899 in wsrep_dump_rbr_buf_with_header (thd=thd@entry=0x1503f001db28, rbr_buf=0x150457fffb58, buf_len=79) at /test/10.6_opt/sql/wsrep_binlog.cc:290
#6  0x000055732546105c in apply_events (err=..., data=..., rli=<optimized out>, thd=0x1503f001db28) at /test/10.6_opt/wsrep-lib/include/wsrep/buffer.hpp:48
#7  Wsrep_applier_service::apply_write_set (this=<optimized out>, ws_meta=..., data=..., err=...) at /test/10.6_opt/sql/wsrep_high_priority_service.cc:593
#8  0x0000557325830c08 in apply_fragment (server_state=..., high_priority_service=..., streaming_applier=0x1503f0024fa0, ws_handle=..., ws_meta=..., data=...) at /test/10.6_opt/wsrep-lib/src/server_state.cpp:99
#9  0x00005573258332cd in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /test/10.6_opt/wsrep-lib/src/server_state.cpp:390
#10 0x0000557325834391 in wsrep::server_state::on_apply (this=<optimized out>, high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /test/10.6_opt/wsrep-lib/src/server_state.cpp:1130
#11 0x0000557325845a6c in wsrep::high_priority_service::apply (data=..., ws_meta=..., ws_handle=..., this=0x1504664ddcf0) at /test/10.6_opt/wsrep-lib/include/wsrep/high_priority_service.hpp:47
#12 (anonymous namespace)::apply_cb (ctx=ctx@entry=0x1504664ddcf0, wsh=wsh@entry=0x1504664dcd20, flags=<optimized out>, flags@entry=0, buf=buf@entry=0x1504664dcd30, meta=meta@entry=0x1504664dcfe0, 
    exit_loop=exit_loop@entry=0x1504664dcf9f) at /test/10.6_opt/wsrep-lib/src/wsrep_provider_v26.cpp:507
#13 0x00001504677e0f45 in galera::TrxHandleSlave::apply (this=this@entry=0x1504400075b0, recv_ctx=recv_ctx@entry=0x1504664ddcf0, 
    apply_cb=0x557325845940 <(anonymous namespace)::apply_cb(void*, wsrep_ws_handle_t const*, uint32_t, wsrep_buf_t const*, wsrep_trx_meta_t const*, wsrep_bool_t*)>, meta=..., 
    exit_loop=exit_loop@entry=@0x1504664dcf9f: false) at /test/galera_4x/galera/src/trx_handle.cpp:396
#14 0x00001504677f76fe in galera::ReplicatorSMM::apply_trx (this=0x5573277df0e0, recv_ctx=0x1504664ddcf0, ts=...) at /test/galera_4x/galera/src/replicator_smm.cpp:516
#15 0x00001504677fb918 in galera::ReplicatorSMM::process_trx (this=0x5573277df0e0, recv_ctx=0x1504664ddcf0, ts_ptr=...) at /test/galera_4x/galera/src/replicator_smm.cpp:2138
#16 0x000015046782d339 in galera::GcsActionSource::process_writeset (this=0x5573277d7c70, recv_ctx=0x1504664ddcf0, act=..., exit_loop=@0x1504664dd91f: false)
    at /test/galera_4x/galera/src/gcs_action_source.cpp:62
#17 0x000015046782dd73 in galera::GcsActionSource::dispatch (this=0x5573277d7c70, recv_ctx=0x1504664ddcf0, act=..., exit_loop=<optimized out>) at /test/galera_4x/galera/src/gcs_action_source.cpp:110
#18 0x000015046782e332 in galera::GcsActionSource::process (this=0x5573277d7c70, recv_ctx=0x1504664ddcf0, exit_loop=@0x1504664dd91f: false) at /test/galera_4x/galera/src/gcs_action_source.cpp:186
#19 0x00001504677f8da0 in galera::ReplicatorSMM::async_recv (this=0x5573277df0e0, recv_ctx=0x1504664ddcf0) at /test/galera_4x/galera/src/replicator_smm.cpp:402
#20 0x00001504677cd681 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at /test/galera_4x/galera/src/wsrep_provider.cpp:264
#21 0x00005573258460b2 in wsrep::wsrep_provider_v26::run_applier (this=<optimized out>, applier_ctx=<optimized out>) at /test/10.6_opt/wsrep-lib/src/wsrep_provider_v26.cpp:858
#22 0x000055732547be11 in wsrep_replication_process (thd=0x150440000c58, arg=<optimized out>) at /test/10.6_opt/wsrep-lib/include/wsrep/server_state.hpp:320
#23 0x000055732546b548 in start_wsrep_THD (arg=0x557327817d70) at /test/10.6_opt/sql/wsrep_mysqld.h:540
#24 0x0000150468263609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#25 0x0000150467e4f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Jan Lindström [ 2023-04-17 ]

https://github.com/MariaDB/server/pull/2561

Comment by Julius Goryavsky [ 2023-04-18 ]

Merged after review and testing: https://github.com/MariaDB/server/commit/bc3bfcf943b817b19a41e4f599b4f2e9a259b263

Generated at Thu Feb 08 10:19:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.