[MDEV-22232] Server crashes in rpl_sql_thread_info::cached_charset_compare / wsrep_apply_events Created: 2020-04-13  Updated: 2024-01-31  Resolved: 2023-11-21

Status: Closed
Project: MariaDB Server
Component/s: Galera, wsrep
Affects Version/s: 10.4, 10.5
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Julius Goryavsky
Resolution: Fixed Votes: 0
Labels: None
Environment: Galera 26.4.4(r4599)

Issue Links:
Relates
relates to MDEV-18894 Galera: 10.4 nodes are crashed with S... Closed
relates to MDEV-22230 Unexpected ER_ERROR_ON_RENAME upon DR... Closed
relates to MDEV-27761 Galera node crash on rpl_sql_thread_i... Closed

 Description   

Notes:

  • It currently fails every time for me, but the test case is still obviously non-deterministic. Run it with --repeat=N if it doesn't fail right away. Don't put it into the regression suite; create a properly synchronized one instead.
  • The stack trace looks similar to MDEV-18894, but the scenario is essentially different: there is no versioning here and no rolling upgrade (or upgrade of any kind). In fact, the same failure is reproducible with a one-node "cluster".

--source include/galera_cluster.inc

--connect (con1,localhost,root,,test)
--connect (con2,localhost,root,,test)
# con2 is the current connection from here on
CREATE TABLE t1 (a INT) ENGINE=InnoDB;
CREATE TABLE t2 (b INT) ENGINE=InnoDB;
SELECT * FROM t1;
# t2 has no foreign key named b, so this ALTER is expected to fail (see --error below)
--send
  ALTER TABLE t2 DROP FOREIGN KEY b, ALGORITHM=COPY;

--connection con1
--send
  CREATE OR REPLACE TABLE t1 (c INT) ENGINE=InnoDB;

--connection default
# The crash happens while this statement is replayed (see frames #19-#24 below)
CREATE TABLE t3 SELECT * FROM t1;

# Cleanup
--connection con1
--reap
--disconnect con1
--connection con2
--error ER_ERROR_ON_RENAME,ER_CANT_DROP_FIELD_OR_KEY
--reap
--disconnect con2
--connection default
DROP TABLE IF EXISTS t1, t2, t3;

10.4 edc3899d

#3  <signal handler called>
#4  __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:943
#5  0x000055be17f88070 in rpl_sql_thread_info::cached_charset_compare (this=0xa5a5a5a5a5a5a5a5, charset=0x7f4e58136690 "\b") at /data/src/10.4/sql/rpl_rli.cc:2468
#6  0x000055be182c66ac in Query_log_event::do_apply_event (this=0x7f4e581365a0, rgi=0x7f4e58174c70, query_arg=0x7f4e58161f9b "CREATE TABLE `t3` (\n  `a` int(11) DEFAULT NULL\n)", q_len_arg=48) at /data/src/10.4/sql/log_event.cc:5569
#7  0x000055be182c5fdf in Query_log_event::do_apply_event (this=0x7f4e581365a0, rgi=0x7f4e58174c70) at /data/src/10.4/sql/log_event.cc:5411
#8  0x000055be17d19c7f in Log_event::apply_event (this=0x7f4e581365a0, rgi=0x7f4e58174c70) at /data/src/10.4/sql/log_event.h:1482
#9  0x000055be18090726 in wsrep_apply_events (thd=0x7f4e5812e7d0, rli=0x7f4e581707f0, events_buf=0x7f4ebe632fd0, buf_len=0) at /data/src/10.4/sql/wsrep_applier.cc:200
#10 0x000055be1806f6eb in Wsrep_replayer_service::apply_write_set (this=0x7f4eb52545c0, ws_meta=..., data=...) at /data/src/10.4/sql/wsrep_high_priority_service.cc:658
#11 0x000055be18abaaf9 in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /data/src/10.4/wsrep-lib/src/server_state.cpp:328
#12 0x000055be18abe652 in wsrep::server_state::on_apply (this=0x55be1b31a980, high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at /data/src/10.4/wsrep-lib/src/server_state.cpp:1125
#13 0x000055be18ad5b21 in wsrep::high_priority_service::apply (this=0x7f4eb52545c0, ws_handle=..., ws_meta=..., data=...) at /data/src/10.4/wsrep-lib/include/wsrep/high_priority_service.hpp:47
#14 0x000055be18ad2db8 in (anonymous namespace)::apply_cb (ctx=0x7f4eb52545c0, wsh=0x7f4eb52535c0, flags=65, buf=0x7f4eb52535d0, meta=0x7f4eb5253e00, exit_loop=0x7f4eb5253870) at /data/src/10.4/wsrep-lib/src/wsrep_provider_v26.cpp:496
#15 0x00007f4ebf3e8693 in galera::TrxHandleSlave::apply (this=this@entry=0x7f4e5812b860, recv_ctx=recv_ctx@entry=0x7f4eb52545c0, apply_cb=0x55be18ad2b89 <(anonymous namespace)::apply_cb(void*, wsrep_ws_handle_t const*, uint32_t, wsrep_buf_t const*, wsrep_trx_meta_t const*, wsrep_bool_t*)>, meta=..., exit_loop=@0x7f4eb5253870: false) at galera/src/trx_handle.cpp:391
#16 0x00007f4ebf4425b5 in galera::ReplicatorSMM::replay_trx (this=0x55be1b345670, trx=..., lock=..., trx_ctx=0x7f4eb52545c0) at galera/src/replicator_smm.cpp:1100
#17 0x00007f4ebf46055f in galera_replay_trx (gh=<optimized out>, trx_handle=<optimized out>, recv_ctx=0x7f4eb52545c0) at galera/src/wsrep_provider.cpp:311
#18 0x000055be18ad4617 in wsrep::wsrep_provider_v26::replay (this=0x55be1b2ead40, ws_handle=..., reply_service=0x7f4eb52545c0) at /data/src/10.4/wsrep-lib/src/wsrep_provider_v26.cpp:857
#19 0x000055be1806a588 in Wsrep_client_service::replay (this=0x7f4e58006c70) at /data/src/10.4/sql/wsrep_client_service.cc:272
#20 0x000055be18acb35a in wsrep::transaction::replay (this=0x7f4e58006cf0, lock=...) at /data/src/10.4/wsrep-lib/src/transaction.cpp:1703
#21 0x000055be18ac7c0b in wsrep::transaction::after_statement (this=0x7f4e58006cf0) at /data/src/10.4/wsrep-lib/src/transaction.cpp:816
#22 0x000055be18ab30f8 in wsrep::client_state::after_statement (this=0x7f4e58006c88) at /data/src/10.4/wsrep-lib/src/client_state.cpp:248
#23 0x000055be17e0a2d0 in wsrep_after_statement (thd=0x7f4e58000af0) at /data/src/10.4/sql/wsrep_trans_observer.h:394
#24 0x000055be17e2383b in wsrep_mysql_parse (thd=0x7f4e58000af0, rawbuf=0x7f4e58013198 "CREATE TABLE t3 SELECT * FROM t1", length=32, parser_state=0x7f4eb5255160, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7734
#25 0x000055be17e0f130 in dispatch_command (command=COM_QUERY, thd=0x7f4e58000af0, packet=0x7f4e58027f11 "CREATE TABLE t3 SELECT * FROM t1", packet_length=32, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1827
#26 0x000055be17e0d8fd in do_command (thd=0x7f4e58000af0) at /data/src/10.4/sql/sql_parse.cc:1360
#27 0x000055be17f96f51 in do_handle_one_connection (connect=0x55be1b8951a0) at /data/src/10.4/sql/sql_connect.cc:1412
#28 0x000055be17f96ca0 in handle_one_connection (arg=0x55be1b8951a0) at /data/src/10.4/sql/sql_connect.cc:1316
#29 0x000055be189a10fd in pfs_spawn_thread (arg=0x55be1b88bba0) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#30 0x00007f4ec60aa4a4 in start_thread (arg=0x7f4eb5256700) at pthread_create.c:456
#31 0x00007f4ec41ded0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Reproducible on 10.4, 10.5.
Debug, non-debug and ASAN builds all crash the same way.
Not reproducible on 10.3.



 Comments   
Comment by Denis Protivensky [ 2023-10-31 ]

The fix: https://github.com/MariaDB/server/pull/2811

Comment by Julius Goryavsky [ 2023-11-21 ]

Fixed, https://github.com/MariaDB/server/commit/e39c497c809511bcc37a658405c7aa4b5be2cf6a

Comment by Julius Goryavsky [ 2024-01-31 ]

Additional fix merged with head revision: https://github.com/MariaDB/server/commit/f4ee7c110cd6faee3fa80b61ae572f471341c906
