[MDEV-28816] Assertion `wsrep_thd_is_applying(thd)' failed in int wsrep_ignored_error_code(Log_event*, int) Created: 2022-06-13  Updated: 2023-08-16  Resolved: 2023-03-29

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: None
Fix Version/s: 11.2.1

Type: Bug Priority: Critical
Reporter: Ramesh Sivaraman Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None
Environment:

preview-10.10-MDEV-16329-online-alter


Attachments: File n1.cnf     File n2.cnf    
Issue Links:
Problem/Incident
is caused by MDEV-16329 Engine-independent online ALTER TABLE Closed

 Description   

RQG test case

perl gendata.pl --dsn=dbi:mysql:host=127.0.0.1:port=12620:socket=/test/mtest/GAL_MD130622-mariadb-10.10.0-linux-x86_64-dbg/node1/node1_socket.sock:user=root:database=test --spec=conf/mariadb/oltp.zz
perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=12620:socket=/test/mtest/GAL_MD130622-mariadb-10.10.0-linux-x86_64-dbg/node1/node1_socket.sock:user=root:database=test --grammar=conf/mariadb/oltp_and_ddl.yy --threads=32 --duration=30000 --queries=1000000000 &

Leads to

10.10.0 d7a7d16713070c7c2902c3df4b4f6d024cd0320f (Debug)

mysqld: /test/mtest/10.10_dbg/sql/wsrep_mysqld.cc:3364: int wsrep_ignored_error_code(Log_event*, int): Assertion `wsrep_thd_is_applying(thd) && !wsrep_thd_is_local_toi(thd)' failed.

10.10.0 d7a7d16713070c7c2902c3df4b4f6d024cd0320f (Debug)

Core was generated by `/test/mtest/GAL_MD130622-mariadb-10.10.0-linux-x86_64-dbg/bin/mysqld --defaults'.
Program terminated with signal SIGABRT, Aborted.
#0  0x0000000070000002 in ?? ()
[Current thread is 1 (Thread 0x44f041de9700 (LWP 170126))]
(gdb) bt
#0  0x0000000070000002 in ?? ()
#1  0x00007a2c7c46f766 in _raw_syscall () at /home/roc/rr/rr/src/preload/raw_syscall.S:120
#2  0x00007a2c7c46b04e in traced_raw_syscall (call=<optimized out>) at /home/roc/rr/rr/src/preload/syscallbuf.c:272
#3  0x00007a2c7c46e4d1 in syscall_hook_internal (call=0x14c7ddad9fa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3295
#4  syscall_hook (call=0x14c7ddad9fa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3329
#5  0x00007a2c7c46ae50 in _syscall_hook_trampoline () at /home/roc/rr/rr/src/preload/syscall_hook.S:313
#6  0x00007a2c7c46aeaf in __morestack () at /home/roc/rr/rr/src/preload/syscall_hook.S:458
#7  0x00007a2c7c46af08 in _syscall_hook_trampoline_89_c2_f7_da () at /home/roc/rr/rr/src/preload/syscall_hook.S:504
#8  0x00007a2c7c739f0c in __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#9  0x000055f00e1ad6bf in my_write_core (sig=sig@entry=6) at /test/mtest/10.10_dbg/mysys/stacktrace.c:424
#10 0x000055f00da532c7 in handle_fatal_signal (sig=6) at /test/mtest/10.10_dbg/sql/signal_handler.cc:345
#11 <signal handler called>
#12 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#13 0x00007dd04f62d859 in __GI_abort () at abort.c:79
#14 0x00007dd04f62d729 in __assert_fail_base (fmt=0x7dd04f7c3588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55f00e56e650 "wsrep_thd_is_applying(thd) && !wsrep_thd_is_local_toi(thd)", file=0x55f00e56d750 "/test/mtest/10.10_dbg/sql/wsrep_mysqld.cc", line=3364, function=<optimized out>) at assert.c:92
#15 0x00007dd04f63ef36 in __GI___assert_fail (assertion=assertion@entry=0x55f00e56e650 "wsrep_thd_is_applying(thd) && !wsrep_thd_is_local_toi(thd)", file=file@entry=0x55f00e56d750 "/test/mtest/10.10_dbg/sql/wsrep_mysqld.cc", line=line@entry=3364, function=function@entry=0x55f00e56e620 "int wsrep_ignored_error_code(Log_event*, int)") at assert.c:101
#16 0x000055f00dd2c5a3 in wsrep_ignored_error_code (ev=ev@entry=0x55f01073d868, error=error@entry=1062) at /test/mtest/10.10_dbg/sql/wsrep_mysqld.cc:3364
#17 0x000055f00dbdb9cc in Rows_log_event::do_apply_event (this=0x55f01073d868, rgi=<optimized out>) at /test/mtest/10.10_dbg/include/my_bitmap.h:106
#18 0x000055f00dbd2f40 in Log_event::apply_event (this=this@entry=0x55f01073d868, rgi=rgi@entry=0x44f041de1180) at /test/mtest/10.10_dbg/sql/log_event.cc:4152
#19 0x000055f00d83616a in online_alter_read_from_binlog (thd=thd@entry=0x55f0106f2c18, rgi=rgi@entry=0x44f041de1180, log=log@entry=0x55f010794870) at /test/mtest/10.10_dbg/sql/sql_table.cc:11440
#20 0x000055f00d84eeee in copy_data_between_tables (thd=thd@entry=0x55f0106f2c18, from=from@entry=0x14c7e40b7388, to=to@entry=0x55f010793268, create=@0x44f041de73e0: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x55f0107045e8, last = 0x55f010704b08, elements = 4}, <No data fields>}, ignore=ignore@entry=false, order_num=order_num@entry=0, order=0x0, copied=0x44f041de5088, deleted=0x44f041de5090, keys_onoff=Alter_info::LEAVE_AS_IS, alter_ctx=0x44f041de6690, online=<optimized out>) at /test/mtest/10.10_dbg/sql/sql_table.cc:11868
#21 0x000055f00d854992 in mysql_alter_table (thd=thd@entry=0x55f0106f2c18, new_db=new_db@entry=0x55f0106f76d8, new_name=new_name@entry=0x55f0106f7ae8, create_info=create_info@entry=0x44f041de7470, table_list=<optimized out>, table_list@entry=0x55f010703b50, alter_info=alter_info@entry=0x44f041de7380, order_num=0, order=0x0, ignore=false, if_exists=false) at /test/mtest/10.10_dbg/sql/sql_table.cc:10897
#22 0x000055f00d8d8788 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x55f0106f2c18) at /test/mtest/10.10_dbg/sql/structs.h:569
#23 0x000055f00d777049 in mysql_execute_command (thd=thd@entry=0x55f0106f2c18, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/mtest/10.10_dbg/sql/sql_parse.cc:5996
#24 0x000055f00d75f605 in mysql_parse (thd=thd@entry=0x55f0106f2c18, rawbuf=rawbuf@entry=0x55f010703a20 "ALTER TABLE `oltp3` REMOVE PARTITIONING /* QNO 45 CON_ID 47 */", length=length@entry=62, parser_state=parser_state@entry=0x44f041de8470) at /test/mtest/10.10_dbg/sql/sql_parse.cc:8036
#25 0x000055f00d75f00e in wsrep_mysql_parse (thd=thd@entry=0x55f0106f2c18, rawbuf=0x55f010703a20 "ALTER TABLE `oltp3` REMOVE PARTITIONING /* QNO 45 CON_ID 47 */", length=62, parser_state=parser_state@entry=0x44f041de8470) at /test/mtest/10.10_dbg/sql/sql_parse.cc:7849
#26 0x000055f00d76ccd8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x55f0106f2c18, packet=packet@entry=0x55f0106fb429 "ALTER TABLE `oltp3` REMOVE PARTITIONING /* QNO 45 CON_ID 47 */ ", packet_length=packet_length@entry=63, blocking=blocking@entry=true) at /test/mtest/10.10_dbg/sql/sql_class.h:1365
#27 0x000055f00d76f3d1 in do_command (thd=0x55f0106f2c18, blocking=blocking@entry=true) at /test/mtest/10.10_dbg/sql/sql_parse.cc:1407
#28 0x000055f00d8d1433 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55f0106d3648, put_in_cache=put_in_cache@entry=true) at /test/mtest/10.10_dbg/sql/sql_connect.cc:1418
#29 0x000055f00d8d193f in handle_one_connection (arg=0x55f0106d3648) at /test/mtest/10.10_dbg/sql/sql_connect.cc:1312
#30 0x00007a2c7c731609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#31 0x00007dd04f72a293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95



 Comments   
Comment by Jan Lindström (Inactive) [ 2022-06-13 ]
  • We have error 1062 (ER_DUP_ENTRY) but we are not applying

    #4  0x000055f00dd2c5a3 in wsrep_ignored_error_code (ev=ev@entry=0x55f01073d868, error=error@entry=1062)
        at /test/mtest/10.10_dbg/sql/wsrep_mysqld.cc:3364
    3364	  DBUG_ASSERT(wsrep_thd_is_applying(thd) &&
    (rr) p thd->m_wsrep_client_state
    $1 = {<wsrep::client_state> = {_vptr.client_state = 0x55f00ea5d770 <vtable for Wsrep_client_state+16>, static n_modes_ = 6, 
        static state_max_ = 5, owning_thread_id_ = {thread_ = 75798687946496}, rollbacker_active_ = false, 
        mutex_ = @0x55f0106f9288, cond_ = @0x55f0106f9298, server_state_ = @0x55f010201f70, client_service_ = @0x55f0106f92a8, 
        id_ = {id_ = 47}, mode_ = wsrep::client_state::m_toi, toi_mode_ = wsrep::client_state::m_local, 
    

  • This means that WSREP_TO_ISOLATION_BEGIN()/END() is not called
Comment by Sergei Golubchik [ 2022-07-01 ]

Sorry, I failed to repeat this. gentest.pl succeeded.

What shall I do to repeat the crash?

Comment by Ramesh Sivaraman [ 2022-07-04 ]

serg I was able to reproduce the issue using latest bb-10.10-MDEV-16329 branch. I used only InnoDB tables in the RQG grammar file conf/mariadb/oltp.zz. That is the only change I made in RQG grammar. The assertion is slightly different on latest commit.

ramesh@galapq:~/framework/galera-qa/randgen$ diff conf/mariadb/oltp.zz randgen/conf/mariadb/oltp.zz
29c29
<         engines => [ 'InnoDB' ],
---
>         engines => [ 'InnoDB', 'MyISAM' ],
ramesh@galapq:~/framework/galera-qa/randgen$ 

10.10.0 49ad875902e03ebcf66618f8f4a5a9253c9769a3

mysqld: /test/mtest/10.10/sql/wsrep_mysqld.cc:3368: int wsrep_ignored_error_code(Log_event*, int): Assertion `wsrep_thd_is_applying(thd)' failed.

10.10.0 49ad875902e03ebcf66618f8f4a5a9253c9769a3

Core was generated by `/test/mtest/mariadb-10.10.0-linux-x86_64/bin/mysqld --defaults-file=/test/mtest'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
[Current thread is 1 (Thread 0x149efbfff700 (LWP 2347168))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x0000563161c2ccfa in my_write_core (sig=6) at /test/mtest/10.10/mysys/stacktrace.c:424
#2  0x000056316138144b in handle_fatal_signal (sig=6) at /test/mtest/10.10/sql/signal_handler.cc:345
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x0000149f51174859 in __GI_abort () at abort.c:79
#6  0x0000149f51174729 in __assert_fail_base (fmt=0x149f5130a588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x563162058886 "wsrep_thd_is_applying(thd)", file=0x5631620562c0 "/test/mtest/10.10/sql/wsrep_mysqld.cc", line=3368, function=<optimized out>) at assert.c:92
#7  0x0000149f51185f36 in __GI___assert_fail (assertion=0x563162058886 "wsrep_thd_is_applying(thd)", file=0x5631620562c0 "/test/mtest/10.10/sql/wsrep_mysqld.cc", line=3368, function=0x563162058858 "int wsrep_ignored_error_code(Log_event*, int)") at assert.c:101
#8  0x00005631617387f9 in wsrep_ignored_error_code (ev=0x149ec4073ca8, error=1062) at /test/mtest/10.10/sql/wsrep_mysqld.cc:3368
#9  0x000056316152a12f in Rows_log_event::do_apply_event (this=0x149ec4073ca8, rgi=0x149efbff70c0) at /test/mtest/10.10/sql/log_event_server.cc:6085
#10 0x0000563161511027 in Log_event::apply_event (this=0x149ec4073ca8, rgi=0x149efbff70c0) at /test/mtest/10.10/sql/log_event.cc:4152
#11 0x00005631610e03af in online_alter_read_from_binlog (thd=0x149ec4000db8, rgi=0x149efbff70c0, log=0x149ec404b820) at /test/mtest/10.10/sql/sql_table.cc:11442
#12 0x00005631610e1fdb in copy_data_between_tables (thd=0x149ec4000db8, from=0x149ec401e408, to=0x149ec4033f18, create=@0x149efbffd240: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x149ec4015120, last = 0x149ec4015638, elements = 4}, <No data fields>}, ignore=false, order_num=0, order=0x0, copied=0x149efbffafa0, deleted=0x149efbffafa8, keys_onoff=Alter_info::LEAVE_AS_IS, alter_ctx=0x149efbffc4c0, online=true) at /test/mtest/10.10/sql/sql_table.cc:11865
#13 0x00005631610de856 in mysql_alter_table (thd=0x149ec4000db8, new_db=0x149ec4005a78, new_name=0x149ec4005e88, create_info=0x149efbffd2d0, table_list=0x149ec40141e0, alter_info=0x149efbffd1e0, order_num=0, order=0x0, ignore=false, if_exists=false) at /test/mtest/10.10/sql/sql_table.cc:10900
#14 0x00005631611984ba in Sql_cmd_alter_table::execute (this=0x149ec4014b18, thd=0x149ec4000db8) at /test/mtest/10.10/sql/sql_alter.cc:552
#15 0x0000563160fbdc0e in mysql_execute_command (thd=0x149ec4000db8, is_called_from_prepared_stmt=false) at /test/mtest/10.10/sql/sql_parse.cc:5996
#16 0x0000563160fc3dd8 in mysql_parse (thd=0x149ec4000db8, rawbuf=0x149ec40140a0 "ALTER TABLE `oltp10` PARTITION BY KEY(`id`) /* QNO 2 CON_ID 22 */", length=65, parser_state=0x149efbffe480) at /test/mtest/10.10/sql/sql_parse.cc:8036
#17 0x0000563160fc3514 in wsrep_mysql_parse (thd=0x149ec4000db8, rawbuf=0x149ec40140a0 "ALTER TABLE `oltp10` PARTITION BY KEY(`id`) /* QNO 2 CON_ID 22 */", length=65, parser_state=0x149efbffe480) at /test/mtest/10.10/sql/sql_parse.cc:7849
#18 0x0000563160fb010a in dispatch_command (command=COM_QUERY, thd=0x149ec4000db8, packet=0x149ec400b9f9 "ALTER TABLE `oltp10` PARTITION BY KEY(`id`) /* QNO 2 CON_ID 22 */ ", packet_length=66, blocking=true) at /test/mtest/10.10/sql/sql_parse.cc:1881
#19 0x0000563160faebd0 in do_command (thd=0x149ec4000db8, blocking=true) at /test/mtest/10.10/sql/sql_parse.cc:1407
#20 0x000056316118cfad in do_handle_one_connection (connect=0x5631641a99f8, put_in_cache=true) at /test/mtest/10.10/sql/sql_connect.cc:1418
#21 0x000056316118cc3d in handle_one_connection (arg=0x5631641a99f8) at /test/mtest/10.10/sql/sql_connect.cc:1312
#22 0x00005631616a2e17 in pfs_spawn_thread (arg=0x56316445d448) at /test/mtest/10.10/storage/perfschema/pfs.cc:2201
#23 0x0000149f5169e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#24 0x0000149f51271293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Sergei Golubchik [ 2022-07-04 ]

Still doesn't crash even with your patch. What else can I do to try to repeat it?

Comment by Ramesh Sivaraman [ 2022-07-05 ]

Can you try to reproduce it in debug build? I was able to reproduce it on my local machine as well.

Comment by Sergei Golubchik [ 2022-07-05 ]

Yes, I used a debug build

Comment by Ramesh Sivaraman [ 2022-07-07 ]

Could you please try with attached cnf files to reproduce the issue n2.cnf n1.cnf

Test case

BASEDIR=/test/mtest/mariadb-10.10.0-linux-x86_64
rm -Rf ${BASEDIR}/node*
${BASEDIR}/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=${BASEDIR} --datadir=${BASEDIR}/node1
${BASEDIR}/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=${BASEDIR} --datadir=${BASEDIR}/node2
 
${BASEDIR}/bin/mysqld --defaults-file=${BASEDIR}/n1.cnf --wsrep-new-cluster > ${BASEDIR}/node1/node1.err 2>&1 & 
 
${BASEDIR}/bin/mysqld --defaults-file=${BASEDIR}/n2.cnf > ${BASEDIR}/node2/node2.err 2>&1 &
 
      
perl gendata.pl --dsn=dbi:mysql:host=127.0.0.1:port=11463:socket=/test/mtest/10.10/mariadb-10.10.0-linux-x86_64/node1/node1_socket.sock:user=root:database=test --spec=conf/mariadb/oltp.zz
 
perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=11463:socket=/test/mtest/10.10/mariadb-10.10.0-linux-x86_64/node1/node1_socket.sock:user=root:database=test --grammar=conf/mariadb/oltp_and_ddl.yy --threads=8 --duration=300 --queries=1000

Comment by Sergei Golubchik [ 2022-07-24 ]

Thanks, that helped. Still, doesn't repeat every time, but one in 5–10 reruns does crash.
I think I fixed it, but because it's that difficult to repeat, I'm not quite sure. Please, retest.

Branch: bb-10.10-MDEV-16329

Comment by Ramesh Sivaraman [ 2022-07-26 ]

Bug fix looks good. Could not reproduce the issue in bb-10.10-MDEV-16329

Generated at Thu Feb 08 10:03:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.