[MDEV-28141] Slave crashes with Packets out of order when connecting to a shutting down master
Created: 2022-03-21
Updated: 2024-01-25

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.4, 10.5, 10.6, 10.7, 10.8, 10.9
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug
Priority: Major
Reporter: Brandon Nesterenko
Assignee: Brandon Nesterenko
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-32385 Semi-Sync Ack_Receiver Thread Should ... Open
relates to MDEV-11853 semisync thread can be killed after s... Closed
relates to MDEV-32551 "Read semi-sync reply magic number er... Closed

Description

If a slave is connecting to a master which is actively shutting down, the slave can crash with a "Packets out of order" assertion error.

Using a slave compiled and run in debug mode, the following debug trace snippet shows the last packet the slave received from the master:

vio_read: read_data: Memory: 0x7f34d4086970 Bytes: (34)
1E 00 00 01 FF 87 07 23 37 30 31 30 30 43 6F 6E 6E 65 63 74 69 6F 6E 20 77 61
73 20 6B 69 6C 6C 65 64

Decoding the packet payload as plain text reveals that the master sent the following error message to the slave:

#70100Connection was killed
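
For readers who want to reproduce the decoding, the following is a minimal sketch rather than code from the server. It assumes the standard client/server wire layout: a 3-byte little-endian payload length, a 1-byte sequence number, and an error payload made of a 0xFF marker, a 2-byte little-endian error code, a '#' marker, a 5-character SQLSTATE, and the message text. The names RAW and decode_err_packet are hypothetical and exist only for this illustration.

```python
# Bytes copied from the vio_read trace above (34 bytes in total).
RAW = bytes.fromhex(
    "1E 00 00 01 FF 87 07 23 37 30 31 30 30 43 6F 6E 6E 65 63 74 69 6F 6E 20 77 61 "
    "73 20 6B 69 6C 6C 65 64"
)


def decode_err_packet(raw: bytes) -> dict:
    """Decode one error packet (hypothetical helper, not a server API)."""
    # Packet header: 3-byte little-endian payload length, then sequence number.
    length = int.from_bytes(raw[0:3], "little")
    sequence = raw[3]
    payload = raw[4:4 + length]

    # An error payload starts with the 0xFF marker byte.
    if not payload or payload[0] != 0xFF:
        raise ValueError("not an error packet")

    # 2-byte little-endian error code follows the marker.
    error_code = int.from_bytes(payload[1:3], "little")

    # Optional '#' marker introduces a 5-character SQLSTATE.
    if payload[3:4] == b"#":
        sqlstate = payload[4:9].decode("ascii")
        message = payload[9:].decode("utf-8")
    else:
        sqlstate = None
        message = payload[3:].decode("utf-8")

    return {
        "length": length,
        "sequence": sequence,
        "error_code": error_code,
        "sqlstate": sqlstate,
        "message": message,
    }


decoded = decode_err_packet(RAW)
for field, value in decoded.items():
    print(f"{field}: {value}")
```

Under these assumptions, the header fields show a 30-byte payload carried in a packet with sequence number 1, and the text after the error code is the "#70100Connection was killed" string quoted above.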

Stack trace of the crashing thread at the Packets out of order error:

stack_bottom = 0x7f3540158c48 thread_stack 0x49000
mysys/stacktrace.c:174(my_print_stacktrace)[0x556349e95720]
sql/signal_handler.cc:222(handle_fatal_signal)[0x5563495a87e1]
sigaction.c:0(__restore_rt)[0x7f35490891f0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f3548b3efbb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x116)[0x7f3548b24864]
/lib/x86_64-linux-gnu/libc.so.6(+0x26749)[0x7f3548b24749]
/lib/x86_64-linux-gnu/libc.so.6(+0x383d6)[0x7f3548b363d6]
sql/net_serv.cc:1200(my_real_read(st_net*, unsigned long*, char))[0x5563497332fe]
sql/net_serv.cc:1256(my_net_read_packet_reallen)[0x5563497333cb]
sql-common/client.c:348(cli_safe_read_reallen)[0x55634955d138]
sql-common/client.c:338(cli_safe_read)[0x55634955d0e2]
sql-common/client.c:1663(cli_read_change_user_result)[0x556349560377]
sql-common/client.c:2483(run_plugin_auth)[0x5563495620e0]
sql-common/client.c:3090(mysql_real_connect)[0x556349563eb5]
sql/semisync_slave.cc:147(Repl_semi_sync_slave::kill_connection(st_mysql*))[0x5563494880e6]
sql/semisync_slave.cc:119(Repl_semi_sync_slave::slave_stop(Master_info*))[0x556349487fc9]
sql/slave.cc:4926(handle_slave_io)[0x55634915bb0f]
perfschema/pfs.cc:1871(pfs_spawn_thread)[0x55634992b83f]
nptl/pthread_create.c:474(start_thread)[0x7f354907e450]

Note that the crash happens while a new MySQL client connection is being created: specifically, the kill_mysql connection opened by mysql_real_connect() from within Repl_semi_sync_slave::kill_connection(), which is called from Repl_semi_sync_slave::slave_stop().

