[MDEV-11853] semisync thread can be killed after sync binlog but before ACK in the sync state Created: 2017-01-20  Updated: 2023-10-23  Resolved: 2022-04-22

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0.29, 10.1.21, 10.2.3
Fix Version/s: 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3

Type: Bug
Priority: Major
Reporter: VAROQUI Stephane
Assignee: Brandon Nesterenko
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-28114 Semi-sync Master ACK Receiver Thread ... Closed
relates to MDEV-28141 Slave crashes with Packets out of ord... Open
relates to MDEV-29369 rpl.rpl_semi_sync_shutdown_await_ack ... In Review
relates to MDEV-32551 "Read semi-sync reply magic number er... Closed

 Description   

Shutting down a semi-sync master while it is in the sync state can leave the master with more transactions than its slaves.
This is expected after a crash, because the binlog is written before the event is sent and the ACK is awaited. But it is a problem for a regular shutdown, because the workload cannot then be safely moved to a slave.

We can make sure the semi-sync thread is killed last during shutdown, and stopped only once the status moves to no sync or the ACK has been received. This could delay shutdown by up to rpl_semi_sync_master_timeout.
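
The proposed wait can be illustrated with a small toy model. This is only a sketch of the idea, not the server code or the eventual patch: the class `SemiSyncShutdownGate` and all of its methods are hypothetical names, and the only thing taken from the server is that rpl_semi_sync_master_timeout is expressed in milliseconds. Shutdown blocks until every binlogged transaction has been ACKed, semi-sync falls back to no sync, or the timeout elapses, whichever comes first.

```python
import threading


class SemiSyncShutdownGate:
    """Toy model (hypothetical names): shutdown waits for pending ACKs,
    a switch to no sync, or the timeout, whichever happens first."""

    def __init__(self, timeout_ms):
        # Mirrors rpl_semi_sync_master_timeout, which is given in milliseconds.
        self.timeout_s = timeout_ms / 1000.0
        self._cond = threading.Condition()
        self._pending = 0          # transactions binlogged but not yet ACKed
        self._semi_sync_on = True  # False once the master falls back to no sync

    def transaction_binlogged(self):
        with self._cond:
            self._pending += 1

    def ack_received(self):
        with self._cond:
            if self._pending > 0:
                self._pending -= 1
            self._cond.notify_all()

    def semi_sync_switched_off(self):
        with self._cond:
            self._semi_sync_on = False
            self._cond.notify_all()

    def await_before_shutdown(self):
        """Block, then return 'acked', 'no_sync', or 'timeout'."""
        with self._cond:
            done = self._cond.wait_for(
                lambda: self._pending == 0 or not self._semi_sync_on,
                timeout=self.timeout_s,
            )
            if not done:
                return "timeout"
            return "no_sync" if not self._semi_sync_on else "acked"


if __name__ == "__main__":
    gate = SemiSyncShutdownGate(timeout_ms=2000)
    gate.transaction_binlogged()
    # Simulate a slave ACK arriving shortly after shutdown begins.
    timer = threading.Timer(0.05, gate.ack_received)
    timer.start()
    print(gate.await_before_shutdown())
    timer.join()
```

In this model the worst case matches the delay mentioned above: if no ACK arrives and semi-sync stays on, `await_before_shutdown` returns only after the full timeout.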



 Comments   
Comment by Andrei Elkin [ 2021-10-06 ]

MDEV-18450 has partly fixed this issue. The remaining work is to further delay the master shutdown until either an ACK
arrives or rpl_semi_sync_master_timeout elapses.

Comment by Brandon Nesterenko [ 2022-01-05 ]

Hi Andrei!

This is ready for review.

GH: 515dc59
BB: bb-10.4-MDEV-11853

Comment by Andrei Elkin [ 2022-03-17 ]

Analyzed the issue and the current patch, and requested some improvements.

Comment by Brandon Nesterenko [ 2022-03-30 ]

Hi Andrei! My new patch is ready for review:

Patch: 55ec699
BB: bb-10.4-MDEV-11853

It addresses the previous review remarks, as well as MDEV-28141. In particular, test case 5 is the MTR test case that reproduces the packets out of order problem (when built with -DEXTRA_DEBUG), and my comment here highlights its fix.

Comment by Andrei Elkin [ 2022-04-05 ]

MDEV-11853 fixes are approved.

Comment by Brandon Nesterenko [ 2022-04-22 ]

Note that mysqld.cc will have a merge conflict in 10.7.

Fix branch: 10.7-MDEV-11853-merge

Fix patch: 71c3725

Generated at Thu Feb 08 07:53:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.