[MDEV-30619] Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers
Created: 2023-02-08  Updated: 2023-09-27  Resolved: 2023-07-25

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.8.8, 10.4.31, 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1

Type: Bug
Priority: Critical
Reporter: Brandon Nesterenko
Assignee: Andrei Elkin
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Blocks
PartOf
includes MDEV-31749 New test rpl.rpl_parallel_sbm in bb-1... Closed
Problem/Incident
causes MDEV-31749 New test rpl.rpl_parallel_sbm in bb-1... Closed
is caused by MDEV-29639 Seconds_Behind_Master is incorrect fo... Closed
Relates
relates to MDEV-23021 rpl.rpl_parallel_optimistic_until fai... Closed
relates to MDEV-30608 rpl.rpl_delayed_parallel_slave_sbm so... Closed
relates to MDEV-31895 Report a Replica's Time Difference wi... Open
relates to MDEV-32265 seconds_behind_master is inaccurate f... Closed

 Description   

If the workers of a parallel replica are busy (potentially with long queues) but the SQL thread has no events left to distribute, the SQL thread goes idle. The next event that arrives from the primary then updates last_master_timestamp (LMT) with its own timestamp, even though the workers may still be quite far behind. As a result, Seconds_Behind_Master can under-report the replica's actual lag.

The proposed fix is for the SQL thread to additionally check whether there are uncommitted events. That is, we should add an atomic counter (displayable as a new system status variable), which the SQL thread increments when it reads an event and which the workers decrement when they commit. The SQL thread should update last_master_timestamp with the MDEV-29639 logic only if this counter is 0.
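
To illustrate the proposed counter, here is a minimal standalone C++ sketch. It is not the server code and not the contents of the PR: the names ReplicaLagState, pending_events, on_sql_thread_read and on_worker_commit are hypothetical, and the "MDEV-29639 logic" is reduced to an assumed "the SQL thread was idle before this event" flag.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical state shared by the SQL (driver) thread and the workers.
// All names here are assumptions made for illustration only.
struct ReplicaLagState {
  // Events read by the SQL thread that no worker has committed yet.
  std::atomic<std::uint64_t> pending_events{0};
  // Stand-in for last_master_timestamp (seconds since the epoch).
  std::atomic<std::int64_t> last_master_timestamp{0};
};

// Called by the SQL thread for each event it reads from the relay log.
// sql_thread_was_idle stands in for the MDEV-29639 condition (assumed).
inline void on_sql_thread_read(ReplicaLagState &s, std::int64_t event_ts,
                               bool sql_thread_was_idle) {
  // Only trust the new event's timestamp if nothing is still uncommitted;
  // otherwise the workers may be far behind this event.
  if (sql_thread_was_idle &&
      s.pending_events.load(std::memory_order_acquire) == 0) {
    s.last_master_timestamp.store(event_ts, std::memory_order_release);
  }
  s.pending_events.fetch_add(1, std::memory_order_acq_rel);
}

// Called by a worker thread once it has committed an event.
inline void on_worker_commit(ReplicaLagState &s) {
  std::uint64_t before =
      s.pending_events.fetch_sub(1, std::memory_order_acq_rel);
  assert(before > 0 && "commit without a matching read");
  (void)before;  // silence unused-variable warnings when NDEBUG is set
}
```

In this sketch, an event that arrives while a worker still holds an uncommitted event leaves last_master_timestamp untouched. Only after every read has been matched by a commit does the next event from an idle SQL thread refresh it.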



 Comments   
Comment by Brandon Nesterenko [ 2023-06-29 ]

Hi Andrei!

This is ready for review as PR-2682

Comment by Roel Van de Paar [ 2023-07-22 ]

Except for MDEV-31749 this is OK to push.

Comment by Roel Van de Paar [ 2023-07-23 ]

Please note that rpl.rpl_parallel_optimistic_until test failures (ref MDEV-23021) may be more pronounced after the implementation of this patch.
