[MDEV-30619] Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL)
Fix Version/s: 10.8.8, 10.4.31, 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1
Component/s: Replication
Labels:
None

Description

If the workers of a parallel replica are busy (potentially with long queues) but the SQL thread has no events left to distribute, the SQL thread goes idle. The next event that arrives from the primary then updates last_master_timestamp (LMT) with its own timestamp, even though the workers may still be quite far behind.

The proposed fix is for the SQL thread to additionally check whether there are uncommitted events. That is, we should add an atomic counter (displayable as a new system status variable), which the SQL thread increments on reads and the workers decrement on commits. last_master_timestamp should only be updated by the SQL thread with the MDEV-29639 logic if this counter is 0.
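The counter scheme described above could look roughly like the following. This is a minimal sketch only, not the code from the actual patch: `ReplicaState`, `pending_events`, `on_sql_thread_read` and `on_worker_commit` are hypothetical names chosen for illustration, and any timestamp handling done by the workers themselves is deliberately left out.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the replica's shared state; these are not
// MariaDB's real types or member names.
struct ReplicaState {
    // Events read by the SQL thread that the workers have not committed yet.
    std::atomic<std::uint64_t> pending_events{0};
    // Timestamp used to compute Seconds_Behind_Master.
    std::atomic<std::int64_t> last_master_timestamp{0};
};

// SQL thread: called when it reads an event to hand to a worker.
// Returns true if last_master_timestamp was updated.
inline bool on_sql_thread_read(ReplicaState &s, std::int64_t event_ts) {
    // fetch_add returns the value before the increment; 0 means the
    // workers had nothing uncommitted when this event arrived.
    const bool workers_idle = s.pending_events.fetch_add(1) == 0;
    if (workers_idle)
        s.last_master_timestamp.store(event_ts);
    return workers_idle;
}

// Worker thread: called once an event's work has been committed.
inline void on_worker_commit(ReplicaState &s) {
    s.pending_events.fetch_sub(1);
}
```

With this shape, an event arriving while workers still hold uncommitted work leaves last_master_timestamp untouched, so the idle SQL thread can no longer hide the workers' lag. Exposing `pending_events` as a status variable would then only require reading the atomic.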

Issue Links

causes

MDEV-31749 New test rpl.rpl_parallel_sbm in bb-10.4-MDEV-30619 sporadically fails in various locations (prepatch: lines 100, 177, 184) (postpatch_1: lines 180, 187)

Closed

includes

MDEV-31749 New test rpl.rpl_parallel_sbm in bb-10.4-MDEV-30619 sporadically fails in various locations (prepatch: lines 100, 177, 184) (postpatch_1: lines 180, 187)

Closed

is caused by

MDEV-29639 Seconds_Behind_Master is incorrect for Delayed, Parallel Replicas

Closed

relates to

MDEV-23021 rpl.rpl_parallel_optimistic_until fails on BB with various pattern

Closed

MDEV-30608 rpl.rpl_delayed_parallel_slave_sbm sometimes fails with Seconds_Behind_Master should not have used second transaction timestamp

Closed

MDEV-31895 Report a Replica's Time Difference with its Primary

Closed

MDEV-32265 seconds_behind_master is inaccurate for Delayed replication

Closed

MDEV-17516 Replication lag issue using parallel replication

Stalled

Activity

Brandon Nesterenko added a comment - 2023-06-29 15:56

Hi Andrei!

This is ready for review as PR-2682

Roel Van de Paar added a comment - 2023-07-22 03:23

Except for MDEV-31749, this is OK to push.

Roel Van de Paar added a comment - 2023-07-23 22:43

Please note that rpl.rpl_parallel_optimistic_until test failures (ref MDEV-23021) may be more pronounced after the implementation of this patch.

People

Assignee: Andrei Elkin

Reporter: Brandon Nesterenko

Votes: 0

Watchers: 7

Dates

Created: 2023-02-08 19:10

Updated: 2025-01-20 13:42

Resolved: 2023-07-25 14:27

MariaDB Server