[MDEV-5657] General cleanup of parallel replication event scheduling (was: Overlap (group) commit with next event group in parallel replication) - Jira

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 10.0.9
Component/s: None
Labels: None

Description

This task has become a general cleanup of the part of parallel replication
that schedules events to worker threads. The old code has a large number of
issues: the scheduling code is unnecessarily complicated, and many corner
cases, for example around error handling or slave stop, are not handled
correctly and can lead to various forms of corruption.

The description below covers the user-visible part of the changes, but most of
the changes are needed to fix bugs that were already present in the old code.

In parallel replication, we record which event groups group-committed together
on the master, and are thus able to apply them in parallel on the slave.

E.g. if on the master A1, A2, and A3 group commit together, followed by B1
and B2, we run A1, A2, and A3 in parallel, each in its own worker.

However, B1 is queued for the same worker as A3, and B2, while queued for a
new worker, waits for A3 to complete before it starts.

This is too pessimistic. B2 can start as soon as A1, A2, and A3 are ready to
commit. Similarly, B1 could be spawned in a new worker and also be allowed to
start as soon as all event groups in the previous group commit have reached
the commit stage.

This could be a big win if the slave is running with --log-slave-updates,
--sync-binlog=1, and/or innodb_flush_log_at_trx_commit=1: the slow fsync at
commit no longer delays the execution of further events, and commit steps run
in parallel, which gives more opportunity for group commit on the slave.
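
The effect of the two scheduling rules can be illustrated with a toy timing
model. This is only a sketch, not the server's scheduling code: the function
name simulate, the time units, and the cost figures are all made up for
illustration. It assumes unlimited workers, one fixed-cost commit (fsync) per
master group commit, and commits that stay in order on the slave. It does not
model the extra group commit opportunity mentioned above.

```python
def simulate(batches, commit_cost, overlap):
    """Return the time at which the last batch has finished committing.

    batches     -- list of master group commits; each is a list with the
                   execution time of every event group in that group commit
    commit_cost -- assumed fixed cost of one commit step (e.g. a slow fsync)
    overlap     -- False: next batch starts when the previous one has committed
                   True:  next batch starts when the previous one is ready
                          to commit
    """
    start = 0             # time at which the current batch may start executing
    prev_commit_done = 0  # time at which the previous batch finished committing
    for exec_times in batches:
        # All event groups of the batch run in parallel, so the batch reaches
        # the commit stage when its slowest event group is done.
        ready = start + max(exec_times)
        # Commits stay in order: wait for the previous batch's commit.
        commit_start = max(ready, prev_commit_done)
        commit_done = commit_start + commit_cost
        start = ready if overlap else commit_done
        prev_commit_done = commit_done
    return prev_commit_done


# A1, A2, A3 group committed together on the master, followed by B1 and B2.
batches = [[2, 3, 2], [4, 1]]
print(simulate(batches, commit_cost=5, overlap=False))  # 17
print(simulate(batches, commit_cost=5, overlap=True))   # 13
```

With commit_cost=0 both rules give the same result in this model, which
matches the point that the gain comes from hiding the slow commit step behind
the execution of the following events.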

Activity

Kristian Nielsen added a comment - 2014-02-26 18:12

Pushed to 10.0-base and 10.0

People

Assignee: Kristian Nielsen

Reporter: Kristian Nielsen

Votes: 0

Watchers: 2

Dates

Created: 2014-02-12 11:11

Updated: 2014-02-26 18:12

Resolved: 2014-02-26 18:12

MariaDB Server