[MDEV-6676] Optimistic parallel replication - Jira

XML

Word

Printable

Details

Type: Task
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Fix Version/s: 10.1.3
Component/s: Replication
Labels:
- parallelslave
- replication

Description

Parallel replication in 10.0 relies either on sufficient parallelism being
available during master commit, or alternatively on user-annotated parallelism
with GTID domain ids. These may not be sufficient to achive good parallelism
on a slave.

The original approach in parallel replication was to only run in parallel
transactions that were known to be able to safely replicate in parallel, due
to group-committing together on the master. However, it turned out that there
were some corner cases where it could be not safe even in spite of this. So
a general solution was implemented that allows to handle and recover from an
attempt to do non-safe parallel replication, by detecting a deadlock in commit
order and retrying the problem transaction.

With this general solution, it actually becomes safe to attempt to replicate
any transactions in parallel, as long as those transactions can be rolled
back and re-tried (eg. InnoDB/XtraDB DML). This opens the way for
speculatively replicating in parallel on the slave in an attempt to get more
parallelism. We can simply queue transactions in parallel regardless of
whether they have same commit id from the master. If there are no conflicts,
then great, parallelism will be improved. If there is a conflict, the enforced
commit order will cause it to be detected as a deadlock, and the later
transaction will be rolled back and retried.

To avoid excessive rollback and retry, and to avoid attempts to roll back
non-transactional updates, we could have some simple heuristics about when to
attempt the speculative parallel apply. For example:

Annotate transactions on the master (with a flag in the GTID event) that
are pure InnoDB DML, and only attempt to run those in parallel
speculatively on the slave. Or alternatively, detect this during
open_tables(), and let events wait for prior transactions if they touch
non-transactional table.

Annotate on the master transactions that ended up having row lock waits on
other transactions, indicating a potential conflict. Such transactions
might be likely to also conflict on the slave, so might be better to let
wait for prior transactions, rather than try speculative parallel apply.

If the number of rows affected becomes large, pause the replicating large
transaction and wait for prior transactions to complete first, to avoid
having to do a large rollback (which is expensive in InnoDB).

Attachments

Activity

People

Assignee:: Kristian Nielsen

Reporter:: Kristian Nielsen

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 2014-09-01 14:51

Updated:: 2015-02-07 11:07

Resolved:: 2015-02-07 11:07