[MDEV-6680] Performance of domain_parallel replication is disappointing - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.0.13
Fix Version/s: 10.0.15
Component/s: Replication
Labels:

Description

In March, Axel performed some benchmarks on parallel replication, as well as
comparison with MySQL 5.6 parallel replication.

MySQL 5.6 parallel replication corresponds more or less to setting domain_id
to different values in MariaDB. One would expect similar performance from
both in this case, but Axel's benchmarks showed MariaDB performing
disappointingly compared to MySQL.

This needs to be investigated. It seems likely that there is a bottleneck or
locking mistake somewhere in the code, as this has not yet been much tested.

One possible explanation is related to the --slave-parallel-max-queued
parameter. When the SQL driver thread has queued this much event data for a
worker thread, it waits for more room in that thread's queue. However, due
to batching of updates, the worker thread might not signal that the queue has
more room until it has completely emptied the queue. Meanwhile, the driver
thread is blocked, so other worker threads will be stalled if they happen to
finish their queues faster.

[Since Axel's benchmark works on an already generated master binlog, this
condition is likely to be hit]

This needs to be fixed somehow, for example simply by signalling more
frequently when events have been removed from the queue, such as whenever 1/4
of the queue has been emptied (signalling for every event drained is likely to
be too expensive in terms of locking overhead).
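
The difference between the two signalling policies can be illustrated with a
toy, single-threaded model. This is only a sketch under stated assumptions,
not the MariaDB implementation or the fix that was eventually pushed: the
names WorkerQueue, signal_fraction and events_until_wakeup are hypothetical,
and the queue limit of 16 with events of size 1 is an arbitrary choice. It
counts how many events a worker drains before a blocked driver thread would
be woken up.

```python
from dataclasses import dataclass


@dataclass
class WorkerQueue:
    """Toy model of one worker thread's event queue (sizes in bytes)."""

    max_queued: int         # stands in for --slave-parallel-max-queued
    signal_fraction: float  # hypothetical knob; 1.0 = signal only when empty
    queued: int = 0
    drained_since_signal: int = 0

    def has_room(self, size: int) -> bool:
        return self.queued + size <= self.max_queued

    def enqueue(self, size: int) -> None:
        self.queued += size

    def dequeue(self, size: int) -> bool:
        """Worker finished one event; return True if it signals the driver."""
        self.queued -= size
        self.drained_since_signal += size
        threshold = self.max_queued * self.signal_fraction
        if self.queued == 0 or self.drained_since_signal >= threshold:
            self.drained_since_signal = 0
            return True
        return False


def events_until_wakeup(signal_fraction: float,
                        max_queued: int = 16,
                        event_size: int = 1) -> int:
    """Fill the queue, then count events drained before the driver is woken."""
    q = WorkerQueue(max_queued, signal_fraction)
    while q.has_room(event_size):
        q.enqueue(event_size)  # driver fills the queue, then would block
    drained = 0
    while True:
        drained += 1
        if q.dequeue(event_size):
            return drained


if __name__ == "__main__":
    print(events_until_wakeup(1.0))   # signal only once the queue is empty
    print(events_until_wakeup(0.25))  # signal after each quarter is drained
```

With these toy numbers, the driver stays blocked for 16 drained events when
the worker signals only on an empty queue, and for 4 when it signals after
each quarter. A real implementation would also have to weigh the locking cost
of each signal, which this model ignores.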

There might be other issues as well; this needs to be investigated.

Here is a pointer into the mail thread on maria-developers@ where this was
discussed:

https://lists.launchpad.net/maria-developers/msg07089.html

(I thought I had filed this bug already, but did not find it in search, sorry
if it is a duplicate).

Activity

Daniel Black added a comment - 2014-10-09 11:29 - edited

While the documentation suggests that more than slave_parallel_max_queued can be used for various reasons, can we have a global status variable to indicate how much is actually in the queue?

I had great success increasing this to 512M (mixed replication with some fairly heavy multi-row updates).

Kristian Nielsen added a comment - 2014-11-13 15:06

Pushed to 10.0.15:

http://lists.askmonty.org/pipermail/commits/2014-November/006975.html

People

Assignee: Kristian Nielsen

Reporter: Kristian Nielsen

Votes: 2

Watchers: 4

Dates

Created: 2014-09-02 11:59

Updated: 2014-11-13 15:06

Resolved: 2014-11-13 15:06
