Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
10.0.13
Description
In March, Axel ran some benchmarks on parallel replication, as well as a
comparison with MySQL 5.6 parallel replication.
MySQL 5.6 parallel replication corresponds more or less to setting domain_id
to different values in MariaDB. However, Axel's benchmarks showed
disappointing performance compared to MySQL for this case, where one would
expect similar performance from both.
This needs to be investigated. It seems likely that there is a bottleneck or
locking mistake somewhere in the code, as this case has not yet been tested
much.
One possible explanation is related to the --slave-parallel-max-queued
parameter. When the SQL driver thread has queued this much event data for a
worker thread, it waits for more room in that thread's queue. However, due to
batching of updates, the worker thread might not signal that the queue has
more room until it has completely emptied the queue. Meanwhile, the driver
thread is blocked and cannot hand out new events, so other worker threads
stall if they happen to drain their queues faster.
[Since Axel's benchmark works on an already generated master binlog, this
condition is likely to be hit]
This needs to be fixed somehow, for example simply by signalling more
frequently when events have been removed from the queue, such as whenever 1/4
of the queue has been emptied (signalling for every event drained is likely
to be too expensive in terms of locking overhead).
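
To make the suggestion concrete, here is a minimal sketch of a bounded
per-worker queue that wakes the producer each time a configurable fraction of
the capacity has been drained, rather than only when the queue is empty. This
is not MariaDB's actual code: the WorkerQueue class, its method names, and the
signals_sent counter are all hypothetical, and sizes are counted in abstract
units rather than real event bytes.

```python
import threading
from collections import deque


class WorkerQueue:
    """Hypothetical bounded per-worker event queue (illustration only)."""

    def __init__(self, max_queued, signal_fraction=0.25):
        # max_queued plays the role of --slave-parallel-max-queued.
        self.max_queued = max_queued
        # Amount that must be drained before the producer is woken again.
        self.signal_step = max(1, int(max_queued * signal_fraction))
        self.events = deque()
        self.queued = 0                 # total size currently in the queue
        self.drained_since_signal = 0
        self.signals_sent = 0           # instrumentation for the example
        self.cond = threading.Condition()

    def enqueue(self, event, size):
        """Driver thread side: blocks while the queue has no room."""
        with self.cond:
            # An oversized event is still accepted into an empty queue.
            while self.queued > 0 and self.queued + size > self.max_queued:
                self.cond.wait()
            self.events.append((event, size))
            self.queued += size

    def dequeue(self):
        """Worker thread side: returns the next event, or None if empty."""
        with self.cond:
            if not self.events:
                return None
            event, size = self.events.popleft()
            self.queued -= size
            self.drained_since_signal += size
            # Signal once per drained step, and always when the queue empties.
            if self.drained_since_signal >= self.signal_step or not self.events:
                self.drained_since_signal = 0
                self.signals_sent += 1
                self.cond.notify()
            return event
```

With max_queued=8 and the default fraction of 1/4, draining eight unit-sized
events wakes the producer four times instead of eight, while still freeing
the driver thread long before the queue is empty. Setting signal_fraction=1.0
reproduces the signal-only-when-empty behaviour described above.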
There might be other issues as well; this needs to be investigated.
Here is a pointer into the mail thread on maria-developers@ where this was
discussed:
https://lists.launchpad.net/maria-developers/msg07089.html
(I thought I had filed this bug already, but did not find it in search, sorry
if it is a duplicate).