[MDEV-30458] Consolidate Serial Replica to Parallel Replica with 1 Worker Thread Created: 2023-01-24  Updated: 2023-10-30

Status: Open
Project: MariaDB Server
Component/s: Replication
Fix Version/s: 11.5

Type: Task Priority: Major
Reporter: Brandon Nesterenko Assignee: Brandon Nesterenko
Resolution: Unresolved Votes: 0
Labels: None

Attachments: PNG File repl_benchmark-no_lsu.png     PNG File repl_benchmark.png    
Issue Links:
Blocks
is blocked by MDEV-13915 STOP SLAVE takes very long time on a ... Closed
Relates
relates to MDEV-16404 Balanced replication parallel applier Stalled
relates to MDEV-17516 Replication lag issue using parallel ... Stalled
relates to MDEV-29639 Seconds_Behind_Master is incorrect fo... Closed

 Description   

To reduce code duplication and behavioral inconsistencies, the serial slave's SQL thread can be replaced with the parallel replication thread pool running a single worker thread. Some of these inconsistencies are visible in MDEV-17516 and MDEV-29639, which show different and inaccurate Seconds_Behind_Master behaviors between the serial and parallel versions. Treating the serial replica as slave_parallel_threads=1 will allow for consistent and accurate Seconds_Behind_Master values, and will also reduce code bloat.
If a user sets slave_parallel_threads=0, the server should automatically change the value to 1 and issue a warning to the user.
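
The value handling described above could be sketched as follows. This is only an illustration of the proposed rule, not MariaDB source: the ThreadSetting type and the normalize_slave_parallel_threads function are hypothetical names, and how the warning is actually raised is left open.

```cpp
#include <cstdint>

// Hypothetical result type for illustration; not part of the MariaDB source.
struct ThreadSetting {
  std::uint32_t effective;  // value the server would actually use
  bool warn;                // true if the user should be warned about the change
};

// Sketch of the proposed rule: a requested value of 0 (the old serial mode)
// is promoted to 1 and flagged for a warning; other values pass through.
constexpr ThreadSetting normalize_slave_parallel_threads(std::uint32_t requested) {
  return requested == 0 ? ThreadSetting{1, true}
                        : ThreadSetting{requested, false};
}
```

With this rule, a request for 0 yields an effective value of 1 with the warning flag set, while a request for 4 is left unchanged.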

The following items are additional improvements that can be made:
1. We can reduce the server memory footprint and improve load balancing by no longer keeping the parallel worker queues in memory. Instead, we can treat the relay log file as a single worker queue from which all worker threads pull. A new status variable would track the GTID state of queued events, and in combination with MDEV-4991 (GTID Binlog Indexing), the workers would know where in the relay log to read the next events. Additional analysis is needed to quantify these benefits.
2. Set the default slave_parallel_threads value to 2 to increase default concurrency, and issue warnings when slave_parallel_threads is 1 or 0.
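
The single-queue idea in item 1 could be modeled roughly as below. This is a simplified sketch under stated assumptions, not the proposed implementation: SharedLogCursor is a hypothetical class, events are identified by a plain index rather than by GTID state or a relay log offset, and commit ordering between workers is ignored entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical model of a cursor shared by all workers over one relay log.
// A plain event index stands in for the GTID-based position in the proposal.
class SharedLogCursor {
 public:
  explicit SharedLogCursor(std::size_t event_count)
      : end_(event_count), next_(0) {}

  // Claims the next unclaimed event index and stores it in *index.
  // Returns false (leaving *index untouched) once every event is claimed.
  // The atomic increment lets several worker threads call this concurrently
  // without any per-worker in-memory queue.
  bool claim(std::size_t* index) {
    assert(index != nullptr);
    const std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
    if (i >= end_) return false;
    *index = i;
    return true;
  }

 private:
  const std::size_t end_;          // number of events available in the log
  std::atomic<std::size_t> next_;  // next index to hand out
};
```

Each worker would loop on claim() and apply the event at the returned index until claim() returns false. Because a worker takes a new event only when it is idle, load balances itself across workers, which is the effect item 1 is after.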



 Comments   
Comment by Brandon Nesterenko [ 2023-02-03 ]

Performed a quick benchmark comparing a serial slave against a parallel replica with one worker thread. The workload consisted of updates to a table with a single row. The first graph is with log-slave-updates enabled, and the second is without it. There doesn't seem to be a clear performance difference between the two methods.

Generated at Thu Feb 08 10:16:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.