[MDEV-29639] Seconds_Behind_Master is incorrect for Delayed, Parallel Replicas | Created: 2022-09-26 | Updated: 2023-09-27 | Resolved: 2023-01-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10 |
| Fix Version/s: | 10.11.2, 11.0.1, 10.3.38, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Brandon Nesterenko | Assignee: | Brandon Nesterenko |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None |
| Issue Links: |
|
| Description |
|
Delayed replicas, i.e. those using the MASTER_DELAY option of CHANGE MASTER TO, that are also configured to use parallel threads calculate Seconds_Behind_Master incorrectly. This commit changed parallel replicas to update Seconds_Behind_Master at the time of transaction commit. However, on a delayed replica, an event is not reflected in Seconds_Behind_Master until MASTER_DELAY seconds have passed and the event has finished executing. In other words, when a new event is received, Seconds_Behind_Master is calculated using the timestamp of the last committed event, which can produce very large values for the entire duration of MASTER_DELAY. The effect is most pronounced for workloads with infrequent transactions, because the last committed event may be arbitrarily old by the time the next one arrives. The following MTR test highlights this issue:
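Independent of the MTR test, the arithmetic behind the symptom can be sketched in a few lines of Python. This is a simplified illustration, not MariaDB's code: the function names, the 60-second delay, and the timestamps are all hypothetical, and the "intended" value assumes the lag should be measured against the event currently waiting out its delay.

```python
# Hypothetical model of the symptom; none of these names exist in MariaDB.
MASTER_DELAY = 60  # seconds; assumed value for illustration


def lag_from_last_commit(now: int, last_committed_ts: int) -> int:
    """Lag measured against the last event the replica has committed."""
    return max(0, now - last_committed_ts)


def lag_from_pending_event(now: int, pending_ts: int) -> int:
    """Lag measured against the event that is waiting out its delay."""
    return max(0, now - pending_ts)


# Two infrequent transactions, timestamped on the primary.
first_ts = 0      # already committed on the replica
second_ts = 1000  # received, but still held back by MASTER_DELAY

# Sample the lag halfway through the second event's delay window.
now = second_ts + MASTER_DELAY // 2

reported = lag_from_last_commit(now, first_ts)
intended = lag_from_pending_event(now, second_ts)

print(f"reported={reported}s intended={intended}s")
```

With these numbers the reported lag reflects the idle gap between the two transactions rather than the delay actually being served, which matches the "very large values" described above.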
|
| Comments |
| Comment by Brandon Nesterenko [ 2022-10-11 ] |
|
Closing as a duplicate because the underlying cause is the same as MDEV-17516. |
| Comment by Andrei Elkin [ 2022-10-25 ] |
|
Reopened, because MDEV-17516 is a more general issue and is unrelated to the delayed replication option. |
| Comment by Brandon Nesterenko [ 2022-11-04 ] |
|
Hi Andrei! This is ready for review: |
| Comment by Andrei Elkin [ 2023-01-04 ] |
|
Brandon, please find a refined approach in bb-10.3- |
| Comment by Brandon Nesterenko [ 2023-01-13 ] |
|
Hi Andrei! The newest commit to PR-2323 is ready for your review. |
| Comment by Andrei Elkin [ 2023-01-23 ] |
|
Thanks for this work, Brandon! |
| Comment by Brandon Nesterenko [ 2023-01-24 ] |
|
Fixed as d69e835 in 10.3. No merge conflicts or test failures observed in local merge-up. |