[MDEV-7837] Seconds behind Master reports incorrect value when Parallel replication is used Created: 2015-03-25 Updated: 2022-09-13 |
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.0 |
| Fix Version/s: | 10.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Guillaume Lefranc | Assignee: | Kristian Nielsen |
| Resolution: | Unresolved | Votes: | 3 |
| Labels: | seconds-behind-master | ||
| Issue Links: |
|
| Description |
|
I am not sure how parallel replication affects the SBM calculation, but the computation is clearly wrong when parallel threads are used.
SBM first reports 0, then increases monotonically from that point until it reaches the actual delay (for example, the value shown in "Time" for a worker thread in the processlist). |
| Comments |
| Comment by Elena Stepanova [ 2015-03-26 ] |
|
Hi, from the description it sounds like regular behavior with a somewhat slow IO thread. The slave starts and it takes time to read an event from the master. During that time the SQL thread(s) are idle, so SBM is 0. Then an SQL thread gets the event, SBM gets updated and starts increasing. |
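The explanation above can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not MariaDB's actual code: the function name `seconds_behind_master` and its inputs are hypothetical. The model assumes that the value is 0 while no event is being executed, and that it is otherwise the local time minus the master timestamp of the event being executed, clamped at 0.

```python
from typing import Optional


def seconds_behind_master(now: float, executing_event_ts: Optional[float]) -> int:
    """Hypothetical, simplified model of the behaviour described above.

    now: the slave's current time in seconds.
    executing_event_ts: master timestamp of the event an SQL thread is
        executing, or None while the SQL thread(s) are idle.
    """
    if executing_event_ts is None:
        # SQL thread idle, waiting for the IO thread to deliver an event.
        return 0
    # Otherwise: age of the event being executed, never negative.
    return max(0, int(now - executing_event_ts))


# Slave restarted at t=1000; the IO thread needs a few seconds to fetch
# the first event, which the master wrote at t=900.
timeline = [
    (1000, None),  # reports 0: nothing read from the master yet
    (1003, None),  # reports 0: still waiting on the IO thread
    (1005, 900),   # reports 105: first event picked up by an SQL thread
    (1010, 900),   # reports 110: still executing the same event
]
for now, ev_ts in timeline:
    print(now, seconds_behind_master(now, ev_ts))
```

In this model the initial 0 comes only from the idle period before the first event arrives. It does not by itself reproduce the slow climb reported in the description.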
| Comment by Guillaume Lefranc [ 2015-04-16 ] |
|
Yes, that is indeed the case. I am complaining about this behaviour because I sometimes stop a slave fully, including the IO thread, and restart it a few seconds later, and then it does not report the correct number of seconds behind master. As you said, that is more or less expected, because the value is based on events received from the IO thread. |
| Comment by Elena Stepanova [ 2015-04-16 ] |
|
I doubt that – not with the current design, anyway. |
| Comment by Kristian Nielsen [ 2015-04-16 ] |
|
Hm, I'm always confused about the seconds-behind-master value.
From how I read the comments, it seems the seconds-behind-master value is …
I suppose it was made that way to be able to report "0 seconds" for an …
Though I also vaguely remember reading some code that tries to detect and …
I guess it would be possible to send master's current time in the heartbeat …
Another option to get an idea of how far behind the slave is, is to compare … |
| Comment by Elena Stepanova [ 2015-04-16 ] |
I think it would be more accurate to say "current event that is being executed" rather than "last event executed". Otherwise it sounds right. Here is a partial description from the MySQL manual:
I'm afraid it would be even less informative in terms of time delay, because there might be a million tiny events that execute in a minute, or a single event that takes hours... Anyway, tanj, if you want to submit a feature request for a new delay indicator, please do so, but as Kristian said above, it's not something that's currently planned. |
| Comment by VAROQUI Stephane [ 2015-09-16 ] |
|
Using 10.0.16 with GTID and SBR, we stopped a slave for 3 days. When we started replication, it reported a correct Seconds_behind_master value of about 160K. After STOP SLAVE and a restart, Seconds_behind_master reported 10 and slowly increased to 50. |
| Comment by VAROQUI Stephane [ 2015-09-16 ] |
|
All events carry a UTC timestamp, so the SQL thread can take the local UTC time and compute the difference from the event's timestamp. |
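The computation suggested above can be sketched as follows. This is a minimal sketch that assumes event timestamps are Unix epoch seconds, which are UTC-based and therefore independent of either host's time zone. The function `sbm_from_event` and the `clock_diff_with_master` parameter are hypothetical names introduced for illustration. The clock-skew correction is an added assumption, not something stated in the comment.

```python
import time
from typing import Optional


def sbm_from_event(event_ts: int, now: Optional[float] = None,
                   clock_diff_with_master: int = 0) -> int:
    """Hypothetical lag of the slave relative to one event, in seconds.

    event_ts: master's Unix timestamp stored in the event.
    now: slave's current Unix time (defaults to time.time()).
    clock_diff_with_master: assumed correction for clock skew,
        (slave clock - master clock), e.g. measured once at connect time.
    """
    if now is None:
        now = time.time()
    # Unix timestamps are UTC-based, so time zones cancel out; only the
    # skew between the two hosts' clocks has to be subtracted.
    lag = int(now - clock_diff_with_master) - event_ts
    # A negative result (clocks out of sync) is reported as 0.
    return max(0, lag)


print(sbm_from_event(1000, now=1060))                            # 60
print(sbm_from_event(1000, now=1065, clock_diff_with_master=5))  # 60
```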
| Comment by Rick James (Inactive) [ 2015-10-09 ] |
|
The timestamp from the Master is the start time of the query on the Master, correct? So, this naturally leads to fluctuations in Seconds_behind_master. |
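The fluctuation can be quantified with a small worked example. This sketch assumes three things: the event carries the master's query start time, the event reaches the slave as soon as the master commits it, and the slave executes it exactly as fast as the master did. The helper `sbm_for_long_query` is hypothetical.

```python
from typing import Tuple


def sbm_for_long_query(master_start: int, duration: int) -> Tuple[int, int]:
    """Return (SBM when the slave starts the query, SBM just before it
    finishes) under the assumptions listed above (all hypothetical)."""
    master_commit = master_start + duration  # event is binlogged at commit
    slave_start = master_commit              # no transfer delay assumed
    slave_finish = slave_start + duration    # same execution speed assumed
    # SBM = slave's current time - timestamp stored in the event (query start).
    return slave_start - master_start, slave_finish - master_start


# A 30-second query started on the master at t=1000:
print(sbm_for_long_query(1000, 30))  # (30, 60)
```

Once the query completes and nothing else is pending, an idle slave reports 0 again. A single long query therefore makes the value go 0, 30, 60, 0, even though the slave was only ever one query behind.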
| Comment by VAROQUI Stephane [ 2015-10-10 ] |
|
That's the point: parallel replication stops computing it this way. If you stop a slave for 10 days on a busy master, restarting it reports a Seconds_behind_master of 5s even though it is still 10 days behind. That is not correct, and it is a showstopper for parallel replication in many cases. |
| Comment by VAROQUI Stephane [ 2015-10-10 ] |
|
To make Seconds_behind_master correct, the IO thread should receive the timestamp of the event sent to the slave minus the sum of the response times of the following events up to the last commit. The slave then computes: current time - execution time on master + sum of pending event execution times. If we don't want to store the sum of pending events' execution times in each event, it should be stored in the heartbeat. |
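The proposal above is terse, so the following is only one possible reading of it, sketched for illustration. It reads "execution time on master" as the master timestamp of the last event received. It then adds the master-side execution times of the events the slave has not applied yet, a sum that the comment suggests shipping in the heartbeat. The function and parameter names are hypothetical, and this is not implemented in MariaDB.

```python
from typing import Sequence


def estimated_lag(now: float, last_event_master_ts: float,
                  pending_exec_times: Sequence[float]) -> float:
    """Hypothetical reading of the proposal.

    now: slave's current time (seconds).
    last_event_master_ts: master timestamp of the last event received.
    pending_exec_times: master-side execution times of events received
        but not yet applied on the slave (assumed to be known, e.g. as a
        single sum delivered in the heartbeat).
    """
    # Age of the newest event received, plus the work still queued.
    return (now - last_event_master_ts) + sum(pending_exec_times)


print(estimated_lag(1000, 990, [5, 15]))  # 30
```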
| Comment by Kristian Nielsen [ 2015-10-15 ] |
|
See https://lists.launchpad.net/maria-developers/msg08958.html |
| Comment by VAROQUI Stephane [ 2022-09-13 ] |
|
This task is missing a variable in the equation: on IO thread connection, it should be possible to fetch the time of the leader's last binary log event and refresh it using the heartbeat. The correct SBM is then: *MAX(end time of transactions on all SQL threads) - MAX(last binlog time fetched at first connection, last IO thread fetch time)*. When using the END time of transactions, the open question remains how to account for the progress of a running transaction, such as an online DDL. When using the START time, we would ignore the current execution, and this should be accurate for most transactional online DDL. This should support microsecond delays, since microsecond precision is now the default for query time. In the end, a heartbeat statement of the form "create or replace view ... select timestamp", as proposed by many tools, is probably the most elegant implementation. |