[MDEV-16091] Seconds_Behind_Master spikes to millions of seconds Created: 2018-05-04  Updated: 2024-01-29  Resolved: 2022-01-05

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.28, 10.0.38, 10.4.19
Fix Version/s: 10.2.42, 10.3.33, 10.4.23, 10.5.14, 10.6.6, 10.7.2

Type: Bug Priority: Critical
Reporter: raul (Inactive) Assignee: Brandon Nesterenko
Resolution: Fixed Votes: 2
Labels: Seconds_Behind_Master, seconds-behind-master
Environment:

CentOS Linux release 7.4.1708 (Core)
Linux global-db 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


Issue Links:
Relates
relates to MDEV-31895 Report a Replica's Time Difference wi... Open
relates to MDEV-33327 rpl_seconds_behind_master_spike Sensi... Closed

 Description   

Hey everyone

Rarely, we have a weird issue happening. We have a master <-> master setup. The writes are going to only one of them. Every now and then, the replication lag on the master where we send writes jumps to millions of seconds (ex: 2879736). Then, after 1s, is back to 0. After all, there's no lag, as the writes and the binlog from the other master should be ignored, as this is the server where we write to.
Both servers have the same timezone and ntp running.
Replication is set with GTIDs, and parallel threads.

Sample config:
slave_net_timeout=2
gtid-strict-mode=1
gtid-domain-id=1
server_id = 101
sync_binlog = 1
binlog_format = ROW
log-slave-updates
replicate-same-server-id = 0
log_bin=/var/log/mysql/binlog/mysql-bin.log
log_bin_index=/var/log/mysql/binlog/mysql-bin.log.index
relay_log=/var/log/mysql/binlog/mysql-relay-bin
relay_log_index=/var/log/mysql/binlog/mysql-relay-bin.index
expire_logs_days=5
max_binlog_size=1G
slave_parallel_threads=10

We have a bunch of monitoring scripts looking at that value and triggering actions and alerts and we want to know what can cause this and if there's anything we can do to avoid this in the future.

Thank you



 Comments   
Comment by raul (Inactive) [ 2018-05-11 ]

Hey guys

And, 1 week after, we got it again on the same cluster ... and again, server logs seem fine
Is there anything that i can enable or do to debug this further? So next time when it happens, at least i get more info about what is causing this?

Thanks

Comment by Sachin Setiya (Inactive) [ 2018-05-15 ]

Hi genesisdigital

Can you provide ?(at the time when seconds behind master is >0 )
Show full processlist
Show slave status

Comment by raul (Inactive) [ 2018-05-15 ]

Hey

I'll enable that and gather info - so when next time when it happens i'll be able to post that back

Thank you!

Comment by raul (Inactive) [ 2018-05-30 ]

So it happened again, but to me the debug looks ok. Replication lag: 5095481
What happened is that immediately after "show slave status" reported that big number, our debug script issued a show processlist and show slave status again ... and here is the result (replaced user, db, tabels, hosts, etc)
Also, pasted the replication part of my config and some server info

421916257   REPL_USER  127.0.0.14:9530 NULL    Binlog Dump 4196334 Master has sent all binlog to slave; waiting for binlog to be updated   NULL    0.000
421926220   system user     NULL    Connect 4196246 Waiting for master to send event    NULL    0.000
421926221   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926222   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926223   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926224   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926225   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926226   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926227   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926228   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926229   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926230   system user     NULL    Connect 4196246 Waiting for work from SQL thread    NULL    0.000
421926231   system user     NULL    Connect 0   Slave has read all relay log; waiting for the slave I/O thread to update it NULL    0.000
1057462568  MYSQL_USER 127.0.0.1:7368  MYSQL_DB  Query   0   updating    DELETE FROM `table`\nWHERE `created_at` < '2018-05-23 00:15:02'\nAND `status` != 5 LIMIT 100   0.000
1057549650  MYSQL_USER 127.0.0.1:48560 MYSQL_DB  Sleep   294     NULL    0.000
1057553995  MYSQL_USER 127.0.0.1:49378 MYSQL_DB  Sleep   258     NULL    0.000
1057554917  MYSQL_USER 127.0.0.1:49474 MYSQL_DB  Sleep   251     NULL    0.000
1057555155  MYSQL_USER 127.0.0.1:49526 MYSQL_DB  Sleep   249     NULL    0.000
1057555274  MYSQL_USER 127.0.0.1:49662 MYSQL_DB  Sleep   248     NULL    0.000
1057555414  MYSQL_USER 127.0.0.1:49688 MYSQL_DB  Sleep   247     NULL    0.000
1057556192  MYSQL_USER 127.0.0.1:49924 MYSQL_DB  Sleep   242     NULL    0.000
1057556675  MYSQL_USER 127.0.0.1:49986 MYSQL_DB  Sleep   237     NULL    0.000
1057557241  MYSQL_USER 127.0.0.1:50114 MYSQL_DB  Sleep   233     NULL    0.000
1057557501  MYSQL_USER 127.0.0.1:33190 MYSQL_DB  Sleep   230     NULL    0.000
1057558468  MYSQL_USER 127.0.0.1:50392 MYSQL_DB  Sleep   224     NULL    0.000
1057558860  MYSQL_USER 127.0.0.1:50418 MYSQL_DB  Sleep   221     NULL    0.000
1057558970  MYSQL_USER 127.0.0.1:50432 MYSQL_DB  Sleep   220     NULL    0.000
1057559842  MYSQL_USER 127.0.0.1:50648 MYSQL_DB  Sleep   214     NULL    0.000
1057561231  MYSQL_USER 127.0.0.1:50852 MYSQL_DB  Sleep   206     NULL    0.000
1057561378  MYSQL_USER 127.0.0.1:50864 MYSQL_DB  Sleep   205     NULL    0.000
1057561491  MYSQL_USER 127.0.0.1:50878 MYSQL_DB  Sleep   204     NULL    0.000
1057561863  MYSQL_USER 127.0.0.1:50954 MYSQL_DB  Sleep   201     NULL    0.000
1057561993  MYSQL_USER 127.0.0.1:50964 MYSQL_DB  Sleep   199     NULL    0.000
1057563227  MYSQL_USER 127.0.0.1:51218 MYSQL_DB  Sleep   190     NULL    0.000
1057563572  MYSQL_USER 127.0.0.1:51406 MYSQL_DB  Sleep   187     NULL    0.000
1057564224  MYSQL_USER 127.0.0.1:42274 MYSQL_DB  Sleep   180     NULL    0.000
1057564260  MYSQL_USER 127.0.0.1:51436 MYSQL_DB  Sleep   180     NULL    0.000
1057564499  MYSQL_USER 127.0.0.1:51444 MYSQL_DB  Sleep   179     NULL    0.000
1057564804  MYSQL_USER 127.0.0.1:51484 MYSQL_DB  Sleep   178     NULL    0.000
1057565497  MYSQL_USER 127.0.0.1:51564 MYSQL_DB  Sleep   175     NULL    0.000
1057565698  MYSQL_USER 127.0.0.1:51580 MYSQL_DB  Sleep   174     NULL    0.000
1057566831  MYSQL_USER 127.0.0.1:51588 MYSQL_DB  Sleep   170     NULL    0.000
1057567650  MYSQL_USER 127.0.0.1:51686 MYSQL_DB  Sleep   166     NULL    0.000
1057568181  MYSQL_USER 127.0.0.1:56542 MYSQL_DB  Sleep   164     NULL    0.000
1057570431  MYSQL_USER 127.0.0.1:51910 MYSQL_DB  Sleep   156     NULL    0.000
1057570466  MYSQL_USER 127.0.0.1:64184 MYSQL_DB  Sleep   155     NULL    0.000
1057571020  MYSQL_USER 127.0.0.1:51934 MYSQL_DB  Sleep   153     NULL    0.000
1057571254  MYSQL_USER 127.0.0.1:51942 MYSQL_DB  Sleep   152     NULL    0.000
1057571360  MYSQL_USER 127.0.0.1:51958 MYSQL_DB  Sleep   151     NULL    0.000
1057572456  MYSQL_USER 127.0.0.1:51978 MYSQL_DB  Sleep   147     NULL    0.000
1057572784  MYSQL_USER 127.0.0.1:52042 MYSQL_DB  Sleep   145     NULL    0.000
1057573016  MYSQL_USER 127.0.0.1:52114 MYSQL_DB  Sleep   144     NULL    0.000
1057573064  MYSQL_USER 127.0.0.1:52122 MYSQL_DB  Sleep   144     NULL    0.000
1057573245  MYSQL_USER 127.0.0.1:52130 MYSQL_DB  Sleep   143     NULL    0.000
1057574297  MYSQL_USER 127.0.0.1:52198 MYSQL_DB  Sleep   137     NULL    0.000
1057575401  MYSQL_USER 127.0.0.1:52296 MYSQL_DB  Sleep   133     NULL    0.000
1057575734  MYSQL_USER 127.0.0.1:52304 MYSQL_DB  Sleep   132     NULL    0.000
1057576046  MYSQL_USER 127.0.0.1:52318 MYSQL_DB  Sleep   130     NULL    0.000
1057576298  MYSQL_USER 127.0.0.1:52334 MYSQL_DB  Sleep   129     NULL    0.000
1057576537  MYSQL_USER 127.0.0.1:52344 MYSQL_DB  Sleep   128     NULL    0.000
1057576861  MYSQL_USER 127.0.0.1:52434 MYSQL_DB  Sleep   127     NULL    0.000
1057577263  MYSQL_USER 127.0.0.1:52498 MYSQL_DB  Sleep   125     NULL    0.000
1057577524  MYSQL_USER 127.0.0.1:52514 MYSQL_DB  Sleep   123     NULL    0.000
1057577996  MYSQL_USER 127.0.0.1:52562 MYSQL_DB  Sleep   119     NULL    0.000
1057578103  MYSQL_USER 127.0.0.1:52594 MYSQL_DB  Sleep   118     NULL    0.000
1057578195  MYSQL_USER 127.0.0.1:52622 MYSQL_DB  Sleep   117     NULL    0.000
1057578383  MYSQL_USER 127.0.0.1:52666 MYSQL_DB  Sleep   116     NULL    0.000
1057578432  MYSQL_USER 127.0.0.1:52674 MYSQL_DB  Sleep   115     NULL    0.000
1057578559  MYSQL_USER 127.0.0.1:52682 MYSQL_DB  Sleep   114     NULL    0.000
1057578584  MYSQL_USER 127.0.0.1:21102 MYSQL_DB  Sleep   0       NULL    0.000
1057578728  MYSQL_USER 127.0.0.1:52692 MYSQL_DB  Sleep   113     NULL    0.000
1057579056  MYSQL_USER 127.0.0.1:52820 MYSQL_DB  Sleep   110     NULL    0.000
1057579184  MYSQL_USER 127.0.0.1:52834 MYSQL_DB  Sleep   109     NULL    0.000
1057579383  MYSQL_USER 127.0.0.1:52870 MYSQL_DB  Sleep   108     NULL    0.000
1057579456  MYSQL_USER 127.0.0.1:52928 MYSQL_DB  Sleep   107     NULL    0.000
1057580089  MYSQL_USER 127.0.0.1:23254 MYSQL_DB  Sleep   1       NULL    0.000
1057580481  MYSQL_USER 127.0.0.1:53184 MYSQL_DB  Sleep   100     NULL    0.000
1057582354  MYSQL_USER 127.0.0.1:53400 MYSQL_DB  Sleep   87      NULL    0.000
1057583232  MYSQL_USER 127.0.0.1:53798 MYSQL_DB  Sleep   80      NULL    0.000
1057583467  MYSQL_USER 127.0.0.1:53828 MYSQL_DB  Sleep   78      NULL    0.000
1057583666  MYSQL_USER 127.0.0.1:53888 MYSQL_DB  Sleep   77      NULL    0.000
1057583798  MYSQL_USER 127.0.0.1:53896 MYSQL_DB  Sleep   76      NULL    0.000
1057584060  MYSQL_USER 127.0.0.1:53942 MYSQL_DB  Sleep   74      NULL    0.000
1057584175  MYSQL_USER 127.0.0.1:53964 MYSQL_DB  Sleep   73      NULL    0.000
1057584229  MYSQL_USER 127.0.0.1:54022 MYSQL_DB  Sleep   72      NULL    0.000
1057584240  MYSQL_USER 127.0.0.1:54036 MYSQL_DB  Sleep   72      NULL    0.000
1057584905  MYSQL_USER 127.0.0.1:54130 MYSQL_DB  Sleep   69      NULL    0.000
1057585362  MYSQL_USER 127.0.0.1:54206 MYSQL_DB  Sleep   66      NULL    0.000
1057585474  MYSQL_USER 127.0.0.1:54244 MYSQL_DB  Sleep   66      NULL    0.000
1057585701  MYSQL_USER 127.0.0.1:54262 MYSQL_DB  Sleep   65      NULL    0.000
1057585731  MYSQL_USER 127.0.0.1:54276 MYSQL_DB  Sleep   65      NULL    0.000
1057585776  MYSQL_USER 127.0.0.1:54290 MYSQL_DB  Sleep   64      NULL    0.000
1057585912  MYSQL_USER 127.0.0.1:54304 MYSQL_DB  Sleep   64      NULL    0.000
1057585985  MYSQL_USER 127.0.0.1:54320 MYSQL_DB  Sleep   63      NULL    0.000
1057586037  MYSQL_USER 127.0.0.1:54334 MYSQL_DB  Sleep   63      NULL    0.000
1057586115  MYSQL_USER 127.0.0.1:54362 MYSQL_DB  Sleep   62      NULL    0.000
1057586279  MYSQL_USER 127.0.0.1:54428 MYSQL_DB  Sleep   61      NULL    0.000
1057586540  MYSQL_USER 127.0.0.1:54474 MYSQL_DB  Sleep   58      NULL    0.000
1057587012  MYSQL_USER 127.0.0.1:54534 MYSQL_DB  Sleep   55      NULL    0.000
1057587261  MYSQL_USER 127.0.0.1:54558 MYSQL_DB  Sleep   54      NULL    0.000
1057588096  MYSQL_USER 127.0.0.1:54570 MYSQL_DB  Sleep   50      NULL    0.000
1057588168  MYSQL_USER 127.0.0.1:32944 MYSQL_DB  Sleep   49      NULL    0.000
1057588174  MYSQL_USER 127.0.0.1:54582 MYSQL_DB  Sleep   49      NULL    0.000
1057588289  MYSQL_USER 127.0.0.1:54590 MYSQL_DB  Sleep   48      NULL    0.000
1057588520  MYSQL_USER 127.0.0.1:54606 MYSQL_DB  Sleep   46      NULL    0.000
1057588704  MYSQL_USER 127.0.0.1:54642 MYSQL_DB  Sleep   45      NULL    0.000
1057588707  MYSQL_USER 127.0.0.1:54650 MYSQL_DB  Sleep   45      NULL    0.000
1057588967  MYSQL_USER 127.0.0.1:54666 MYSQL_DB  Sleep   43      NULL    0.000
1057589221  MYSQL_USER 127.0.0.1:54674 MYSQL_DB  Sleep   41      NULL    0.000
1057589820  MYSQL_USER 127.0.0.1:54736 MYSQL_DB  Sleep   37      NULL    0.000
1057589886  MYSQL_USER 127.0.0.1:54790 MYSQL_DB  Sleep   37      NULL    0.000
1057590023  MYSQL_USER 127.0.0.1:54816 MYSQL_DB  Sleep   35      NULL    0.000
1057590293  MYSQL_USER 127.0.0.1:54892 MYSQL_DB  Sleep   33      NULL    0.000
1057590392  MYSQL_USER 127.0.0.1:54906 MYSQL_DB  Sleep   33      NULL    0.000
1057590419  MYSQL_USER 127.0.0.1:35634 MYSQL_DB  Sleep   33      NULL    0.000
1057590432  MYSQL_USER 127.0.0.1:54924 MYSQL_DB  Sleep   33      NULL    0.000
1057590510  MYSQL_USER 127.0.0.1:54934 MYSQL_DB  Sleep   32      NULL    0.000
1057590585  MYSQL_USER 127.0.0.1:54970 MYSQL_DB  Sleep   32      NULL    0.000
1057590588  MYSQL_USER 127.0.0.1:54972 MYSQL_DB  Sleep   32      NULL    0.000
1057590796  MYSQL_USER 127.0.0.1:54984 MYSQL_DB  Sleep   31      NULL    0.000
1057591136  MYSQL_USER 127.0.0.1:54998 MYSQL_DB  Sleep   29      NULL    0.000
1057591246  MYSQL_USER 127.0.0.1:55004 MYSQL_DB  Sleep   0       NULL    0.000
1057591324  MYSQL_USER 127.0.0.1:55020 MYSQL_DB  Sleep   28      NULL    0.000
1057591451  MYSQL_USER 127.0.0.1:55046 MYSQL_DB  Sleep   27      NULL    0.000
1057591552  MYSQL_USER 127.0.0.1:55068 MYSQL_DB  Sleep   27      NULL    0.000
1057591561  MYSQL_USER 127.0.0.1:55076 MYSQL_DB  Sleep   27      NULL    0.000
1057592017  MYSQL_USER 127.0.0.1:37252 MYSQL_DB  Sleep   25      NULL    0.000
1057592130  MYSQL_USER 127.0.0.1:37438 MYSQL_DB  Sleep   24      NULL    0.000
1057592151  MYSQL_USER 127.0.0.1:37464 MYSQL_DB  Sleep   24      NULL    0.000
1057592196  MYSQL_USER 127.0.0.1:55108 MYSQL_DB  Sleep   23      NULL    0.000
1057592351  MYSQL_USER 127.0.0.1:55116 MYSQL_DB  Sleep   22      NULL    0.000
1057592475  MYSQL_USER 127.0.0.1:55142 MYSQL_DB  Sleep   22      NULL    0.000
1057592564  MYSQL_USER 127.0.0.1:55222 MYSQL_DB  Sleep   21      NULL    0.000
1057592570  MYSQL_USER 127.0.0.1:55226 MYSQL_DB  Sleep   21      NULL    0.000
1057592579  MYSQL_USER 127.0.0.1:55242 MYSQL_DB  Sleep   21      NULL    0.000
1057592587  MYSQL_USER 127.0.0.1:55252 MYSQL_DB  Sleep   21      NULL    0.000
1057592588  MYSQL_USER 127.0.0.1:55254 MYSQL_DB  Sleep   21      NULL    0.000
1057592589  MYSQL_USER 127.0.0.1:55256 MYSQL_DB  Sleep   21      NULL    0.000
1057592592  MYSQL_USER 127.0.0.1:55258 MYSQL_DB  Sleep   21      NULL    0.000
1057592593  MYSQL_USER 127.0.0.1:55262 MYSQL_DB  Sleep   21      NULL    0.000
1057592599  MYSQL_USER 127.0.0.1:55274 MYSQL_DB  Sleep   21      NULL    0.000
1057592765  MYSQL_USER 127.0.0.1:55300 MYSQL_DB  Sleep   20      NULL    0.000
1057592973  MYSQL_USER 127.0.0.1:38384 MYSQL_DB  Sleep   19      NULL    0.000
1057592988  MYSQL_USER 127.0.0.1:38394 MYSQL_DB  Sleep   19      NULL    0.000
1057593187  MYSQL_USER 127.0.0.1:55312 MYSQL_DB  Sleep   18      NULL    0.000
1057593490  MYSQL_USER 127.0.0.1:55356 MYSQL_DB  Sleep   16      NULL    0.000
1057593493  MYSQL_USER 127.0.0.1:55358 MYSQL_DB  Sleep   16      NULL    0.000
1057593505  MYSQL_USER 127.0.0.1:55372 MYSQL_DB  Sleep   16      NULL    0.000
1057593508  MYSQL_USER 127.0.0.1:55374 MYSQL_DB  Sleep   16      NULL    0.000
1057593660  MYSQL_USER 127.0.0.1:55386 MYSQL_DB  Sleep   15      NULL    0.000
1057593691  MYSQL_USER 127.0.0.1:55408 MYSQL_DB  Sleep   14      NULL    0.000
1057593701  MYSQL_USER 127.0.0.1:55416 MYSQL_DB  Sleep   14      NULL    0.000
1057593803  MYSQL_USER 127.0.0.1:55438 MYSQL_DB  Sleep   13      NULL    0.000
1057593814  MYSQL_USER 127.0.0.1:55446 MYSQL_DB  Sleep   13      NULL    0.000
1057593923  MYSQL_USER 127.0.0.1:39730 MYSQL_DB  Sleep   13      NULL    0.000
1057593979  MYSQL_USER 127.0.0.1:55454 MYSQL_DB  Sleep   12      NULL    0.000
1057594170  MYSQL_USER 127.0.0.1:55488 MYSQL_DB  Sleep   11      NULL    0.000
1057594201  MYSQL_USER 127.0.0.1:55502 MYSQL_DB  Sleep   11      NULL    0.000
1057594269  MYSQL_USER 127.0.0.1:55516 MYSQL_DB  Sleep   10      NULL    0.000
1057594290  MYSQL_USER 127.0.0.1:55530 MYSQL_DB  Sleep   10      NULL    0.000
1057594404  MYSQL_USER 127.0.0.1:55538 MYSQL_DB  Sleep   9       NULL    0.000
1057594415  MYSQL_USER 127.0.0.1:40432 MYSQL_DB  Sleep   9       NULL    0.000
1057594492  MYSQL_USER 127.0.0.1:40520 MYSQL_DB  Sleep   8       NULL    0.000
1057594580  MYSQL_USER 127.0.0.1:55552 MYSQL_DB  Sleep   8       NULL    0.000
1057594616  MYSQL_USER 127.0.0.1:55568 MYSQL_DB  Sleep   8       NULL    0.000
1057594618  MYSQL_USER 127.0.0.1:55572 MYSQL_DB  Sleep   8       NULL    0.000
1057594629  MYSQL_USER 127.0.0.1:40710 MYSQL_DB  Sleep   8       NULL    0.000
1057594643  MYSQL_USER 127.0.0.1:55590 MYSQL_DB  Sleep   7       NULL    0.000
1057594654  MYSQL_USER 127.0.0.1:55598 MYSQL_DB  Sleep   7       NULL    0.000
1057594757  MYSQL_USER 127.0.0.1:40854 MYSQL_DB  Sleep   6       NULL    0.000
1057594761  MYSQL_USER 127.0.0.1:55606 MYSQL_DB  Sleep   6       NULL    0.000
1057595018  MYSQL_USER 127.0.0.1:55630 MYSQL_DB  Sleep   5       NULL    0.000
1057595063  MYSQL_USER 127.0.0.1:41272 MYSQL_DB  Sleep   4       NULL    0.000
1057595119  MYSQL_USER 127.0.0.1:55638 MYSQL_DB  Sleep   4       NULL    0.000
1057595134  MYSQL_USER 127.0.0.1:32392 MYSQL_DB  Sleep   4       NULL    0.000
1057595136  MYSQL_USER 127.0.0.1:32394 MYSQL_DB  Sleep   4       NULL    0.000
1057595159  MYSQL_USER 127.0.0.1:41382 MYSQL_DB  Sleep   3       NULL    0.000
1057595192  MYSQL_USER 127.0.0.1:41412 MYSQL_DB  Sleep   3       NULL    0.000
1057595253  MYSQL_USER 127.0.0.1:55644 MYSQL_DB  Sleep   3       NULL    0.000
1057595321  MYSQL_USER 127.0.0.1:41554 MYSQL_DB  Sleep   3       NULL    0.000
1057595468  MYSQL_USER 127.0.0.1:32882 MYSQL_DB  Sleep   2       NULL    0.000
1057595497  MYSQL_USER 127.0.0.1:41760 MYSQL_DB  Sleep   2       NULL    0.000
1057595533  MYSQL_USER 127.0.0.1:55660 MYSQL_DB  Sleep   1       NULL    0.000
1057595562  MYSQL_USER 127.0.0.1:41822 MYSQL_DB  Sleep   1       NULL    0.000
1057595611  MYSQL_USER 127.0.0.1:41922 MYSQL_DB  Sleep   1       NULL    0.000
1057595663  MYSQL_USER 127.0.0.1:33224 MYSQL_DB  Sleep   1       NULL    0.000
1057595717  MYSQL_USER 127.0.0.1:33288 MYSQL_DB  Sleep   0       NULL    0.000
1057595731  MYSQL_USER 127.0.0.1:42058 MYSQL_DB  Sleep   0       NULL    0.000
1057595756  MYSQL_USER 127.0.0.1:34440 MYSQL_DB  Sleep   0       NULL    0.000
1057595757  MYSQL_USER 127.0.0.1:33326 MYSQL_DB  Sleep   0       NULL    0.000
1057595760  MYSQL_USER 127.0.0.1:34448 MYSQL_DB  Sleep   0       NULL    0.000
1057595761  MYSQL_USER 127.0.0.1:34456 MYSQL_DB  Sleep   0       NULL    0.000
1057595780  MYSQL_USER 127.0.0.1:42108 MYSQL_DB  Sleep   0       NULL    0.000
1057595788  MYSQL_USER 127.0.0.1:42126 MYSQL_DB  Sleep   0       NULL    0.000
1057595790  MYSQL_USER 127.0.0.1:34518 MYSQL_DB  Sleep   0       NULL    0.000
1057595791  MYSQL_USER 127.0.0.1:34522 MYSQL_DB  Sleep   0       NULL    0.000
1057595792  MYSQL_USER 127.0.0.1:34524 MYSQL_DB  Sleep   0       NULL    0.000
1057595794  root    localhost   NULL    Query   0   init    SHOW FULL PROCESSLIST   0.000

               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.2
                  Master_User: REPL_USER
                  Master_Port: 3306
                Connect_Retry: 1
              Master_Log_File: mysql-bin.001753
          Read_Master_Log_Pos: 484555
               Relay_Log_File: mysql-relay-bin.003499
                Relay_Log_Pos: 654
        Relay_Master_Log_File: mysql-bin.001753
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 484555
              Relay_Log_Space: 995
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 103
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-548165707
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative

MYSQL CONFIG

datadir = /var/lib/mysql/data/
socket = /var/lib/mysql/mysql.sock
group_concat_max_len = 1844674407370955161
max_connect_errors = 100k
thread_cache_size = 200
tmp_table_size = 64MB
max_heap_table_size = 64MB
innodb_stats_on_metadata = 0
innodb_stats_transient_sample_pages = 32
innodb_stats_persistent_sample_pages = 32
performance_schema = 0
 
max_connections=10000
symbolic-links=0
local-infile=0
skip-name-resolve
skip-external-locking
default-storage-engine = InnoDB
log_warnings = 2
 
max_allowed_packet = 128M
wait_timeout = 300
connect_timeout = 10
delayed_insert_timeout = 300
interactive_timeout = 300
net_read_timeout = 300
net_write_timeout = 300
query_cache_size=0
query_cache_type=0
 
mrr_buffer_size=128M
join_cache_level=6
join_buffer_size=128M
 
innodb_file_per_table
innodb_file_format=barracuda
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=1
innodb_log_file_size=1G
innodb_log_files_in_group=10
innodb_buffer_pool_size=90G
innodb_buffer_pool_instances=16
innodb_log_buffer_size=128M
innodb_io_capacity=100000
innodb_io_capacity_max=100000
innodb_doublewrite=0
innodb_buffer_pool_dump_at_shutdown=1
innodb_buffer_pool_load_at_startup=1
innodb_flush_neighbors=0
innodb_lru_scan_depth=2000
innodb_open_files=5000000
open_files_limit=5000000
 
innodb_read_io_threads = 64
innodb_write_io_threads = 64
 
innodb_use_mtflush = 1
innodb_use_fallocate = 1
innodb_use_trim = 1
innodb_mtflush_threads = 8
innodb_compression_level = 1
innodb_log_compressed_pages = 0
innodb_purge_threads = 4
innodb_autoinc_lock_mode = 2
sql_mode='NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
table_cache=50000
table_open_cache = 50000
table_definition_cache = 10000
 
 
log-error=/var/lib/mysql/logs/server.err
master_retry_count=999999
slave_net_timeout=2
gtid-strict-mode=1
gtid-domain-id=1
server_id = 101
sync_binlog = 1
binlog_format = ROW
log-slave-updates
replicate-same-server-id = 0
log_bin=/var/lib/mysql/binlog/mysql-bin.log
log_bin_index=/var/lib/mysql/binlog/mysql-bin.log.index
relay_log=/var/lib/mysql/binlog/mysql-relay-bin
relay_log_index=/var/lib/mysql/binlog/mysql-relay-bin.index
expire_logs_days=5
max_binlog_size=1G
slave_parallel_threads=10

SERVER INFO
Rackspace OnMetal IO v2. Mem 128GB, Dual 1.6 TB PCIe flash cards in RAID0 (software - mdadm), Dual 2.6 GHz, 10 core Intel® Xeon® E5-2660 v3

Comment by raul (Inactive) [ 2018-06-04 ]

Hey guys

Let me know if there's any other debug that i might enable or anything at all that might help in finding out why this is happening
Thank you

Comment by raul (Inactive) [ 2018-06-11 ]

Hey guys
Last night it happened again
Everything seems exactly as i posted before, except the only query that was running was an update on the same table as before:

Opening tables	UPDATE `TABLE` SET deleted_at = NOW()\nWHERE `job_id` = 37318708\nAND `hostname` = 'xxx'

The table does have an index on the job_id + hostname combination.
Explain extended on the the query: https://goo.gl/VBNzk1

Any ideas what i could do to get rid of this error?
Also, we have 3-4 more setups JUST LIKE THIS and it does not happen ANYWHERE else, just on this one.
Any other info i could provide you to help me figure this out?

Thanks

Comment by raul (Inactive) [ 2018-07-05 ]

Hey guys

We've got the error again: Replication lag 1916436
However, this time, the processlist was just Sleeping threads, nothing running.

14	system user		NULL	Connect	0	Slave has read all relay log; waiting for the slave I/O thread to update it	NULL	0.000
15	USER	127.0.0.2:42374	NULL	Binlog Dump	1916463	Master has sent all binlog to slave; waiting for binlog to be updated	NULL	0.000

Also, from the log, it seems that "show slave status" actually recorded Seconds_Behind_Master as the Time since replication was started??

Comment by Nilnandan Joshi [ 2021-01-05 ]

One of our customer is observing almost same issue. They are seeing intermittent slave lag suddenly from 0 to 70000 and then 0 very next second.

1. slave replication lag seconds behind master = 0 at 04/01/2021 00:00:00am
2. slave replication lag seconds behind master = 70000 at 04/01/2021 00:00:01am
3. slave replication lag seconds behind master = 0 at 04/01/2021 00:00:02am

Btw, if they will disable slave_compressed_protocol then this issue doesn't happen.

MariaDB Version 10.4.14
binlog_format ROW
slave_parallel_threads 0
slave_parallel_workers 0

Comment by Rick James [ 2021-04-14 ]

I'm chiming in to point out how old and how elusive this problem is.

I first encountered this issue sometime around 2002 on MySQL 4.0. I bugged MySQL AB about it, but I could not provide a reliable test case.

Seconds_behind_master would jump to some ridiculous value, stay there for several minutes, then suddenly drop back to zero. Meanwhile, the Replicas would not be complaining – as if the number were bogus.

The problem continued into 4.1 and 5.1; after that, I got away from being aware of the details of Replication. I'm glad (or sad) to see that it is still unresolved. It's interesting that GTID did not cure it.

Comment by Andrei Elkin [ 2021-11-22 ]

Opened back after having more complaints on this behavior.
It looks rather probable at the SBM computation has an undetected yet flaw.
At times its value seems get set to now - replication_start_timestamp.

Comment by Brandon Nesterenko [ 2021-12-30 ]

Hi Andrei! This is ready for review.

My analysis:
A slave’s relay log format description event is used when calculating Seconds_Behind_Master. This forces its value to temporarily spike when processing these events, as their creation date is set to the timestamp that the IO thread begins.

Commit: e9b1c0e

BB: bb-10.2-MDEV-16091

Comment by Andrei Elkin [ 2022-01-04 ]

allen.lee@mariadb.com, the patch applies universally.

Comment by Andrei Elkin [ 2022-01-04 ]

Thanks for both the great analysis and the patch.

Comment by Brandon Nesterenko [ 2022-01-05 ]

This patch will have a merge conflict in 10.4 with sql/slave.cc. Here is a commit with its resolution: ddd79e2 (taken from 10.4-MDEV-16091-merge)

Comment by james wang [ 2022-04-10 ]

Hi All,

I have the same issue for 10.2.27 (not feasible to upgrade to 10.4)

How to walk around the issue please?

How to "The fix in 10.2.42 is to mark and remember the event as relaylog event instead use the last_master_timestamp from master again" please?

Thanks a lot in advance

Comment by james wang [ 2022-04-10 ]

@bnestere @Brandon Nesterenko we have many master-master replication configurations based on 10.2.27 and only one has such issue.

Is there any way to walk around the issue please? thanks

Comment by Valerii Kravchuk [ 2022-04-10 ]

The fix is visible in the related commit:

https://github.com/MariaDB/server/commit/96de6bfd5e0d08bd99a47e8a3d60b1c2900a38f8

"Problem:
========
A slave’s relay log format description event is used when
calculating Seconds_Behind_Master (SBM). This forces the SBM
value to spike when processing these events, as their creation
date is set to the timestamp that the IO thread begins.

Solution:
========
When the slave generates a format description event, mark the
event as a relay log event so it does not update the
rli->last_master_timestamp variable."

That one line added in sql/log.cc can probably be included into custom 10.2.27 code you probably use to build your specific version.

Comment by Rick James [ 2022-04-10 ]

I first saw this bug back in MySQL 4.0. It was sporadic and soon went away without any manual action.

See also https://bugs.mysql.com/bug.php?id=72376

Generated at Thu Feb 08 08:26:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.