Status: Closed
10.1.28, 10.0.38, 10.4.19
CentOS Linux release 7.4.1708 (Core)
Linux global-db 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Occasionally we see a strange issue. We have a master <-> master setup in which writes go to only one of the masters. Every now and then, the replication lag reported on the master that receives the writes jumps to millions of seconds (e.g. 2879736), then returns to 0 about one second later. There should be no real lag on this server: it is the one we write to, so the binlog events coming back from the other master should be ignored.
Both servers use the same timezone and run NTP.
Replication is set with GTIDs, and parallel threads.
server_id = 101
sync_binlog = 1
binlog_format = ROW
replicate-same-server-id = 0
We have a number of monitoring scripts that watch this value and trigger actions and alerts. We would like to know what can cause these spikes and whether there is anything we can do to avoid them in the future.
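
As a possible stopgap on the monitoring side, a check could ignore single-sample spikes and react only when the lag stays high across several consecutive polls. The following is a minimal Python sketch, not our actual scripts: the function name, the 60-second threshold, and the three-sample window are hypothetical, and it assumes the lag value is polled at a regular interval and may be None when replication is not running.

```python
from typing import Optional, Sequence


def lag_is_sustained(
    samples: Sequence[Optional[int]],
    threshold: int = 60,        # hypothetical alert threshold, in seconds
    min_consecutive: int = 3,   # hypothetical number of polls that must agree
) -> bool:
    """Return True only if the most recent `min_consecutive` lag samples
    are all above `threshold`.

    A sample of None (assumed to mean replication is not running) is
    treated as unhealthy, so a stopped replica still raises an alert.
    """
    if len(samples) < min_consecutive:
        # Not enough history yet to call the lag sustained.
        return False
    recent = samples[-min_consecutive:]
    return all(s is None or s > threshold for s in recent)


if __name__ == "__main__":
    # A one-poll spike like the one described above is ignored...
    print(lag_is_sustained([0, 0, 2879736, 0]))
    # ...while lag that persists across three polls is reported.
    print(lag_is_sustained([0, 120, 130, 140]))
```

This only hides the symptom from the alerting; it does not explain why the server reports the bogus value in the first place.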