[MDEV-12290] Wrong timestamps in binary log causes replication issues Created: 2017-03-17  Updated: 2017-09-21  Resolved: 2017-05-09

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0, 10.1, 10.2.4, 10.2
Fix Version/s: 10.2.6

Type: Bug Priority: Critical
Reporter: Guillaume Lefranc Assignee: Sachin Setiya (Inactive)
Resolution: Fixed Votes: 1
Labels: 10.2-ga

Attachments: HTML File patch     HTML File patch    

 Description   

I have configured a MariaDB 10.2.4 GTID Slave with master_delay=3600, and it seems to cause replication to lag behind more and more. Waits trigger even though Seconds_Behind_Master is > 3600.

        Seconds_Behind_Master: 21476
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: No
                  Gtid_IO_Pos: 
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
                    SQL_Delay: 3600
          SQL_Remaining_Delay: 3598



 Comments   
Comment by Elena Stepanova [ 2017-03-18 ]

Could you please paste or attach a few consequent slave status outputs, to see the dynamics, including Read_Master_Log_Pos and Exec_Master_Log_Pos (or just the full output).
Please also attach cnf file(s).

Comment by Guillaume Lefranc [ 2017-03-20 ]

The cnf files are unaltered defaults.

I forgot to mention that the master is a 10.1 MariaDB, but the manual doesn't mention incompatibility.

Will try to gather more info/write a reproduceable test case.

Comment by Guillaume Lefranc [ 2017-03-20 ]

In the example below, even I am really far behind the master, SQL_Delay still triggers.

                  Master_Host: 10.0.5.13
                  Master_User: replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000246
          Read_Master_Log_Pos: 40136843
               Relay_Log_File: mysqld-relay-bin.000007
                Relay_Log_Pos: 10794
        Relay_Master_Log_File: mariadb-bin.000122
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 10497
              Relay_Log_Space: 13043090327
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 289621
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: No
                  Gtid_IO_Pos: 
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
                    SQL_Delay: 10
          SQL_Remaining_Delay: 6
      Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event

Comment by Guillaume Lefranc [ 2017-03-20 ]

OK, I think I've got something.

If I look at the binary log entry where the delay is triggered (this entry is one of the first after binlog rotation) here is what I get:

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#170317  7:34:44 server id 3  end_log_pos 249   Start: binlog v 4, server v 10.1.21-MariaDB-1~jessie created 170317  7:34:44
BINLOG '
hIPLWA8DAAAA9QAAAPkAAAAAAAQAMTAuMS4yMS1NYXJpYURCLTF+amVzc2llAGxvZwAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAA3QAEGggAAAAICAgCAAAACgoKAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA                                                                                                                               AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAEEwQAAKnQd0I=
'/*!*/;
*# at 10497
#700101  1:00:00 server id 3  end_log_pos 10538         Binlog checkpoint mariadb-bin.000122*
# at 10538
#170317  7:34:45 server id 0  end_log_pos 10576         GTID 1-0-8845285 trans
/*!100101 SET @@session.skip_parallel_replication=0*//*!*/;
/*!100001 SET @@session.gtid_domain_id=1*//*!*/;
/*!100001 SET @@session.server_id=0*//*!*/;
/*!100001 SET @@session.gtid_seq_no=8845285*//*!*/;
BEGIN
/*!*/;
# at 10576 

The timestamp of binlog entry 10497 seems completely wrong at "700101 1:00:00" (Unix time anyone?)

So, at each binlog rotation and/or binlog checkpoint, my slave gets an additional N seconds of delay...

Comment by Guillaume Lefranc [ 2017-03-20 ]

Before you ask, the time of my servers is in sync.

Comment by Elena Stepanova [ 2017-03-20 ]

Great catch, thanks.

plinux, so, the problem is two-fold. The timestamp obviously needs to be fixed in all 10.x versions; but also, we need to somehow make sure that 10.2 is able to replicate even from older 10.0/10.1 versions, before the timestamp fix.

Comment by Sachin Setiya (Inactive) [ 2017-05-02 ]

Hi tanj
I have created a patch for this issue which solves this wrong timestamp issue, This is because when binlog_background thread logs BInlog_checkpoint_log_event time is not set in the thread so , Binlog_checkpoint_log_event->when is zero.
But I am not able to simulate seconds behind master case. Can you please provide some reproducible test case ?
I tried this
master - slave configuration
both use binlog_format = row
I have used gtid
sql_delay=60

And run lacs of (insert )query on server 1 , with after every 10 insert there was flush logs

patch

Comment by Kristian Nielsen [ 2017-05-02 ]

Doesn't the set_time() need to go inside the loop, before the call to
mysql_bin_log.mark_xid_done()? Otherwise the time will always be the time
the binlog thread started, which is not correct - it needs to be the current
time.

But nice catch, good to get this fixed.

Comment by Sachin Setiya (Inactive) [ 2017-05-02 ]

Hi knielsen,

Patch updated
Regards
sachin patch

Comment by Marcus Zambrano [ 2017-09-20 ]

I am experiencing this issue when attaching a 10.2.8 Slave to a 10.0.24 Master. At every #700101 Binlog checkpoint event, the SQL Delay counter is reapplied and the Slave is continually delayed.

master# mysql -e 'select @@version;'
-----------------

@@version

-----------------

10.0.24-MariaDB

-----------------

master# mysqlbinlog bin-log.023901 | grep ^#700
#700101 2:00:00 server id 123 end_log_pos 423 Binlog checkpoint bin-log.023901

slave# mysql -e 'select @@version;'
--------------------

@@version

--------------------

10.2.8-MariaDB-log

--------------------

Seconds_Behind_Master: 2170
SQL_Delay: 900
SQL_Remaining_Delay: 825
Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event

Even if the Master is upgraded to 10.0.32 (latest), the bogus timestamps will still be present - as confirmed in a sandbox:

  1. ../sandbox.sh mysql -e 'select @@version;'
    -----------------
    @@version

    -----------------

    10.0.32-MariaDB

    -----------------

  1. mysqlbinlog bin-log.00000* | grep ^#700
    #700101 2:00:00 server id 10032 end_log_pos 347 Binlog checkpoint bin-log.000002
    #700101 2:00:00 server id 10032 end_log_pos 347 Binlog checkpoint bin-log.000003
Comment by Sachin Setiya (Inactive) [ 2017-09-21 ]

Hi zam1081!

This patched was pushed only to 10.2, I have back-ported this patch to 10.0 , by this merge this will also be available to 10.1.

  • Sachin
Generated at Thu Feb 08 07:56:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.