Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12290

Wrong timestamps in binary log causes replication issues

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.2.4, 10.0(EOL), 10.1(EOL), 10.2(EOL)
    • 10.2.6
    • Replication

    Description

      I have configured a MariaDB 10.2.4 GTID Slave with master_delay=3600, and it seems to cause replication to lag behind more and more. Waits trigger even though Seconds_Behind_Master is > 3600.

              Seconds_Behind_Master: 21476
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 3
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: No
                        Gtid_IO_Pos: 
            Replicate_Do_Domain_Ids: 
        Replicate_Ignore_Domain_Ids: 
                      Parallel_Mode: conservative
                          SQL_Delay: 3600
                SQL_Remaining_Delay: 3598
      

      Attachments

        1. patch
          0.6 kB
        2. patch
          0.4 kB

        Activity

          tanj Guillaume Lefranc created issue -

          Could you please paste or attach a few consequent slave status outputs, to see the dynamics, including Read_Master_Log_Pos and Exec_Master_Log_Pos (or just the full output).
          Please also attach cnf file(s).

          elenst Elena Stepanova added a comment - Could you please paste or attach a few consequent slave status outputs, to see the dynamics, including Read_Master_Log_Pos and Exec_Master_Log_Pos (or just the full output). Please also attach cnf file(s).
          elenst Elena Stepanova made changes -
          Field Original Value New Value
          Labels need_feedback

          The cnf files are unaltered defaults.

          I forgot to mention that the master is a 10.1 MariaDB, but the manual doesn't mention incompatibility.

          Will try to gather more info/write a reproduceable test case.

          tanj Guillaume Lefranc added a comment - The cnf files are unaltered defaults. I forgot to mention that the master is a 10.1 MariaDB, but the manual doesn't mention incompatibility. Will try to gather more info/write a reproduceable test case.

          In the example below, even I am really far behind the master, SQL_Delay still triggers.

                            Master_Host: 10.0.5.13
                            Master_User: replicator
                            Master_Port: 3306
                          Connect_Retry: 60
                        Master_Log_File: mariadb-bin.000246
                    Read_Master_Log_Pos: 40136843
                         Relay_Log_File: mysqld-relay-bin.000007
                          Relay_Log_Pos: 10794
                  Relay_Master_Log_File: mariadb-bin.000122
                       Slave_IO_Running: Yes
                      Slave_SQL_Running: Yes
                        Replicate_Do_DB: 
                    Replicate_Ignore_DB: 
                     Replicate_Do_Table: 
                 Replicate_Ignore_Table: 
                Replicate_Wild_Do_Table: 
            Replicate_Wild_Ignore_Table: 
                             Last_Errno: 0
                             Last_Error: 
                           Skip_Counter: 0
                    Exec_Master_Log_Pos: 10497
                        Relay_Log_Space: 13043090327
                        Until_Condition: None
                         Until_Log_File: 
                          Until_Log_Pos: 0
                     Master_SSL_Allowed: No
                     Master_SSL_CA_File: 
                     Master_SSL_CA_Path: 
                        Master_SSL_Cert: 
                      Master_SSL_Cipher: 
                         Master_SSL_Key: 
                  Seconds_Behind_Master: 289621
          Master_SSL_Verify_Server_Cert: No
                          Last_IO_Errno: 0
                          Last_IO_Error: 
                         Last_SQL_Errno: 0
                         Last_SQL_Error: 
            Replicate_Ignore_Server_Ids: 
                       Master_Server_Id: 3
                         Master_SSL_Crl: 
                     Master_SSL_Crlpath: 
                             Using_Gtid: No
                            Gtid_IO_Pos: 
                Replicate_Do_Domain_Ids: 
            Replicate_Ignore_Domain_Ids: 
                          Parallel_Mode: conservative
                              SQL_Delay: 10
                    SQL_Remaining_Delay: 6
                Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event
          

          tanj Guillaume Lefranc added a comment - In the example below, even I am really far behind the master, SQL_Delay still triggers. Master_Host: 10.0.5.13 Master_User: replicator Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mariadb-bin.000246 Read_Master_Log_Pos: 40136843 Relay_Log_File: mysqld-relay-bin.000007 Relay_Log_Pos: 10794 Relay_Master_Log_File: mariadb-bin.000122 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 10497 Relay_Log_Space: 13043090327 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 289621 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 3 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: No Gtid_IO_Pos: Replicate_Do_Domain_Ids: Replicate_Ignore_Domain_Ids: Parallel_Mode: conservative SQL_Delay: 10 SQL_Remaining_Delay: 6 Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event
          tanj Guillaume Lefranc added a comment - - edited

          OK, I think I've got something.

          If I look at the binary log entry where the delay is triggered (this entry is one of the first after binlog rotation) here is what I get:

          /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
          /*!40019 SET @@session.max_insert_delayed_threads=0*/;
          /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
          DELIMITER /*!*/;
          # at 4
          #170317  7:34:44 server id 3  end_log_pos 249   Start: binlog v 4, server v 10.1.21-MariaDB-1~jessie created 170317  7:34:44
          BINLOG '
          hIPLWA8DAAAA9QAAAPkAAAAAAAQAMTAuMS4yMS1NYXJpYURCLTF+amVzc2llAGxvZwAAAAAAAAAA
          AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAA3QAEGggAAAAICAgCAAAACgoKAAAAAAAA
          AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA                                                                                                                               AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
          AAAAAAAAAAAEEwQAAKnQd0I=
          '/*!*/;
          *# at 10497
          #700101  1:00:00 server id 3  end_log_pos 10538         Binlog checkpoint mariadb-bin.000122*
          # at 10538
          #170317  7:34:45 server id 0  end_log_pos 10576         GTID 1-0-8845285 trans
          /*!100101 SET @@session.skip_parallel_replication=0*//*!*/;
          /*!100001 SET @@session.gtid_domain_id=1*//*!*/;
          /*!100001 SET @@session.server_id=0*//*!*/;
          /*!100001 SET @@session.gtid_seq_no=8845285*//*!*/;
          BEGIN
          /*!*/;
          # at 10576 
          

          The timestamp of binlog entry 10497 seems completely wrong at "700101 1:00:00" (Unix time anyone?)

          So, at each binlog rotation and/or binlog checkpoint, my slave gets an additional N seconds of delay...

          tanj Guillaume Lefranc added a comment - - edited OK, I think I've got something. If I look at the binary log entry where the delay is triggered (this entry is one of the first after binlog rotation) here is what I get: /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!40019 SET @@session.max_insert_delayed_threads=0*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #170317 7:34:44 server id 3 end_log_pos 249 Start: binlog v 4, server v 10.1.21-MariaDB-1~jessie created 170317 7:34:44 BINLOG ' hIPLWA8DAAAA9QAAAPkAAAAAAAQAMTAuMS4yMS1NYXJpYURCLTF+amVzc2llAGxvZwAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAA3QAEGggAAAAICAgCAAAACgoKAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAEEwQAAKnQd0I= '/*!*/; *# at 10497 #700101 1:00:00 server id 3 end_log_pos 10538 Binlog checkpoint mariadb-bin.000122* # at 10538 #170317 7:34:45 server id 0 end_log_pos 10576 GTID 1-0-8845285 trans /*!100101 SET @@session.skip_parallel_replication=0*//*!*/; /*!100001 SET @@session.gtid_domain_id=1*//*!*/; /*!100001 SET @@session.server_id=0*//*!*/; /*!100001 SET @@session.gtid_seq_no=8845285*//*!*/; BEGIN /*!*/; # at 10576 The timestamp of binlog entry 10497 seems completely wrong at "700101 1:00:00" (Unix time anyone?) So, at each binlog rotation and/or binlog checkpoint, my slave gets an additional N seconds of delay...
          elenst Elena Stepanova made changes -
          Labels need_feedback

          Before you ask, the time of my servers is in sync.

          tanj Guillaume Lefranc added a comment - Before you ask, the time of my servers is in sync.
          elenst Elena Stepanova made changes -
          Status Open [ 1 ] Confirmed [ 10101 ]

          Great catch, thanks.

          plinux, so, the problem is two-fold. The timestamp obviously needs to be fixed in all 10.x versions; but also, we need to somehow make sure that 10.2 is able to replicate even from older 10.0/10.1 versions, before the timestamp fix.

          elenst Elena Stepanova added a comment - Great catch, thanks. plinux , so, the problem is two-fold. The timestamp obviously needs to be fixed in all 10.x versions; but also, we need to somehow make sure that 10.2 is able to replicate even from older 10.0/10.1 versions, before the timestamp fix.
          elenst Elena Stepanova made changes -
          Fix Version/s 10.0 [ 16000 ]
          Fix Version/s 10.1 [ 16100 ]
          Fix Version/s 10.2 [ 14601 ]
          Affects Version/s 10.0 [ 16000 ]
          Affects Version/s 10.1 [ 16100 ]
          Affects Version/s 10.2 [ 14601 ]
          Assignee Lixun Peng [ plinux ]
          tanj Guillaume Lefranc made changes -
          Summary CHANGE MASTER TO ... MASTER_DELAY causes the slave to lag behind indefinitely Wrong timestamps in binary log causes replication issues
          elenst Elena Stepanova made changes -
          Labels 10.2-ga
          sachin.setiya.007 Sachin Setiya (Inactive) made changes -
          Assignee Lixun Peng [ plinux ] Sachin Setiya [ sachin.setiya.007 ]
          elenst Elena Stepanova made changes -
          Priority Major [ 3 ] Critical [ 2 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Sprint 10.2.6 [ 158 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Rank Ranked lower
          ratzpo Rasmus Johansson (Inactive) made changes -
          Sprint 10.2.6 [ 158 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Rank Ranked higher
          sachin.setiya.007 Sachin Setiya (Inactive) made changes -
          Attachment patch [ 43603 ]

          Hi tanj
          I have created a patch for this issue which solves this wrong timestamp issue, This is because when binlog_background thread logs BInlog_checkpoint_log_event time is not set in the thread so , Binlog_checkpoint_log_event->when is zero.
          But I am not able to simulate seconds behind master case. Can you please provide some reproducible test case ?
          I tried this
          master - slave configuration
          both use binlog_format = row
          I have used gtid
          sql_delay=60

          And run lacs of (insert )query on server 1 , with after every 10 insert there was flush logs

          patch

          sachin.setiya.007 Sachin Setiya (Inactive) added a comment - Hi tanj I have created a patch for this issue which solves this wrong timestamp issue, This is because when binlog_background thread logs BInlog_checkpoint_log_event time is not set in the thread so , Binlog_checkpoint_log_event->when is zero. But I am not able to simulate seconds behind master case. Can you please provide some reproducible test case ? I tried this master - slave configuration both use binlog_format = row I have used gtid sql_delay=60 And run lacs of (insert )query on server 1 , with after every 10 insert there was flush logs patch

          Doesn't the set_time() need to go inside the loop, before the call to
          mysql_bin_log.mark_xid_done()? Otherwise the time will always be the time
          the binlog thread started, which is not correct - it needs to be the current
          time.

          But nice catch, good to get this fixed.

          knielsen Kristian Nielsen added a comment - Doesn't the set_time() need to go inside the loop, before the call to mysql_bin_log.mark_xid_done()? Otherwise the time will always be the time the binlog thread started, which is not correct - it needs to be the current time. But nice catch, good to get this fixed.
          sachin.setiya.007 Sachin Setiya (Inactive) made changes -
          Attachment patch [ 43606 ]
          sachin.setiya.007 Sachin Setiya (Inactive) added a comment - - edited

          Hi knielsen,

          Patch updated
          Regards
          sachin patch

          sachin.setiya.007 Sachin Setiya (Inactive) added a comment - - edited Hi knielsen , Patch updated Regards sachin patch
          sachin.setiya.007 Sachin Setiya (Inactive) made changes -
          Fix Version/s 10.2.6 [ 22527 ]
          Fix Version/s 10.2 [ 14601 ]
          Fix Version/s 10.0 [ 16000 ]
          Fix Version/s 10.1 [ 16100 ]
          Resolution Fixed [ 1 ]
          Status Confirmed [ 10101 ] Closed [ 6 ]

          I am experiencing this issue when attaching a 10.2.8 Slave to a 10.0.24 Master. At every #700101 Binlog checkpoint event, the SQL Delay counter is reapplied and the Slave is continually delayed.

          master# mysql -e 'select @@version;'
          -----------------

          @@version

          -----------------

          10.0.24-MariaDB

          -----------------

          master# mysqlbinlog bin-log.023901 | grep ^#700
          #700101 2:00:00 server id 123 end_log_pos 423 Binlog checkpoint bin-log.023901

          slave# mysql -e 'select @@version;'
          --------------------

          @@version

          --------------------

          10.2.8-MariaDB-log

          --------------------

          Seconds_Behind_Master: 2170
          SQL_Delay: 900
          SQL_Remaining_Delay: 825
          Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event

          Even if the Master is upgraded to 10.0.32 (latest), the bogus timestamps will still be present - as confirmed in a sandbox:

          1. ../sandbox.sh mysql -e 'select @@version;'
            -----------------
            @@version

            -----------------

            10.0.32-MariaDB

            -----------------

          1. mysqlbinlog bin-log.00000* | grep ^#700
            #700101 2:00:00 server id 10032 end_log_pos 347 Binlog checkpoint bin-log.000002
            #700101 2:00:00 server id 10032 end_log_pos 347 Binlog checkpoint bin-log.000003
          zam1081 Marcus Zambrano added a comment - I am experiencing this issue when attaching a 10.2.8 Slave to a 10.0.24 Master. At every #700101 Binlog checkpoint event, the SQL Delay counter is reapplied and the Slave is continually delayed. master# mysql -e 'select @@version;' ----------------- @@version ----------------- 10.0.24-MariaDB ----------------- master# mysqlbinlog bin-log.023901 | grep ^#700 #700101 2:00:00 server id 123 end_log_pos 423 Binlog checkpoint bin-log.023901 slave# mysql -e 'select @@version;' -------------------- @@version -------------------- 10.2.8-MariaDB-log -------------------- Seconds_Behind_Master: 2170 SQL_Delay: 900 SQL_Remaining_Delay: 825 Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event Even if the Master is upgraded to 10.0.32 (latest), the bogus timestamps will still be present - as confirmed in a sandbox: ../sandbox.sh mysql -e 'select @@version;' ----------------- @@version ----------------- 10.0.32-MariaDB ----------------- mysqlbinlog bin-log.00000* | grep ^#700 #700101 2:00:00 server id 10032 end_log_pos 347 Binlog checkpoint bin-log.000002 #700101 2:00:00 server id 10032 end_log_pos 347 Binlog checkpoint bin-log.000003

          Hi zam1081!

          This patched was pushed only to 10.2, I have back-ported this patch to 10.0 , by this merge this will also be available to 10.1.

          • Sachin
          sachin.setiya.007 Sachin Setiya (Inactive) added a comment - Hi zam1081 ! This patched was pushed only to 10.2, I have back-ported this patch to 10.0 , by this merge this will also be available to 10.1. Sachin
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 80017 ] MariaDB v4 [ 151820 ]

          People

            sachin.setiya.007 Sachin Setiya (Inactive)
            tanj Guillaume Lefranc
            Votes:
            1 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.