MariaDB Server: MDEV-17490

MariaDB 10.3 slave fails to replicate from a MariaDB 10.2 master

    Details

      Description

      We have a 10.2.18 master and 4 slaves, all on 10.2.18. Because of a bug I reported last week (https://jira.mariadb.org/browse/MDEV-17420), we upgraded one of the replicas to 10.3.10 to see whether that bug is still present in that version.

      After the upgrade, the slave began to sync binary log events from the master, but after a few seconds it stopped with the following errors:

      mysqld[15138]: 2018-10-17  9:22:28 156 [ERROR] Slave IO thread did not receive an expected Rows-log end-of-statement for event starting at log 'main.010755' position 89390527 whose last block was seen at log 'main.010755' position 89390527. The end-of-statement should have been delivered before the current one at log 'main.010755' position 89390610
      mysqld[15138]: 2018-10-17  9:22:28 156 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Internal MariaDB error code: 1595
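
To make the positions in the first error easier to compare, here is a minimal Python sketch that extracts them from the log line. The regular expression and the function name `parse_rows_eos_error` are my own, written against the message exactly as pasted above; the wording may differ in other server versions.

```python
import re

# Matches the "Rows-log end-of-statement" message quoted in this report.
# Assumption: the message wording is exactly what 10.3.10 logged above.
ROWS_EOS_RE = re.compile(
    r"event starting at log '(?P<start_file>[^']+)' position (?P<start_pos>\d+)"
    r" whose last block was seen at log '(?P<last_file>[^']+)' position (?P<last_pos>\d+)"
    r".*?current one at log '(?P<cur_file>[^']+)' position (?P<cur_pos>\d+)"
)


def parse_rows_eos_error(line):
    """Return the files and positions named in the error, or None if no match."""
    match = ROWS_EOS_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("start_pos", "last_pos", "cur_pos"):
        info[key] = int(info[key])
    # Distance between the start of the Rows event and the event that
    # arrived where the end-of-statement was expected.
    info["gap_bytes"] = info["cur_pos"] - info["start_pos"]
    return info


LOG_LINE = (
    "Slave IO thread did not receive an expected Rows-log end-of-statement "
    "for event starting at log 'main.010755' position 89390527 whose last "
    "block was seen at log 'main.010755' position 89390527. The "
    "end-of-statement should have been delivered before the current one at "
    "log 'main.010755' position 89390610"
)

info = parse_rows_eos_error(LOG_LINE)
print(info["start_pos"], info["cur_pos"], info["gap_bytes"])
```

For the line above, the event that arrived in place of the end-of-statement starts 83 bytes after the Rows event (89390610 - 89390527).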

      We checked the binary log on the master and it does not seem to be corrupted; the other 3 replicas (still on 10.2.18) are working fine. The relay log on the slave also looks fine.

      The slave is stopped with the following status:

      mainro [(none)]> show slave status\G
      *************************** 1. row ***************************
                     Master_Log_File: main.010755
                 Read_Master_Log_Pos: 89411517
                      Relay_Log_File: relay-bin.000002
                       Relay_Log_Pos: 550
               Relay_Master_Log_File: main.010755
                    Slave_IO_Running: No
                   Slave_SQL_Running: Yes
                 Exec_Master_Log_Pos: 89410432
                     Relay_Log_Space: 1657
                       Last_IO_Errno: 1595
                       Last_IO_Error: Relay log write failure: could not queue event from master
                      Last_SQL_Errno: 0
                      Last_SQL_Error:
             Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
                    Slave_DDL_Groups: 16763
      Slave_Non_Transactional_Groups: 7974
          Slave_Transactional_Groups: 107415
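
The status above can be summarized with a small Python sketch that parses the vertical (`\G`) output into a dictionary. The helper names `parse_vertical_status` and `summarize` are hypothetical, and the sample text is just the subset of fields relevant here.

```python
def parse_vertical_status(text):
    """Parse 'Field: value' lines from mysql client \\G output into a dict."""
    status = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip the "*** 1. row ***" separator and anything without a colon.
        if stripped.startswith("*") or ":" not in stripped:
            continue
        # Split on the first colon only: values such as Last_IO_Error
        # contain colons themselves.
        key, _, value = stripped.partition(":")
        status[key.strip()] = value.strip()
    return status


def summarize(status):
    """Report which threads run and how far the SQL thread is behind the IO thread."""
    read_pos = int(status["Read_Master_Log_Pos"])
    exec_pos = int(status["Exec_Master_Log_Pos"])
    return {
        "io_running": status["Slave_IO_Running"] == "Yes",
        "sql_running": status["Slave_SQL_Running"] == "Yes",
        # The byte difference is only meaningful when both positions
        # refer to the same master binlog file.
        "same_file": status["Master_Log_File"] == status["Relay_Master_Log_File"],
        "read_minus_exec": read_pos - exec_pos,
    }


STATUS_TEXT = """
*************************** 1. row ***************************
Master_Log_File: main.010755
Read_Master_Log_Pos: 89411517
Relay_Master_Log_File: main.010755
Slave_IO_Running: No
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 89410432
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
"""

status = parse_vertical_status(STATUS_TEXT)
print(summarize(status))
```

With these values the IO thread is stopped, the SQL thread is still running, and both positions are in `main.010755`, 1085 bytes apart.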

      We have already tried RESET SLAVE, clearing the relay logs, and executing CHANGE MASTER TO again, but the result is the same.

      I dumped the master binary log at that position and, interestingly, the slave IO thread stops in the middle of a binlog block (the master's binlog format is MIXED).
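
For anyone who wants to repeat the dump, the following Python sketch builds a `mysqlbinlog` command line for a given start position. The exact invocation used for this report is not recorded here, so the option set below is an assumption: the start position is taken from the error message, and the helper name `mysqlbinlog_command` is hypothetical.

```python
import shlex


def mysqlbinlog_command(binlog_file, start_pos, stop_pos=None):
    """Build an argument list that dumps a binlog from start_pos with decoded row events."""
    args = [
        "mysqlbinlog",
        # Show row events as commented pseudo-SQL instead of base64 BINLOG blocks.
        "--base64-output=decode-rows",
        "--verbose",
        "--verbose",
        f"--start-position={start_pos}",
    ]
    if stop_pos is not None:
        args.append(f"--stop-position={stop_pos}")
    args.append(binlog_file)
    return args


# Assumption: start at the Rows event named in the first error message.
cmd = mysqlbinlog_command("main.010755", 89390527)
print(shlex.join(cmd))
```

The printed command is meant to be run on the master, in the directory that holds `main.010755`; pass `stop_pos` to limit the output to a short range.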

      The master/slave set that is failing is the main database server for our organization. It has 1.2 TB of data with 38 databases, views, stored procedures, triggers and complex queries.

      On the same servers we have 3 other, smaller master/slave sets with the same combination (10.2.18 master and 10.3.10 slaves), and they are working perfectly. Of course, these sets are smaller and simpler than the instance that is failing.

      Thanks

              People

              Assignee:
              Elkin Andrei Elkin
              Reporter:
              gomita Gabriel Gomiz