[MDEV-4475] Replication from MariaDB 10.0 to 5.5 does not work Created: 2013-05-03  Updated: 2013-05-25  Resolved: 2013-05-24

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.2
Fix Version/s: 10.0.3

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-26 Global transaction ID Closed

 Description   

Note 1: I'm not a big believer in NM->OS (new master to old slave) replication, but I think it's not good that it doesn't start at all and, moreover, floods the error log rather than producing a clear fatal error and giving up.
Note 2: I've set the affected version to 10.0.2, as I assume it's something to fix on the master rather than on the old slave; please adjust if that's wrong.

I start the 5.5 server with

--server-id=1 --port=3306 --log-slave-updates --log-bin

and the 10.0 server with

--server-id=2 --port=3307 --log-slave-updates --log-bin

then I execute on the 5.5 server

MariaDB [test]> change master to master_port=3307, master_host='127.0.0.1', master_user='root';
Query OK, 0 rows affected (0.74 sec)
 
MariaDB [test]> start slave;
Query OK, 0 rows affected (0.01 sec)

Slave status shows everything is okay (this is on the current trees built from source; with the 5.5.30 and 10.0.2 release builds I saw the IO thread staying in the 'Connecting' state):

*************************** 1. row ***************************
               Slave_IO_State: Queueing master event to the relay log
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: ubuntu12-04-bin.000001
          Read_Master_Log_Pos: 248
               Relay_Log_File: ubuntu12-04-relay-bin.000027
                Relay_Log_Pos: 294
        Relay_Master_Log_File: ubuntu12-04-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 248
              Relay_Log_Space: 1480
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 60
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
1 row in set (0.19 sec)

But note how the relay log rotates rapidly while the master log position remains the same:

*************************** 1. row ***************************
               Slave_IO_State: Queueing master event to the relay log
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: ubuntu12-04-bin.000001
          Read_Master_Log_Pos: 248
               Relay_Log_File: ubuntu12-04-relay-bin.000031
                Relay_Log_Pos: 294
        Relay_Master_Log_File: ubuntu12-04-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 248
              Relay_Log_Space: 1480
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 63
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
1 row in set (0.05 sec)

and the error log keeps saying

...
130503 17:00:46 [Note] next log './ubuntu12-04-relay-bin.000062' is not active
130503 17:00:46 [ERROR] Error reading packet from server: Failed to replace binlog checkpoint or gtid list event with dummy: too small event. ( server_errno=1105)
130503 17:00:46 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'ubuntu12-04-bin.000001' at position 248
130503 17:00:47 [Note] next log './ubuntu12-04-relay-bin.000063' is not active
....
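
The symptom in the two status snapshots above (the relay log index advancing while the master position stays at 248) can be checked mechanically. The following is a minimal Python sketch, not part of the original report: the helper names parse_status, relay_log_index and looks_stuck are hypothetical, and it assumes the vertical (\G) output format shown above, trimmed here to the three relevant fields.

```python
import re


def parse_status(text):
    """Parse vertical (\\G) SHOW SLAVE STATUS output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Field lines look like "   Relay_Log_File: host-relay-bin.000027".
        # Row separators and the "1 row in set" footer do not match.
        match = re.match(r"\s*(\w+):\s?(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def relay_log_index(status):
    """Return the numeric suffix of Relay_Log_File, e.g. 27 for '...000027'."""
    return int(status["Relay_Log_File"].rsplit(".", 1)[1])


def looks_stuck(before, after):
    """True if the relay log rotated although no new master events were read."""
    same_master_pos = (
        before["Master_Log_File"] == after["Master_Log_File"]
        and before["Read_Master_Log_Pos"] == after["Read_Master_Log_Pos"]
    )
    return same_master_pos and relay_log_index(after) > relay_log_index(before)


# Values copied from the two snapshots in this report.
FIRST = """
              Master_Log_File: ubuntu12-04-bin.000001
          Read_Master_Log_Pos: 248
               Relay_Log_File: ubuntu12-04-relay-bin.000027
"""
SECOND = """
              Master_Log_File: ubuntu12-04-bin.000001
          Read_Master_Log_Pos: 248
               Relay_Log_File: ubuntu12-04-relay-bin.000031
"""

print(looks_stuck(parse_status(FIRST), parse_status(SECOND)))  # prints True
```

A relay log that rotates without the master position moving is only a hint; the error log remains the authoritative place to confirm the reconnect loop.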

bzr version-info

revision-id: knielsen@knielsen-hq.org-20130503092729-gedp152b19k5wdnj
revno: 3626
branch-nick: 10.0-base

revision-id: psergey@askmonty.org-20130502201043-q7wgvntcpf2zjx9f
revno: 3740
branch-nick: 5.5



 Comments   
Comment by Kristian Nielsen [ 2013-05-03 ]

> [ERROR] Error reading packet from server: Failed to replace binlog checkpoint or gtid list event with dummy: too small event. ( server_errno=1105)

Ouch. Right, I see, thanks for catching this. I will try to fix it next week; it should not be hard, I think.

Comment by Kristian Nielsen [ 2013-05-24 ]

The empty Gtid_list event that appears in the very first binlog, before any GTID events are logged, was too short (4 bytes in the body) to be replaced with a dummy event when sending to old servers.

The fix is to pad such an event with two extra bytes.

Fix pushed to 10.0-base.
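
The size argument in this comment can be illustrated with a small Python sketch. It is not the server code: the function names are hypothetical, the body layout (a 4-byte little-endian entry count) is an assumption, and the 6-byte minimum for a replaceable body is merely inferred from the numbers given above (4 bytes plus 2 bytes of padding), not quoted from the source tree.

```python
import struct

# Sizes taken from the comment above.
EMPTY_GTID_LIST_BODY = 4   # body of an empty Gtid_list event, in bytes
PADDING = 2                # extra bytes added by the fix

# Assumption: inferred from the fix, not quoted from the server source.
MIN_DUMMY_BODY = EMPTY_GTID_LIST_BODY + PADDING


def empty_gtid_list_body(padded):
    """Build the body of an empty Gtid_list event (assumed layout)."""
    body = struct.pack("<I", 0)  # entry count of zero, 4 bytes
    if padded:
        body += b"\x00" * PADDING
    return body


def can_replace_with_dummy(body_len, min_body=MIN_DUMMY_BODY):
    """True if an event body is long enough to be overwritten by a dummy event."""
    return body_len >= min_body


for padded in (False, True):
    body = empty_gtid_list_body(padded)
    print(padded, len(body), can_replace_with_dummy(len(body)))
```

Under these assumptions the unpadded body (4 bytes) fails the check, which corresponds to the "too small event" error seen by the 5.5 slave, while the padded body (6 bytes) passes.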
