[MDEV-8507] gtid_ignore_duplicates doesn't work with log_slave_updates and multi-source replication Created: 2015-07-20  Updated: 2015-08-01  Resolved: 2015-08-01

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0.20, 10.1.5
Fix Version/s: 10.0.21, 10.1.6

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Kristian Nielsen
Resolution: Fixed Votes: 2
Labels: replication

Issue Links:
Relates
relates to MDEV-8354 out-of-order error with --gtid-ignore... Closed

 Description   

When log_slave_updates is enabled, gtid_ignore_duplicates doesn't appear to work with multi-source replication.

When using a circular replication topology, this means some transactions may be executed twice.

Let's say we have a topology like this using multi-source replication:

master101 <=> master102 <=> master105 <=> master101

master101 and master102 are running MariaDB 10.1.5.

master105 is running MariaDB 10.0.20.

Let's insert some data on master101:

MariaDB [(none)]> INSERT INTO db1.test_table (str) VALUES ('str8');
Query OK, 1 row affected (0.02 sec)

Now lets see if the slaves are functional.

master101 looks OK:

MariaDB [(none)]> show slave 'master102' status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.102
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: localhost-bin.000002
          Read_Master_Log_Pos: 485
               Relay_Log_File: localhost-relay-bin-master102.000002
                Relay_Log_Pos: 621
        Relay_Master_Log_File: localhost-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 485
              Relay_Log_Space: 932
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 1-1-10,3-3-8
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
1 row in set (0.00 sec)
 
MariaDB [(none)]> show slave 'master105' status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.105
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: localhost-bin.000004
          Read_Master_Log_Pos: 501
               Relay_Log_File: localhost-relay-bin-master105.000002
                Relay_Log_Pos: 637
        Relay_Master_Log_File: localhost-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 501
              Relay_Log_Space: 948
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-1-10,3-3-8
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
1 row in set (0.00 sec)

master102 looks OK too:

MariaDB [(none)]> show slave 'master101' status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.101
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: localhost-bin.000002
          Read_Master_Log_Pos: 485
               Relay_Log_File: localhost-relay-bin-master101.000002
                Relay_Log_Pos: 776
        Relay_Master_Log_File: localhost-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 485
              Relay_Log_Space: 1087
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 1-1-10,3-3-8
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
1 row in set (0.00 sec)
 
MariaDB [(none)]> show slave 'master105' status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.105
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: localhost-bin.000004
          Read_Master_Log_Pos: 501
               Relay_Log_File: localhost-relay-bin-master105.000002
                Relay_Log_Pos: 792
        Relay_Master_Log_File: localhost-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 501
              Relay_Log_Space: 1103
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-1-10,3-3-8
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
1 row in set (0.00 sec)

master105 got a duplicate key error from master102:

MariaDB [(none)]> show slave 'master101' status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.101
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: localhost-bin.000002
          Read_Master_Log_Pos: 485
               Relay_Log_File: localhost-relay-bin-master101.000002
                Relay_Log_Pos: 776
        Relay_Master_Log_File: localhost-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 485
              Relay_Log_Space: 1087
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-1-10,3-3-8
1 row in set (0.00 sec)
 
MariaDB [(none)]> show slave 'master102' status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.102
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: localhost-bin.000002
          Read_Master_Log_Pos: 485
               Relay_Log_File: localhost-relay-bin-master102.000002
                Relay_Log_Pos: 621
        Relay_Master_Log_File: localhost-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows_v1 event on table db1.test_table; Duplicate entry '8' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log localhost-bin.000002, end_log_pos 458
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 330
              Relay_Log_Space: 1087
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1062
               Last_SQL_Error: Could not execute Write_rows_v1 event on table db1.test_table; Duplicate entry '8' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log localhost-bin.000002, end_log_pos 458
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-1-10,3-3-8
1 row in set (0.00 sec)

Looking at the relay log on master105 from master101, we see:

#150720 13:29:57 server id 1  end_log_pos 368   GTID 1-1-10
/*!100001 SET @@session.gtid_domain_id=1*//*!*/;
/*!100001 SET @@session.server_id=1*//*!*/;
/*!100001 SET @@session.gtid_seq_no=10*//*!*/;
BEGIN
/*!*/;
# at 659
# at 710
#150720 13:29:57 server id 1  end_log_pos 419   Table_map: `db1`.`test_table` mapped to number 18
#150720 13:29:57 server id 1  end_log_pos 458   Write_rows: table id 18 flags: STMT_END_F
### INSERT INTO `db1`.`test_table`
### SET
###   @1=8
###   @2='str8'
# at 749
#150720 13:29:57 server id 1  end_log_pos 485   Xid = 39
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Looking at the relay log on master105 from master102, we see:

#150720 13:29:57 server id 1  end_log_pos 368   GTID 1-1-10
/*!100001 SET @@session.gtid_domain_id=1*//*!*/;
/*!100001 SET @@session.server_id=1*//*!*/;
/*!100001 SET @@session.gtid_seq_no=10*//*!*/;
BEGIN
/*!*/;
# at 659
# at 710
#150720 13:29:57 server id 1  end_log_pos 419   Table_map: `db1`.`test_table` mapped to number 18
#150720 13:29:57 server id 1  end_log_pos 458   Write_rows: table id 18 flags: STMT_END_F
### INSERT INTO `db1`.`test_table`
### SET
###   @1=8
###   @2='str8'
# at 749
#150720 13:29:57 server id 1  end_log_pos 485   Xid = 32
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

The GTID information appears to be identical between the relay logs. However, MariaDB is executing both transactions, rather than ignoring one of them.



 Comments   
Comment by Elena Stepanova [ 2015-07-21 ]

I am not getting the error. Do you observe it all the time, or was it a one-time thing?
Can you provide a full clean set of binlogs and general logs (create table + insert into the table) so that I could see what's different on my side?
And please attach the cnf files, maybe there is something in there that I don't take into account.
Thanks.

Comment by Thierry Laurier [ 2015-07-22 ]

it's my initial bug testing with 10.0.17 and 10.1.5 reported to mariadb support.

I'm observe on randomize insert with watch command.
Stop all slave/start all slave temporary resolve a error.

3 server mariadb 10.1.5 datamod01, datamod02, datamod03

root@datamod01 12:24:10 [test]> select @@global.gtid_domain_id,@@global.server_id,@@global.gtid_ignore_duplicates \G
        @@global.gtid_domain_id: 1
             @@global.server_id: 101
@@global.gtid_ignore_duplicates: 1
 
root@datamod02 12:03:11 [test]> select @@global.gtid_domain_id,@@global.server_id,@@global.gtid_ignore_duplicates \G
        @@global.gtid_domain_id: 2
             @@global.server_id: 102
@@global.gtid_ignore_duplicates: 1
 
root@datamod03 12:03:40 [test]> select @@global.gtid_domain_id,@@global.server_id,@@global.gtid_ignore_duplicates \G
        @@global.gtid_domain_id: 3
             @@global.server_id: 103
@@global.gtid_ignore_duplicates: 1

root@datamod01 12:24:27 [test]>  show slave hosts;
+-----------+----------+------+-----------+
| Server_id | Host     | Port | Master_id |
+-----------+----------+------+-----------+
|       103 | sqlmod03 | 3306 |       101 |
|       102 | sqlmod02 | 3306 |       101 |
+-----------+----------+------+-----------+
 
root@datamod02 12:24:48 [test]>  show slave hosts;
+-----------+----------+------+-----------+
| Server_id | Host     | Port | Master_id |
+-----------+----------+------+-----------+
|       103 | sqlmod03 | 3306 |       102 |
|       101 | sqlmod01 | 3306 |       102 |
+-----------+----------+------+-----------+
 
root@datamod03 12:25:31 [test]> show slave hosts;
+-----------+----------+------+-----------+
| Server_id | Host     | Port | Master_id |
+-----------+----------+------+-----------+
|       101 | sqlmod01 | 3306 |       103 |
|       102 | sqlmod02 | 3306 |       103 |
+-----------+----------+------+-----------+

root@datamod01 13:21:49 [test]> stop all slaves;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
 
root@datamod01 13:21:51 [test]> flush slave 'slave-of-sqlmod02';
Query OK, 0 rows affected (0.00 sec)
 
root@datamod01 13:22:00 [test]> flush slave 'slave-of-sqlmod03';
Query OK, 0 rows affected (0.00 sec)
 
root@datamod01 13:28:18 [test]> flush master;
Query OK, 0 rows affected (0.00 sec)
 
root@datamod01 13:29:20 [test]> change master 'slave-of-sqlmod02' to master_use_gtid=slave_pos ;
Query OK, 0 rows affected (0.00 sec)
root@datamod01 13:30:13 [test]> change master 'slave-of-sqlmod03' to master_use_gtid=slave_pos ;
Query OK, 0 rows affected (0.01 sec)
 
root@datamod01 13:31:52 [test]> start all slaves;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

root@datamod02 12:25:48 [test]>  stop all slaves;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
 
root@datamod02 13:22:23 [test]>  flush slave 'slave-of-sqlmod01';
Query OK, 0 rows affected (0.00 sec)
 
root@datamod02 13:22:40 [test]>  flush slave 'slave-of-sqlmod03';
Query OK, 0 rows affected (0.00 sec)
 
root@datamod02 13:26:42 [test]> flush master;
Query OK, 0 rows affected (0.00 sec)
 
root@datamod02 13:28:36 [test]> change master 'slave-of-sqlmod01' to master_use_gtid=slave_pos ;
Query OK, 0 rows affected (0.00 sec)
 
root@datamod02 13:32:30 [test]> change master 'slave-of-sqlmod03' to master_use_gtid=slave_pos ;
Query OK, 0 rows affected (0.00 sec)
 
root@datamod02 13:32:34 [test]> start all slaves;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

root@datamod03 12:29:05 [test]>  stop all slaves;
Query OK, 0 rows affected, 2 warnings (0.00 sec)
 
root@datamod03 13:22:25 [test]>  flush slave 'slave-of-sqlmod01';
Query OK, 0 rows affected (0.00 sec)
 
root@datamod03 13:22:53 [test]>  flush slave 'slave-of-sqlmod02';
Query OK, 0 rows affected (0.00 sec)
 
root@datamod03 13:26:44 [test]> flush master;
Query OK, 0 rows affected (0.01 sec)
 
root@datamod03 13:28:37 [test]> change master 'slave-of-sqlmod01' to master_use_gtid=slave_pos ;
Query OK, 0 rows affected (0.01 sec)
 
root@datamod03 13:33:11 [test]> change master 'slave-of-sqlmod02' to master_use_gtid=slave_pos ;
Query OK, 0 rows affected (0.00 sec)
 
root@datamod03 13:33:16 [test]> start all slaves;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

On all server :

show create table start_dump\G
       Table: start_dump
Create Table: CREATE TABLE `start_dump` (
  `ID` tinyint(3) unsigned NOT NULL DEFAULT '1',
  `DUMP_OK` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`ID`)
) ENGINE=MEMORY DEFAULT CHARSET=latin1 MAX_ROWS=1

Before test:

Every 1.0s: echo hostname:datamod01; mysql -uroot -e 'show all slaves status\G' |grep -E '(Connection_name:|Master_Lo|Last_.*_Error|Running|Gtid)'                                                                                                             

Mon Jul 20 14:16:46 2015
 
hostname:datamod01
              Connection_name: slave-of-sqlmod02
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 310
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 310
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-399,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-399,2-102-41,3-103-72
              Connection_name: slave-of-sqlmod03
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 310
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 310
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-399,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-399,2-102-41,3-103-72

Every 1.0s: echo hostname:datamod02; mysql -uroot -e 'show all slaves status\G' |grep -E '(Connection_name:|Master_Lo|Last_.*_Error|Running|Gtid)'                                                                                                             

Mon Jul 20 14:17:00 2015
 
hostname:datamod02
              Connection_name: slave-of-sqlmod01
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 310
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 310
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-399,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-399,2-102-41,3-103-72
              Connection_name: slave-of-sqlmod03
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 310
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 310
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-399,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-399,2-102-41,3-103-72

Every 1.0s: echo hostname:datamod03; mysql -uroot -e 'show all slaves status\G' |grep -E '(Connection_name:|Master_Lo|Last_.*_Error|Running|Gtid)'                                                                                                             Mon Jul 20 14:17:08 2015
 
hostname:datamod03
              Connection_name: slave-of-sqlmod01
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 310
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 310
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-399,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-399,2-102-41,3-103-72
              Connection_name: slave-of-sqlmod02
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 310
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 310
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-399,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-399,2-102-41,3-103-72

test:

root@datamod01:/srv/mariadb# watch -n 1 "mysql -uroot -e 'truncate table test.start_dump ; insert into test.start_dump set DUMP_OK=1;'"

after many truncate, insert:

Every 1.0s: echo hostname:datamod01; mysql -uroot -e 'show all slaves status\G' |grep -E '(Connection_name:|Master_Lo|Last_.*_Error|Running|Gtid)'                                                                                                             Mon Jul 20 16:13:25 2015

hostname:datamod01
              Connection_name: slave-of-sqlmod02
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 4968
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 4968
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-433,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-433,2-102-41,3-103-72
              Connection_name: slave-of-sqlmod03
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 4968
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 4968
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-433,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-433,2-102-41,3-103-72

Every 1.0s: echo hostname:datamod02; mysql -uroot -e 'show all slaves status\G' |grep -E '(Connection_name:|Master_Lo|Last_.*_Error|Running|Gtid)'                                                                                                             

Mon Jul 20 16:13:52 2015
 
hostname:datamod02
              Connection_name: slave-of-sqlmod01
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 68494
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
          Exec_Master_Log_Pos: 63351
                Last_IO_Error:
               Last_SQL_Error: Could not execute Write_rows_v1 event on table test.start_dump; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log bin-log.000001, end_log_pos 63469
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-433,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-433,2-102-41,3-103-72
              Connection_name: slave-of-sqlmod03
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 4968
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 4968
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-433,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-433,2-102-41,3-103-72

Every 1.0s: echo hostname:datamod03; mysql -uroot -e 'show all slaves status\G' |grep -E '(Connection_name:|Master_Lo|Last_.*_Error|Running|Gtid)'                                                                                                             Mon Jul 20 16:14:05 2015

hostname:datamod03
              Connection_name: slave-of-sqlmod01
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 68494
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 68494
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-433,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-433,2-102-41,3-103-72
              Connection_name: slave-of-sqlmod02
              Master_Log_File: bin-log.000001
          Read_Master_Log_Pos: 4968
        Relay_Master_Log_File: bin-log.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 4968
                Last_IO_Error:
               Last_SQL_Error:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-101-433,2-102-41,3-103-72
               Gtid_Slave_Pos: 1-101-433,2-102-41,3-103-72

show master status:

root@datamod01 16:25:22 [(none)]> show master status \G
            File: bin-log.000001
        Position: 68494
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)
 
root@datamod02 16:26:13 [(none)]> show master status \G
            File: bin-log.000001
        Position: 4968
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
 
root@datamod03 16:26:32 [(none)]> show master status \G
            File: bin-log.000001
        Position: 4968
    Binlog_Do_DB: 
Binlog_Ignore_DB: 

root@datamod02:/srv/mariadb# tail -4 ../data/sqlmod/log/mysqld.err 
2015-07-20 13:33:42 140478747219712 [Note] Master 'slave-of-sqlmod03': Slave I/O thread: connected to master 'rsqlmod@datamod03:3306',replication starts at GTID position '1-101-399,2-102-41,3-103-72'
2015-07-20 16:09:31 140478747518720 [ERROR] Master 'slave-of-sqlmod01': Slave SQL: Could not execute Write_rows_v1 event on table test.start_dump; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log bin-log.000001, end_log_pos 63469, Gtid 1-101-401, Internal MariaDB error code: 1062
2015-07-20 16:09:31 140478747518720 [Warning] Master 'slave-of-sqlmod01': Slave: Duplicate entry '1' for key 'PRIMARY' Error_code: 1062
2015-07-20 16:09:31 140478747518720 [ERROR] Master 'slave-of-sqlmod01': Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'bin-log.000001' position 63351; GTID position '1-101-400,2-102-41,3-103-72'

on datanode01:

mysqlbinlog bin-log.000001 --base64-output=DECODE-ROWS --verbose
 
# at 68311
#150720 16:09:48 server id 101  end_log_pos 68349 	GTID 1-101-433
/*!100001 SET @@session.gtid_seq_no=433*//*!*/;
BEGIN
/*!*/;
# at 68349
# at 68398
#150720 16:09:48 server id 101  end_log_pos 68398 	Table_map: `test`.`start_dump` mapped to number 439 (has triggers)
#150720 16:09:48 server id 101  end_log_pos 68429 	Write_rows: table id 439 flags: STMT_END_F
### INSERT INTO `test`.`start_dump`
### SET
###   @1=1
# at 68429
#150720 16:09:48 server id 101  end_log_pos 68494 	Query	thread_id=9361	exec_time=0	error_code=0
SET TIMESTAMP=1437401388/*!*/;
COMMIT
/*!*/;

on datanode02:

# at 4821
#150720 16:09:48 server id 101  end_log_pos 4859 	GTID 1-101-433 trans
/*!100001 SET @@session.gtid_seq_no=433*//*!*/;
BEGIN
/*!*/;
# at 4859
# at 4909
#150720 16:09:48 server id 101  end_log_pos 4909 	Table_map: `test`.`start_dump` mapped to number 44 (has triggers)
#150720 16:09:48 server id 101  end_log_pos 4941 	Write_rows: table id 44 flags: STMT_END_F
### INSERT INTO `test`.`start_dump`
### SET
###   @1=1
###   @2=0
# at 4941
#150720 16:09:48 server id 101  end_log_pos 4968 	Xid = 27517
COMMIT/*!*/;

on datanode03:

# at 4821
#150720 16:09:48 server id 101  end_log_pos 4859 	GTID 1-101-433 trans
/*!100001 SET @@session.gtid_seq_no=433*//*!*/;
BEGIN
/*!*/;
# at 4859
# at 4909
#150720 16:09:48 server id 101  end_log_pos 4909 	Table_map: `test`.`start_dump` mapped to number 48 (has triggers)
#150720 16:09:48 server id 101  end_log_pos 4941 	Write_rows: table id 48 flags: STMT_END_F
### INSERT INTO `test`.`start_dump`
### SET
###   @1=1
###   @2=0
# at 4941
#150720 16:09:48 server id 101  end_log_pos 4968 	Xid = 27120
COMMIT/*!*/;

on datanode02:

relay log for connection slave sqlnode01

 mysqlbinlog mysqld-relay-bin-slave@002dof@002dsqlmod01.000002 --base64-output=DECODE-ROWS --verbose
 
# at 68596
#150720 16:09:48 server id 101  end_log_pos 68349 	GTID 1-101-433
/*!100001 SET @@session.gtid_seq_no=433*//*!*/;
BEGIN
/*!*/;
# at 68634
# at 68683
#150720 16:09:48 server id 101  end_log_pos 68398 	Table_map: `test`.`start_dump` mapped to number 439 (has triggers)
#150720 16:09:48 server id 101  end_log_pos 68429 	Write_rows: table id 439 flags: STMT_END_F
### INSERT INTO `test`.`start_dump`
### SET
###   @1=1
# at 68714
#150720 16:09:48 server id 101  end_log_pos 68494 	Query	thread_id=9361	exec_time=0	error_code=0
SET TIMESTAMP=1437401388/*!*/;
COMMIT
/*!*/;

relay log for connection slave sqlnode03

mysqlbinlog mysqld-relay-bin-slave@002dof@002dsqlmod03.000002 --base64-output=DECODE-ROWS --verbose
 
# at 5106
#150720 16:09:48 server id 101  end_log_pos 4859 	GTID 1-101-433 trans
/*!100001 SET @@session.gtid_seq_no=433*//*!*/;
BEGIN
/*!*/;
# at 5144
# at 5194
#150720 16:09:48 server id 101  end_log_pos 4909 	Table_map: `test`.`start_dump` mapped to number 48 (has triggers)
#150720 16:09:48 server id 101  end_log_pos 4941 	Write_rows: table id 48 flags: STMT_END_F
### INSERT INTO `test`.`start_dump`
### SET
###   @1=1
###   @2=0
# at 5226
#150720 16:09:48 server id 101  end_log_pos 4968 	Xid = 27120
COMMIT/*!*/;

Comment by Elena Stepanova [ 2015-07-31 ]

Hi,

The event 1-101-433 is not interesting in itself. We need the set of binary/relay logs up to the moment when the problem happened, which, according to the error message, is 1-101-401. But better still, just attach the complete logs, don't extract anything from them (binary logs, relay logs, error logs). They should be tiny since the problem happens soon after flush.
Could you please provide those?

Also, not to spoil the experiment, it's better to be more careful about flush/start slaves. Since you have the all-directions replication, when you reset and start slaves on the first server, it might start picking up something from other slaves which are not reset yet, etc.

Comment by Elena Stepanova [ 2015-08-01 ]

GeoffMontee, thanks.
It turns out I could not reproduce it because I was trying to use the current tree, while it has already been fixed there in the scope of MDEV-8354. At least I can reproduce it easily enough before the fix – not in 100% cases, but the probability seems to be high, – and cannot reproduce afterwards. 10.0.21 has not been released yet, but 10.1.6 has, please try it (the version mix is not required to reproduce the problem).

Meanwhile, I'm closing it as fixed along with MDEV-8354. Please comment to re-open if you keep experiencing it after the upgrade.

Generated at Thu Feb 08 07:27:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.