[MDEV-7984] Galera doesn't ignore duplicates properly with GTID replication Created: 2015-04-14  Updated: 2017-12-25  Resolved: 2017-12-25

Status: Closed
Project: MariaDB Server
Component/s: wsrep
Affects Version/s: 10.0.17-galera
Fix Version/s: 10.0.34-galera

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Sachin Setiya (Inactive)
Resolution: Won't Fix Votes: 2
Labels: galera, replication

Issue Links:
Relates
relates to MDEV-20720 Galera: Replicate MariaDB GTID to oth... Closed

 Description   

When replicating between two Galera clusters using GTID replication, duplicate DDL does not get ignored, even with gtid_ignore_duplicates turned on, if each node has its own server_id.

Let's say we have 2 clusters, each with 3 nodes.

Cluster 1, node1:

MariaDB [(none)]> SHOW STATUS WHERE Variable_name IN('wsrep_connected', 'wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_ready');
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_size   | 3       |
| wsrep_cluster_status | Primary |
| wsrep_connected      | ON      |
| wsrep_ready          | ON      |
+----------------------+---------+
4 rows in set (0.01 sec)

Cluster 2, node1:

MariaDB [(none)]> SHOW STATUS WHERE Variable_name IN('wsrep_connected', 'wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_ready');
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_size   | 3       |
| wsrep_cluster_status | Primary |
| wsrep_connected      | ON      |
| wsrep_ready          | ON      |
+----------------------+---------+
4 rows in set (0.00 sec)

Lets view the current GTID positions of each.

Cluster 1, node 1:

MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: mariadb-bin.000002
        Position: 787
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)
 
MariaDB [(none)]> SELECT BINLOG_GTID_POS("mariadb-bin.000002", 787);
+--------------------------------------------+
| BINLOG_GTID_POS("mariadb-bin.000002", 787) |
+--------------------------------------------+
| 21-1-3,22-5-1                              |
+--------------------------------------------+
1 row in set (0.00 sec)

Cluster 2, node 1:

MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: mariadb-bin.000002
        Position: 551
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)
 
MariaDB [(none)]> SELECT BINLOG_GTID_POS("mariadb-bin.000002", 551);
+--------------------------------------------+
| BINLOG_GTID_POS("mariadb-bin.000002", 551) |
+--------------------------------------------+
| 21-1-3,22-5-1                              |
+--------------------------------------------+
1 row in set (0.00 sec)

Now let's set up replication between each cluster:

Cluster 1, node 1:

MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [(none)]> RESET SLAVE;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [(none)]> SET GLOBAL GTID_SLAVE_POS='21-1-3,22-5-1';
Query OK, 0 rows affected (0.11 sec)
 
MariaDB [(none)]> CHANGE MASTER TO master_host="192.168.1.55", master_use_gtid=current_pos, master_user='sst', master_password='sst';
Query OK, 0 rows affected (0.02 sec)
 
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.55
                  Master_User: sst
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000002
          Read_Master_Log_Pos: 551
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 697
        Relay_Master_Log_File: mariadb-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 551
              Relay_Log_Space: 998
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 4
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 21-1-3,22-5-1
1 row in set (0.00 sec)

Cluster 2, node 1:

MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.03 sec)
 
MariaDB [(none)]> RESET SLAVE;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [(none)]> SET GLOBAL GTID_SLAVE_POS='21-1-3,22-5-1';
Query OK, 0 rows affected (0.03 sec)
 
MariaDB [(none)]> CHANGE MASTER TO master_host="192.168.1.52", master_use_gtid=current_pos, master_user='sst', master_password='sst';
Query OK, 0 rows affected (0.02 sec)
 
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.52
                  Master_User: sst
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000002
          Read_Master_Log_Pos: 787
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 697
        Relay_Master_Log_File: mariadb-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 787
              Relay_Log_Space: 998
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 21-1-3,22-5-1
1 row in set (0.00 sec)

Now let's create a database on cluster 1.

Cluster 1, node 1:

MariaDB [(none)]> CREATE DATABASE db1;
Query OK, 1 row affected (0.02 sec)
 
MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| db1                |
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
5 rows in set (0.00 sec)

So far, everything looks good on cluster 2.

Cluster 2, node 1:

MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| db1                |
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
5 rows in set (0.00 sec)
 
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.52
                  Master_User: sst
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000002
          Read_Master_Log_Pos: 906
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 816
        Relay_Master_Log_File: mariadb-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 906
              Relay_Log_Space: 1117
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 21-1-4,22-5-1
1 row in set (0.00 sec)

However, now let's try dropping the database on one of the nodes in cluster 2:

Cluster 2, node 2:

MariaDB [(none)]> DROP DATABASE db1;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

Everything looks good on cluster 1.

Cluster 1, node 1:

MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)
 
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.55
                  Master_User: sst
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000002
          Read_Master_Log_Pos: 787
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 814
        Relay_Master_Log_File: mariadb-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 787
              Relay_Log_Space: 1115
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 4
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 21-1-4,22-5-2
1 row in set (0.00 sec)

However, it seems to have broken the replication back to cluster 2:

Cluster 2, node 1:

MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)
 
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.52
                  Master_User: sst
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000002
          Read_Master_Log_Pos: 1023
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 816
        Relay_Master_Log_File: mariadb-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1008
                   Last_Error: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 906
              Relay_Log_Space: 1234
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1008
               Last_SQL_Error: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1'
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 21-1-4,22-5-2
1 row in set (0.00 sec)

The "drop database" command appears to have been executed twice: once via Galera, and once via GTID replication.

The error log for Cluster 2, node 1 shows:

150413 17:18:32 [ERROR] Slave SQL: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1', Gtid 22-5-2, Internal MariaDB error code: 1008
150413 17:18:32 [Warning] Slave: Can't drop database 'db1'; database doesn't exist Error_code: 1008
150413 17:18:32 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000002' position 906; GTID position '21-1-4,22-5-1'

The error logs for other nodes in Cluster 2 show:

150413 17:18:32 [ERROR] Slave SQL: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1', Internal MariaDB error code: 1008
150413 17:18:32 [Warning] WSREP: RBR event 1 Query apply warning: 1, 6
150413 17:18:32 [Warning] WSREP: Ignoring error for TO isolated action: source: f371bbe5-e214-11e4-acfd-7a10f557098b version: 3 local: 0 state: APPLYING flags: 65 conn_id: 13 trx_id: -1 seqnos (l: 8, g: 6, s: 5, d: 5, ts: 9034574114945)

The statement only appears in the binary log of Cluster 1, node 1 once:

# at 551
#150413 17:15:29 server id 1  end_log_pos 589   GTID 21-1-4
/*!100001 SET @@session.gtid_domain_id=21*//*!*/;
/*!100001 SET @@session.server_id=1*//*!*/;
/*!100001 SET @@session.gtid_seq_no=4*//*!*/;
# at 589
#150413 17:15:29 server id 1  end_log_pos 670   Query   thread_id=9     exec_time=0     error_code=0
SET TIMESTAMP=1428959729/*!*/;
CREATE DATABASE db1
/*!*/;
# at 670
#150413 17:18:32 server id 5  end_log_pos 708   GTID 22-5-2
/*!100001 SET @@session.gtid_domain_id=22*//*!*/;
/*!100001 SET @@session.server_id=5*//*!*/;
/*!100001 SET @@session.gtid_seq_no=2*//*!*/;
# at 708
#150413 17:18:32 server id 5  end_log_pos 787   Query   thread_id=8     exec_time=0     error_code=0
SET TIMESTAMP=1428959912/*!*/;
DROP DATABASE db1
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;



 Comments   
Comment by Geoff Montee (Inactive) [ 2015-04-14 ]

The duplicate operation was properly ignored when server_id was set to the same value for all nodes in a particular cluster (i.e. in cluster 1, server_id is set to 1, in cluster 2, server_id is set to 2).

Is that the only supported method to use GTID replication with Galera?

Comment by Sachin Setiya (Inactive) [ 2017-12-25 ]

10.1.31 wont have this issue(MDEV-10715)

Generated at Thu Feb 08 07:23:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.