Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7984

Galera doesn't ignore duplicates properly with GTID replication

    XMLWordPrintable

Details

    Description

      When replicating between two Galera clusters using GTID replication, duplicate DDL does not get ignored, even with gtid_ignore_duplicates turned on, if each node has its own server_id.

      Let's say we have 2 clusters, each with 3 nodes.

      Cluster 1, node1:

      MariaDB [(none)]> SHOW STATUS WHERE Variable_name IN('wsrep_connected', 'wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_ready');
      +----------------------+---------+
      | Variable_name        | Value   |
      +----------------------+---------+
      | wsrep_cluster_size   | 3       |
      | wsrep_cluster_status | Primary |
      | wsrep_connected      | ON      |
      | wsrep_ready          | ON      |
      +----------------------+---------+
      4 rows in set (0.01 sec)

      Cluster 2, node1:

      MariaDB [(none)]> SHOW STATUS WHERE Variable_name IN('wsrep_connected', 'wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_ready');
      +----------------------+---------+
      | Variable_name        | Value   |
      +----------------------+---------+
      | wsrep_cluster_size   | 3       |
      | wsrep_cluster_status | Primary |
      | wsrep_connected      | ON      |
      | wsrep_ready          | ON      |
      +----------------------+---------+
      4 rows in set (0.00 sec)

      Lets view the current GTID positions of each.

      Cluster 1, node 1:

      MariaDB [(none)]> SHOW MASTER STATUS\G
      *************************** 1. row ***************************
                  File: mariadb-bin.000002
              Position: 787
          Binlog_Do_DB: 
      Binlog_Ignore_DB: 
      1 row in set (0.00 sec)
       
      MariaDB [(none)]> SELECT BINLOG_GTID_POS("mariadb-bin.000002", 787);
      +--------------------------------------------+
      | BINLOG_GTID_POS("mariadb-bin.000002", 787) |
      +--------------------------------------------+
      | 21-1-3,22-5-1                              |
      +--------------------------------------------+
      1 row in set (0.00 sec)

      Cluster 2, node 1:

      MariaDB [(none)]> SHOW MASTER STATUS\G
      *************************** 1. row ***************************
                  File: mariadb-bin.000002
              Position: 551
          Binlog_Do_DB: 
      Binlog_Ignore_DB: 
      1 row in set (0.00 sec)
       
      MariaDB [(none)]> SELECT BINLOG_GTID_POS("mariadb-bin.000002", 551);
      +--------------------------------------------+
      | BINLOG_GTID_POS("mariadb-bin.000002", 551) |
      +--------------------------------------------+
      | 21-1-3,22-5-1                              |
      +--------------------------------------------+
      1 row in set (0.00 sec)

      Now let's set up replication between each cluster:

      Cluster 1, node 1:

      MariaDB [(none)]> STOP SLAVE;
      Query OK, 0 rows affected (0.01 sec)
       
      MariaDB [(none)]> RESET SLAVE;
      Query OK, 0 rows affected (0.00 sec)
       
      MariaDB [(none)]> SET GLOBAL GTID_SLAVE_POS='21-1-3,22-5-1';
      Query OK, 0 rows affected (0.11 sec)
       
      MariaDB [(none)]> CHANGE MASTER TO master_host="192.168.1.55", master_use_gtid=current_pos, master_user='sst', master_password='sst';
      Query OK, 0 rows affected (0.02 sec)
       
      MariaDB [(none)]> START SLAVE;
      Query OK, 0 rows affected (0.01 sec)
       
      MariaDB [(none)]> SHOW SLAVE STATUS\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 192.168.1.55
                        Master_User: sst
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000002
                Read_Master_Log_Pos: 551
                     Relay_Log_File: localhost-relay-bin.000002
                      Relay_Log_Pos: 697
              Relay_Master_Log_File: mariadb-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 551
                    Relay_Log_Space: 998
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 4
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 21-1-3,22-5-1
      1 row in set (0.00 sec)

      Cluster 2, node 1:

      MariaDB [(none)]> STOP SLAVE;
      Query OK, 0 rows affected (0.03 sec)
       
      MariaDB [(none)]> RESET SLAVE;
      Query OK, 0 rows affected (0.00 sec)
       
      MariaDB [(none)]> SET GLOBAL GTID_SLAVE_POS='21-1-3,22-5-1';
      Query OK, 0 rows affected (0.03 sec)
       
      MariaDB [(none)]> CHANGE MASTER TO master_host="192.168.1.52", master_use_gtid=current_pos, master_user='sst', master_password='sst';
      Query OK, 0 rows affected (0.02 sec)
       
      MariaDB [(none)]> START SLAVE;
      Query OK, 0 rows affected (0.01 sec)
       
      MariaDB [(none)]> SHOW SLAVE STATUS\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 192.168.1.52
                        Master_User: sst
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000002
                Read_Master_Log_Pos: 787
                     Relay_Log_File: localhost-relay-bin.000002
                      Relay_Log_Pos: 697
              Relay_Master_Log_File: mariadb-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 787
                    Relay_Log_Space: 998
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 21-1-3,22-5-1
      1 row in set (0.00 sec)

      Now let's create a database on cluster 1.

      Cluster 1, node 1:

      MariaDB [(none)]> CREATE DATABASE db1;
      Query OK, 1 row affected (0.02 sec)
       
      MariaDB [(none)]> SHOW DATABASES;
      +--------------------+
      | Database           |
      +--------------------+
      | db1                |
      | information_schema |
      | mysql              |
      | performance_schema |
      | test               |
      +--------------------+
      5 rows in set (0.00 sec)

      So far, everything looks good on cluster 2.

      Cluster 2, node 1:

      MariaDB [(none)]> SHOW DATABASES;
      +--------------------+
      | Database           |
      +--------------------+
      | db1                |
      | information_schema |
      | mysql              |
      | performance_schema |
      | test               |
      +--------------------+
      5 rows in set (0.00 sec)
       
      MariaDB [(none)]> SHOW SLAVE STATUS\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 192.168.1.52
                        Master_User: sst
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000002
                Read_Master_Log_Pos: 906
                     Relay_Log_File: localhost-relay-bin.000002
                      Relay_Log_Pos: 816
              Relay_Master_Log_File: mariadb-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 906
                    Relay_Log_Space: 1117
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 21-1-4,22-5-1
      1 row in set (0.00 sec)

      However, now let's try dropping the database on one of the nodes in cluster 2:

      Cluster 2, node 2:

      MariaDB [(none)]> DROP DATABASE db1;
      Query OK, 0 rows affected (0.01 sec)
       
      MariaDB [(none)]> SHOW DATABASES;
      +--------------------+
      | Database           |
      +--------------------+
      | information_schema |
      | mysql              |
      | performance_schema |
      | test               |
      +--------------------+
      4 rows in set (0.00 sec)

      Everything looks good on cluster 1.

      Cluster 1, node 1:

      MariaDB [(none)]> SHOW DATABASES;
      +--------------------+
      | Database           |
      +--------------------+
      | information_schema |
      | mysql              |
      | performance_schema |
      | test               |
      +--------------------+
      4 rows in set (0.00 sec)
       
      MariaDB [(none)]> SHOW SLAVE STATUS\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 192.168.1.55
                        Master_User: sst
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000002
                Read_Master_Log_Pos: 787
                     Relay_Log_File: localhost-relay-bin.000002
                      Relay_Log_Pos: 814
              Relay_Master_Log_File: mariadb-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 787
                    Relay_Log_Space: 1115
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 4
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 21-1-4,22-5-2
      1 row in set (0.00 sec)

      However, it seems to have broken the replication back to cluster 2:

      Cluster 2, node 1:

      MariaDB [(none)]> SHOW DATABASES;
      +--------------------+
      | Database           |
      +--------------------+
      | information_schema |
      | mysql              |
      | performance_schema |
      | test               |
      +--------------------+
      4 rows in set (0.00 sec)
       
      MariaDB [(none)]> SHOW SLAVE STATUS\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 192.168.1.52
                        Master_User: sst
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000002
                Read_Master_Log_Pos: 1023
                     Relay_Log_File: localhost-relay-bin.000002
                      Relay_Log_Pos: 816
              Relay_Master_Log_File: mariadb-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 1008
                         Last_Error: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1'
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 906
                    Relay_Log_Space: 1234
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 1008
                     Last_SQL_Error: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1'
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 21-1-4,22-5-2
      1 row in set (0.00 sec)

      The "drop database" command appears to have been executed twice: once via Galera, and once via GTID replication.

      The error log for Cluster 2, node 1 shows:

      150413 17:18:32 [ERROR] Slave SQL: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1', Gtid 22-5-2, Internal MariaDB error code: 1008
      150413 17:18:32 [Warning] Slave: Can't drop database 'db1'; database doesn't exist Error_code: 1008
      150413 17:18:32 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000002' position 906; GTID position '21-1-4,22-5-1'

      The error logs for other nodes in Cluster 2 show:

      150413 17:18:32 [ERROR] Slave SQL: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1', Internal MariaDB error code: 1008
      150413 17:18:32 [Warning] WSREP: RBR event 1 Query apply warning: 1, 6
      150413 17:18:32 [Warning] WSREP: Ignoring error for TO isolated action: source: f371bbe5-e214-11e4-acfd-7a10f557098b version: 3 local: 0 state: APPLYING flags: 65 conn_id: 13 trx_id: -1 seqnos (l: 8, g: 6, s: 5, d: 5, ts: 9034574114945)

      The statement only appears in the binary log of Cluster 1, node 1 once:

      # at 551
      #150413 17:15:29 server id 1  end_log_pos 589   GTID 21-1-4
      /*!100001 SET @@session.gtid_domain_id=21*//*!*/;
      /*!100001 SET @@session.server_id=1*//*!*/;
      /*!100001 SET @@session.gtid_seq_no=4*//*!*/;
      # at 589
      #150413 17:15:29 server id 1  end_log_pos 670   Query   thread_id=9     exec_time=0     error_code=0
      SET TIMESTAMP=1428959729/*!*/;
      CREATE DATABASE db1
      /*!*/;
      # at 670
      #150413 17:18:32 server id 5  end_log_pos 708   GTID 22-5-2
      /*!100001 SET @@session.gtid_domain_id=22*//*!*/;
      /*!100001 SET @@session.server_id=5*//*!*/;
      /*!100001 SET @@session.gtid_seq_no=2*//*!*/;
      # at 708
      #150413 17:18:32 server id 5  end_log_pos 787   Query   thread_id=8     exec_time=0     error_code=0
      SET TIMESTAMP=1428959912/*!*/;
      DROP DATABASE db1
      /*!*/;
      DELIMITER ;
      # End of log file
      ROLLBACK /* added by mysqlbinlog */;
      /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
      /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

      Attachments

        Issue Links

          Activity

            People

              sachin.setiya.007 Sachin Setiya (Inactive)
              GeoffMontee Geoff Montee (Inactive)
              Votes:
              2 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.