[MDEV-7984] Galera doesn't ignore duplicates properly with GTID replication - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Won't Fix
Affects Version/s: 10.0.17-galera
Fix Version/s: 10.0.34-galera
Component/s: wsrep
Labels:
- galera
- replication

Description

When replicating between two Galera clusters using GTID replication, duplicate DDL does not get ignored, even with gtid_ignore_duplicates turned on, if each node has its own server_id.

Let's say we have 2 clusters, each with 3 nodes.

Cluster 1, node1:

MariaDB [(none)]> SHOW STATUS WHERE Variable_name IN('wsrep_connected', 'wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_ready');

+----------------------+---------+

| Variable_name        | Value   |

+----------------------+---------+

| wsrep_cluster_size   | 3       |

| wsrep_cluster_status | Primary |

| wsrep_connected      | ON      |

| wsrep_ready          | ON      |

+----------------------+---------+

4 rows in set (0.01 sec)

Cluster 2, node1:

MariaDB [(none)]> SHOW STATUS WHERE Variable_name IN('wsrep_connected', 'wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_ready');

+----------------------+---------+

| Variable_name        | Value   |

+----------------------+---------+

| wsrep_cluster_size   | 3       |

| wsrep_cluster_status | Primary |

| wsrep_connected      | ON      |

| wsrep_ready          | ON      |

+----------------------+---------+

4 rows in set (0.00 sec)

Lets view the current GTID positions of each.

Cluster 1, node 1:

MariaDB [(none)]> SHOW MASTER STATUS\G

*************************** 1. row ***************************

            File: mariadb-bin.000002

        Position: 787

    Binlog_Do_DB:

Binlog_Ignore_DB:

1 row in set (0.00 sec)

MariaDB [(none)]> SELECT BINLOG_GTID_POS("mariadb-bin.000002", 787);

+--------------------------------------------+

| BINLOG_GTID_POS("mariadb-bin.000002", 787) |

+--------------------------------------------+

| 21-1-3,22-5-1                              |

+--------------------------------------------+

1 row in set (0.00 sec)

Cluster 2, node 1:

MariaDB [(none)]> SHOW MASTER STATUS\G

*************************** 1. row ***************************

            File: mariadb-bin.000002

        Position: 551

    Binlog_Do_DB:

Binlog_Ignore_DB:

1 row in set (0.00 sec)

MariaDB [(none)]> SELECT BINLOG_GTID_POS("mariadb-bin.000002", 551);

+--------------------------------------------+

| BINLOG_GTID_POS("mariadb-bin.000002", 551) |

+--------------------------------------------+

| 21-1-3,22-5-1                              |

+--------------------------------------------+

1 row in set (0.00 sec)

Now let's set up replication between each cluster:

Cluster 1, node 1:

MariaDB [(none)]> STOP SLAVE;

Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> RESET SLAVE;

Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> SET GLOBAL GTID_SLAVE_POS='21-1-3,22-5-1';

Query OK, 0 rows affected (0.11 sec)

MariaDB [(none)]> CHANGE MASTER TO master_host="192.168.1.55", master_use_gtid=current_pos, master_user='sst', master_password='sst';

Query OK, 0 rows affected (0.02 sec)

MariaDB [(none)]> START SLAVE;

Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.1.55

                  Master_User: sst

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mariadb-bin.000002

          Read_Master_Log_Pos: 551

               Relay_Log_File: localhost-relay-bin.000002

                Relay_Log_Pos: 697

        Relay_Master_Log_File: mariadb-bin.000002

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 551

              Relay_Log_Space: 998

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 4

               Master_SSL_Crl:

           Master_SSL_Crlpath:

                   Using_Gtid: Current_Pos

                  Gtid_IO_Pos: 21-1-3,22-5-1

1 row in set (0.00 sec)

Cluster 2, node 1:

MariaDB [(none)]> STOP SLAVE;

Query OK, 0 rows affected (0.03 sec)

MariaDB [(none)]> RESET SLAVE;

Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> SET GLOBAL GTID_SLAVE_POS='21-1-3,22-5-1';

Query OK, 0 rows affected (0.03 sec)

MariaDB [(none)]> CHANGE MASTER TO master_host="192.168.1.52", master_use_gtid=current_pos, master_user='sst', master_password='sst';

Query OK, 0 rows affected (0.02 sec)

MariaDB [(none)]> START SLAVE;

Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.1.52

                  Master_User: sst

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mariadb-bin.000002

          Read_Master_Log_Pos: 787

               Relay_Log_File: localhost-relay-bin.000002

                Relay_Log_Pos: 697

        Relay_Master_Log_File: mariadb-bin.000002

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 787

              Relay_Log_Space: 998

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1

               Master_SSL_Crl:

           Master_SSL_Crlpath:

                   Using_Gtid: Current_Pos

                  Gtid_IO_Pos: 21-1-3,22-5-1

1 row in set (0.00 sec)

Now let's create a database on cluster 1.

Cluster 1, node 1:

MariaDB [(none)]> CREATE DATABASE db1;

Query OK, 1 row affected (0.02 sec)

MariaDB [(none)]> SHOW DATABASES;

+--------------------+

| Database           |

+--------------------+

| db1                |

| information_schema |

| mysql              |

| performance_schema |

| test               |

+--------------------+

5 rows in set (0.00 sec)

So far, everything looks good on cluster 2.

Cluster 2, node 1:

MariaDB [(none)]> SHOW DATABASES;

+--------------------+

| Database           |

+--------------------+

| db1                |

| information_schema |

| mysql              |

| performance_schema |

| test               |

+--------------------+

5 rows in set (0.00 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.1.52

                  Master_User: sst

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mariadb-bin.000002

          Read_Master_Log_Pos: 906

               Relay_Log_File: localhost-relay-bin.000002

                Relay_Log_Pos: 816

        Relay_Master_Log_File: mariadb-bin.000002

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 906

              Relay_Log_Space: 1117

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1

               Master_SSL_Crl:

           Master_SSL_Crlpath:

                   Using_Gtid: Current_Pos

                  Gtid_IO_Pos: 21-1-4,22-5-1

1 row in set (0.00 sec)

However, now let's try dropping the database on one of the nodes in cluster 2:

Cluster 2, node 2:

MariaDB [(none)]> DROP DATABASE db1;

Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SHOW DATABASES;

+--------------------+

| Database           |

+--------------------+

| information_schema |

| mysql              |

| performance_schema |

| test               |

+--------------------+

4 rows in set (0.00 sec)

Everything looks good on cluster 1.

Cluster 1, node 1:

MariaDB [(none)]> SHOW DATABASES;

+--------------------+

| Database           |

+--------------------+

| information_schema |

| mysql              |

| performance_schema |

| test               |

+--------------------+

4 rows in set (0.00 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.1.55

                  Master_User: sst

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mariadb-bin.000002

          Read_Master_Log_Pos: 787

               Relay_Log_File: localhost-relay-bin.000002

                Relay_Log_Pos: 814

        Relay_Master_Log_File: mariadb-bin.000002

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 787

              Relay_Log_Space: 1115

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 4

               Master_SSL_Crl:

           Master_SSL_Crlpath:

                   Using_Gtid: Current_Pos

                  Gtid_IO_Pos: 21-1-4,22-5-2

1 row in set (0.00 sec)

However, it seems to have broken the replication back to cluster 2:

Cluster 2, node 1:

MariaDB [(none)]> SHOW DATABASES;

+--------------------+

| Database           |

+--------------------+

| information_schema |

| mysql              |

| performance_schema |

| test               |

+--------------------+

4 rows in set (0.00 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.1.52

                  Master_User: sst

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mariadb-bin.000002

          Read_Master_Log_Pos: 1023

               Relay_Log_File: localhost-relay-bin.000002

                Relay_Log_Pos: 816

        Relay_Master_Log_File: mariadb-bin.000002

             Slave_IO_Running: Yes

            Slave_SQL_Running: No

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 1008

                   Last_Error: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1'

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 906

              Relay_Log_Space: 1234

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 1008

               Last_SQL_Error: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1'

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1

               Master_SSL_Crl:

           Master_SSL_Crlpath:

                   Using_Gtid: Current_Pos

                  Gtid_IO_Pos: 21-1-4,22-5-2

1 row in set (0.00 sec)

The "drop database" command appears to have been executed twice: once via Galera, and once via GTID replication.

The error log for Cluster 2, node 1 shows:

150413 17:18:32 [ERROR] Slave SQL: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1', Gtid 22-5-2, Internal MariaDB error code: 1008

150413 17:18:32 [Warning] Slave: Can't drop database 'db1'; database doesn't exist Error_code: 1008

150413 17:18:32 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000002' position 906; GTID position '21-1-4,22-5-1'

The error logs for other nodes in Cluster 2 show:

150413 17:18:32 [ERROR] Slave SQL: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1', Internal MariaDB error code: 1008

150413 17:18:32 [Warning] WSREP: RBR event 1 Query apply warning: 1, 6

150413 17:18:32 [Warning] WSREP: Ignoring error for TO isolated action: source: f371bbe5-e214-11e4-acfd-7a10f557098b version: 3 local: 0 state: APPLYING flags: 65 conn_id: 13 trx_id: -1 seqnos (l: 8, g: 6, s: 5, d: 5, ts: 9034574114945)

The statement only appears in the binary log of Cluster 1, node 1 once:

# at 551

#150413 17:15:29 server id 1  end_log_pos 589   GTID 21-1-4

/*!100001 SET @@session.gtid_domain_id=21*//*!*/;

/*!100001 SET @@session.server_id=1*//*!*/;

/*!100001 SET @@session.gtid_seq_no=4*//*!*/;

# at 589

#150413 17:15:29 server id 1  end_log_pos 670   Query   thread_id=9     exec_time=0     error_code=0

SET TIMESTAMP=1428959729/*!*/;

CREATE DATABASE db1

/*!*/;

# at 670

#150413 17:18:32 server id 5  end_log_pos 708   GTID 22-5-2

/*!100001 SET @@session.gtid_domain_id=22*//*!*/;

/*!100001 SET @@session.server_id=5*//*!*/;

/*!100001 SET @@session.gtid_seq_no=2*//*!*/;

# at 708

#150413 17:18:32 server id 5  end_log_pos 787   Query   thread_id=8     exec_time=0     error_code=0

SET TIMESTAMP=1428959912/*!*/;

DROP DATABASE db1

/*!*/;

DELIMITER ;

# End of log file

ROLLBACK /* added by mysqlbinlog */;

/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Attachments

Issue Links

relates to

MDEV-20720 Galera: Replicate MariaDB GTID to other nodes in the cluster

Closed

Activity

People

Assignee:: Sachin Setiya (Inactive)

Reporter:: Geoff Montee (Inactive)

Votes:: 2 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2015-04-14 00:28

Updated:: 2017-12-25 15:50

Resolved:: 2017-12-25 15:50

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.