MDEV-33610: Circular replication breaks after upgrading

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.11.7, 10.11.8
    • Fix Version/s: 10.11
    • Component/s: Replication
    • Labels: None
    • Environment: RockyLinux 8.9

    Description

      We have circular replication between a 3-node Galera Cluster and a standalone MariaDB node. Replication in both directions was working well on 10.11.6.

      After upgrading to 10.11.7, the following error occurs and one of the cluster nodes is automatically removed from the cluster:

      2024-03-06 13:34:19 792 [Note] Slave I/O thread: Start asynchronous replication to master 'repl_user@sunny:3306' in log 'mysql-bin.000302' at position 752714051
      2024-03-06 13:34:19 793 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000302' at position 752714051, relay log './rezel-relay-bin.000001' position: 4
      2024-03-06 13:34:19 793 [Note] WSREP: ready state reached
      2024-03-06 13:34:19 792 [Note] Slave I/O thread: connected to master 'repl_user@sunny:3306',replication started in log 'mysql-bin.000302' at position 752714051
      2024-03-06 13:34:20 0 [Note] WSREP: Member 2(unicorn) initiates vote on e171100d-322d-11e8-9957-624639ca8561:397983341,cb28eb6c0c3860ee:  Duplicate entry '0-121295558' for key 'PRIMARY', Error_code: 1062;
      2024-03-06 13:34:20 0 [Note] WSREP: Votes over e171100d-322d-11e8-9957-624639ca8561:397983341:
         cb28eb6c0c3860ee:   1/3
      Waiting for more votes.
      2024-03-06 13:34:20 2 [Note] WSREP: Got vote request for seqno e171100d-322d-11e8-9957-624639ca8561:397983341
      2024-03-06 13:34:20 0 [Note] WSREP: Member 0(phenex) initiates vote on e171100d-322d-11e8-9957-624639ca8561:397983341,cb28eb6c0c3860ee:  Duplicate entry '0-121295558' for key 'PRIMARY', Error_code: 1062;
      2024-03-06 13:34:20 0 [Note] WSREP: Votes over e171100d-322d-11e8-9957-624639ca8561:397983341:
         cb28eb6c0c3860ee:   2/3
      Winner: cb28eb6c0c3860ee
      2024-03-06 13:34:20 0 [Note] WSREP: Recovering vote result from history: e171100d-322d-11e8-9957-624639ca8561:397983341,cb28eb6c0c3860ee
      2024-03-06 13:34:20 2 [ERROR] WSREP: Vote 0 (success) on e171100d-322d-11e8-9957-624639ca8561:397983341 is inconsistent with group. Leaving cluster.
      

      Rolling back to 10.11.6 resolves the issue.
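      The vote message above does not name the failing table, but the '0-121295558' format matches the (domain_id, sub_id) primary key of mysql.gtid_slave_pos, which the later comments also inspect. As a sketch, each node could be checked for an already-existing conflicting row (the literal values are taken from the log above; the table interpretation is an assumption):

```sql
-- Assumption: the duplicate PRIMARY key '0-121295558' is a row in
-- mysql.gtid_slave_pos, whose primary key is (domain_id, sub_id).
-- Run on each cluster node to see whether the row already exists:
SELECT domain_id, sub_id, server_id, seq_no
FROM mysql.gtid_slave_pos
WHERE domain_id = 0
  AND sub_id = 121295558;
```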

      Activity

            janlindstrom Jan Lindström added a comment -

            eric@geniqtech.com It seems this happens because wsrep_gtid_mode=OFF, as with this setting GTIDs are not unique across the cluster.
            eric@geniqtech.com Eric Ang added a comment -

            I ran the following statements on all cluster nodes, and all report the same values.
            According to the documentation at https://mariadb.com/kb/en/using-mariadb-replication-with-mariadb-galera-cluster-configuring-mariadb-r/, all cluster nodes need to use the same "server-id" and "wsrep_gtid_domain_id"; only "gtid_domain_id" is different for each node.

            SHOW GLOBAL VARIABLES LIKE 'wsrep_gtid_mode';
            +-----------------+-------+
            | Variable_name   | Value |
            +-----------------+-------+
            | wsrep_gtid_mode | ON    |
            +-----------------+-------+
            1 row in set (0.001 sec)
             
            MariaDB [mysql]> SHOW GLOBAL VARIABLES LIKE 'wsrep_gtid_domain_id';
            +----------------------+-------+
            | Variable_name        | Value |
            +----------------------+-------+
            | wsrep_gtid_domain_id | 999   |
            +----------------------+-------+
            1 row in set (0.001 sec)
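            A configuration matching the rules quoted above might look like the following my.cnf fragment. This is only a sketch: the values 999 and 8 come from this report, while the per-node gtid_domain_id values (11, 12, 13) and the logging options are assumptions.

```ini
# Galera node config sketch (identical on all three nodes
# except gtid_domain_id, which must be unique per node).
[mariadb]
server_id            = 999   # same on every cluster node
wsrep_gtid_mode      = ON
wsrep_gtid_domain_id = 999   # same on every cluster node
gtid_domain_id       = 11    # unique per node, e.g. 11, 12, 13 (assumed values)
log_slave_updates    = ON
log_bin              = mysql-bin
```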
            

            eric@geniqtech.com Eric Ang added a comment -

            One odd thing I noticed when checking the entries in the gtid_slave_pos table: why do some entries have the value 0 for server_id?
            In my server configuration there are only two server_ids:
            999 => Cluster
            8 => Standalone

            SELECT * FROM mysql.gtid_slave_pos;
            +-----------+-----------+-----------+--------+
            | domain_id | sub_id    | server_id | seq_no |
            +-----------+-----------+-----------+--------+
            |         8 | 160091987 |         8 | 298010 |
            |         8 | 160091988 |         8 | 298011 |
            |         8 | 160091989 |         8 | 298012 |
            |         8 | 160091990 |         8 | 298013 |
            |         8 | 160091991 |         8 | 298014 |
            |         8 | 160092621 |         8 | 298015 |
            |         8 | 160092666 |         8 | 298016 |
            |         8 | 160092692 |         8 | 298017 |
            |         8 | 160092800 |         8 | 298018 |
            |         8 | 160092954 |         8 | 298019 |
            |         8 | 160093111 |         8 | 298020 |
            |         8 | 160093113 |         8 | 298021 |
            |         8 | 160093194 |         8 | 298022 |
            |       999 | 160091992 |       999 | 603666 |
            |       999 | 160091993 |       999 | 603667 |
            |       999 | 160091994 |         0 | 603674 |
            |       999 | 160091995 |       999 | 603677 |
            |       999 | 160091996 |         0 | 603674 |
            |       999 | 160091997 |         0 | 603674 |
            |       999 | 160091998 |         0 | 603674 |
            |       999 | 160091999 |         0 | 603674 |
            |       999 | 160092000 |         0 | 603674 |
            |       999 | 160092001 |         0 | 603674 |
            ...
            |       999 | 160093362 |         0 | 603674 |
            +-----------+-----------+-----------+--------+
            1375 rows in set (0.001 sec)
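            To quantify those server_id = 0 rows, and to see whether they all share the single seq_no 603674 as the excerpt suggests, an aggregate like the following could be run on each node (a sketch; it only assumes read access to the mysql schema):

```sql
-- Summarize mysql.gtid_slave_pos per (domain_id, server_id):
-- row counts plus the seq_no range, to spot the anomalous
-- server_id = 0 entries in domain 999.
SELECT domain_id, server_id,
       COUNT(*)    AS row_count,
       MIN(seq_no) AS min_seq_no,
       MAX(seq_no) AS max_seq_no
FROM mysql.gtid_slave_pos
GROUP BY domain_id, server_id
ORDER BY domain_id, server_id;
```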
            


            janlindstrom Jan Lindström added a comment -

            eric@geniqtech.com Just to confirm: do you have a different wsrep-gtid-domain-id value on the second cluster? The value needs to be the same within a cluster, but different clusters should have different values. In circular replication you have at least two clusters, right?
            eric@geniqtech.com Eric Ang added a comment -

            My setup is circular replication between a Galera cluster and a standalone MariaDB server.
            The cluster uses the same value for "wsrep_gtid_domain_id" on all nodes.
            The standalone MariaDB server does not set "wsrep_gtid_domain_id" as it is not a cluster.
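            On the standalone side of such a circular setup, MariaDB replication is normally pointed at one cluster node using a GTID position. A minimal sketch of that side (the host name and password are placeholders, not taken from this report; 'repl_user' appears in the log above):

```sql
-- Standalone node (server_id = 8, gtid_domain_id = 8 per this report).
-- Replicate from one Galera node using the GTID slave position:
CHANGE MASTER TO
  MASTER_HOST = 'cluster-node1',   -- placeholder host name
  MASTER_USER = 'repl_user',
  MASTER_PASSWORD = '...',         -- placeholder
  MASTER_USE_GTID = slave_pos;
START SLAVE;
```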


            People

              janlindstrom Jan Lindström
              eric@geniqtech.com Eric Ang