MariaDB Server / MDEV-13431

wsrep_gtid_mode uses wrong GTID for transaction committed by slave thread

Details

    Description

      This is similar to MDEV-8458, but this issue also affects MariaDB 10.1 with wsrep_gtid_mode=ON.

      When wsrep_gtid_mode is enabled, transactions that are replicated within a cluster by Galera receive a GTID where the domain_id is specified by wsrep_gtid_domain_id, the server_id is specified by server_id, and the seq_no is incremented for each transaction that is committed in the domain.
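
      A GTID in this scheme is simply the triple domain_id-server_id-seq_no described above. As a minimal illustration (the value 1-1-2 is taken from the binlog dumps further down), the components can be split apart in the shell:

      ```shell
      # Split a MariaDB GTID "domain-server-seqno" into its three components.
      gtid="1-1-2"
      IFS=- read -r domain_id server_id seq_no <<< "$gtid"
      echo "domain=$domain_id server=$server_id seqno=$seq_no"
      # prints: domain=1 server=1 seqno=2
      ```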

      It does not seem to work this way for transactions that are replicated by an asynchronous slave thread within a Galera cluster.

      For example, let's say that we have two clusters and one cluster replicates from the other using GTID replication.

      On cluster1, we see the following:

      MariaDB [(none)]> show global variables like '%gtid%';
      +------------------------+-------+
      | Variable_name          | Value |
      +------------------------+-------+
      | gtid_binlog_pos        | 1-1-1 |
      | gtid_binlog_state      | 1-1-1 |
      | gtid_current_pos       | 1-1-1 |
      | gtid_domain_id         | 1     |
      | gtid_ignore_duplicates | OFF   |
      | gtid_slave_pos         |       |
      | gtid_strict_mode       | OFF   |
      | wsrep_gtid_domain_id   | 1     |
      | wsrep_gtid_mode        | ON    |
      +------------------------+-------+
      9 rows in set (0.00 sec)
      

      On cluster2, we see the following:

      MariaDB [(none)]> show global variables like '%gtid%';
      +------------------------+-------------+
      | Variable_name          | Value       |
      +------------------------+-------------+
      | gtid_binlog_pos        | 2-2-2       |
      | gtid_binlog_state      | 2-2-2       |
      | gtid_current_pos       | 1-1-1,2-2-2 |
      | gtid_domain_id         | 2           |
      | gtid_ignore_duplicates | OFF         |
      | gtid_slave_pos         | 1-1-1       |
      | gtid_strict_mode       | OFF         |
      | wsrep_gtid_domain_id   | 2           |
      | wsrep_gtid_mode        | ON          |
      +------------------------+-------------+
      9 rows in set (0.00 sec)
      

      One node in cluster2 is a slave of one node in cluster1:

      MariaDB [(none)]> show slave status\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 172.30.0.32
                        Master_User: repl
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000002
                Read_Master_Log_Pos: 428
                     Relay_Log_File: ip-172-30-0-46-relay-bin.000002
                      Relay_Log_Pos: 644
              Relay_Master_Log_File: mariadb-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB:
                Replicate_Ignore_DB:
                 Replicate_Do_Table:
             Replicate_Ignore_Table:
            Replicate_Wild_Do_Table:
        Replicate_Wild_Ignore_Table:
                         Last_Errno: 0
                         Last_Error:
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 428
                    Relay_Log_Space: 951
                    Until_Condition: None
                     Until_Log_File:
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File:
                 Master_SSL_CA_Path:
                    Master_SSL_Cert:
                  Master_SSL_Cipher:
                     Master_SSL_Key:
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error:
                     Last_SQL_Errno: 0
                     Last_SQL_Error:
        Replicate_Ignore_Server_Ids:
                   Master_Server_Id: 1
                     Master_SSL_Crl:
                 Master_SSL_Crlpath:
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 1-1-1
            Replicate_Do_Domain_Ids:
        Replicate_Ignore_Domain_Ids:
                      Parallel_Mode: conservative
      1 row in set (0.00 sec)
      

      If we commit a transaction on cluster1, we would expect it to have the GTID 1-1-2 on cluster1, and either 2-1-3 or 2-2-3 on cluster2, depending on whether cluster2 keeps the server_id of the originating cluster or replaces it with its own. Does that actually happen?
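
      The expectation above is plain seq_no arithmetic within a domain. A small sketch (the <server_id> placeholder reflects the open question of which server_id is used):

      ```shell
      # Given cluster2's current gtid_binlog_pos, the next GTID in the domain
      # should keep domain_id 2 and use seq_no + 1; the server_id is the open
      # question (the originating cluster's 1, or the local 2).
      current="2-2-2"
      IFS=- read -r domain server seqno <<< "$current"
      echo "expected: ${domain}-<server_id>-$((seqno + 1))"
      # prints: expected: 2-<server_id>-3
      ```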

      Let's say that we execute the following on cluster1:

      MariaDB [(none)]> insert into db1.tab values (1, 'str1');
      Query OK, 1 row affected (0.00 sec)
      

      What GTID does this transaction have on each cluster?

      Here's the binlog event on the node in cluster1 where the transaction originated:

      # at 428
      #170802 16:51:02 server id 1  end_log_pos 466   GTID 1-1-2 trans
      /*!100001 SET @@session.gtid_seq_no=2*//*!*/;
      BEGIN
      /*!*/;
      # at 466
      # at 523
      #170802 16:51:02 server id 1  end_log_pos 523   Annotate_rows:
      #Q> insert into db1.tab values (1, 'str1')
      #170802 16:51:02 server id 1  end_log_pos 567   Table_map: `db1`.`tab` mapped to number 18
      # at 567
      #170802 16:51:02 server id 1  end_log_pos 606   Write_rows: table id 18 flags: STMT_END_F
       
      BINLOG '
      NjuCWRMBAAAALAAAADcCAAAAABIAAAAAAAEAA2RiMQADdGFiAAIDDwKWAAI=
      NjuCWRcBAAAAJwAAAF4CAAAAABIAAAAAAAEAAv/8AQAAAARzdHIx
      '/*!*/;
      ### INSERT INTO `db1`.`tab`
      ### SET
      ###   @1=1
      ###   @2='str1'
      # at 606
      #170802 16:51:02 server id 1  end_log_pos 633   Xid = 10
      COMMIT/*!*/;
      

      And here's the binlog event on the node in cluster2 that is acting as a slave to cluster1:

      # at 617
      #170802 16:51:02 server id 1  end_log_pos 655   GTID 2-1-2 trans
      /*!100001 SET @@session.server_id=1*//*!*/;
      /*!100001 SET @@session.gtid_seq_no=2*//*!*/;
      BEGIN
      /*!*/;
      # at 655
      #170802 16:51:02 server id 1  end_log_pos 699   Table_map: `db1`.`tab` mapped to number 21
      # at 699
      #170802 16:51:02 server id 1  end_log_pos 738   Write_rows: table id 21 flags: STMT_END_F
       
      BINLOG '
      NjuCWRMBAAAALAAAALsCAAAAABUAAAAAAAEAA2RiMQADdGFiAAIDDwKWAAI=
      NjuCWRcBAAAAJwAAAOICAAAAABUAAAAAAAEAAgP8AQAAAARzdHIx
      '/*!*/;
      ### INSERT INTO `db1`.`tab`
      ### SET
      ###   @1=1
      ###   @2='str1'
      # at 738
      #170802 16:51:02 server id 1  end_log_pos 765   Xid = 3
      COMMIT/*!*/;
      

      And here's the binlog event on another node in cluster2:

      # at 576
      #170802 16:51:02 server id 1  end_log_pos 614   GTID 2-1-3 trans
      /*!100001 SET @@session.server_id=1*//*!*/;
      /*!100001 SET @@session.gtid_seq_no=3*//*!*/;
      BEGIN
      /*!*/;
      # at 614
      #170802 16:51:02 server id 1  end_log_pos 658   Table_map: `db1`.`tab` mapped to number 20
      # at 658
      #170802 16:51:02 server id 1  end_log_pos 697   Write_rows: table id 20 flags: STMT_END_F
       
      BINLOG '
      NjuCWRMBAAAALAAAAJICAAAAABQAAAAAAAEAA2RiMQADdGFiAAIDDwKWAAI=
      NjuCWRcBAAAAJwAAALkCAAAAABQAAAAAAAEAAgP8AQAAAARzdHIx
      '/*!*/;
      ### INSERT INTO `db1`.`tab`
      ### SET
      ###   @1=1
      ###   @2='str1'
      # at 697
      #170802 16:51:02 server id 1  end_log_pos 724   Xid = 3
      COMMIT/*!*/;
      

      So the transaction has the expected GTID (1-1-2) in cluster1, and the expected GTID (2-1-3) on the non-slave nodes in cluster2, but an unexpected GTID (2-1-2) on the slave node in cluster2: the slave node reused seq_no 2 instead of incrementing to 3.

      Activity

            anikitin Andrii Nikitin (Inactive) added a comment - - edited

            I confirm the problem with the script below, which sets up two local clusters on ports (3307,3308) and (3310,3311); 3310 replicates with GTID from 3307.
            The node aliases are:
            m1 - 3307
            m2 - 3308
            m4 - 3310
            m5 - 3311

            So the topology looks like this:

            (m1,m2)  - cluster "c1"
             ^ gtid master
             |
             V gtid slave
            (m4,m5) - cluster "c2"
            

            The script will also download and unpack the 10.1.25 tarball into _depot/m-tar/10.1.25:

            set -e
            M7VER=10.1.25
             
            # just use current directory if called from framework
            if [ ! -f common.sh ] ; then
              [ -d mariadb-environs ] || git clone http://github.com/AndriiNikitin/mariadb-environs
              cd mariadb-environs
              ./get_plugin.sh galera
            fi
             
            echo CLEANING UP ...
            c1/cleanup.sh || :
            c2/cleanup.sh || :
            rm -rf c1/*
            rm -rf c2/*
             
            echo GENERATE TEMPLATES ...
            _template/plant_cluster.sh c1
            _template/plant_cluster.sh c2
            echo m1 > c1/nodes.lst
            echo m2 >> c1/nodes.lst
            echo m4 > c2/nodes.lst
            echo m5 >> c2/nodes.lst
            c1/replant.sh $M7VER
            c2/replant.sh $M7VER
             
            echo DOWNLOAD IF NEEDED ...
            ./build_or_download.sh m1
             
            echo INSTALL DATA DIRECTORY ...
            c1/gen_cnf.sh
            c2/gen_cnf.sh
            c1/install_db.sh
            c2/install_db.sh
             
            echo SET UP ACCOUNTS ...
            c1/galera_setup_acl.sh
            c2/galera_setup_acl.sh
             
            echo ENABLE GTID ...
            c1/gen_cnf.sh log-bin=qq log_slave_updates=1
            c2/gen_cnf.sh log-bin=qq log_slave_updates=1 wsrep_gtid_domain_id=5 gtid_domain_id=5 wsrep_gtid_mode=1
             
            echo START CLUSTERS ...
            c1/galera_start_new.sh
            c2/galera_start_new.sh
             
            echo SETUP REPLICATION
            m4*/replicate_gtid.sh m1
             
            echo PERFORM TRANSACTION
            m1*/sql.sh 'create table test.t1(i int)'
             
            m1*/sql.sh 'insert into test.t1 select 1'
            m2*/sql.sh 'insert into test.t1 select 2'
             
            sleep 1
             
            m5*/sql.sh 'create table test.t2(a int)'
            m1*/sql.sh 'insert into test.t1 select 3'
             
            sleep 1
             
            echo CHECK DATA
            c1/sql.sh 'select * from t1'
            c2/sql.sh 'select * from t1'
             
            set +e
             
            echo GTID in m1:
            _depot/m-tar/$M7VER/bin/mysqlbinlog -vvv m1*/dt/qq*.00* | grep -E 'GTID|INSERT|create'
            echo GTID in m2:
            _depot/m-tar/$M7VER/bin/mysqlbinlog -vvv m2*/dt/qq*.00* | grep -E 'GTID|INSERT|create'
            echo GTID in m4:
            _depot/m-tar/$M7VER/bin/mysqlbinlog -vvv m4*/dt/qq*.00* | grep -E 'GTID|INSERT|create'
            echo GTID in m5:
            _depot/m-tar/$M7VER/bin/mysqlbinlog -vvv m5*/dt/qq*.00* | grep -E 'GTID|INSERT|create'
             
            c1/sql.sh 'show variables like "gtid_%_pos"'
            c2/sql.sh 'show variables like "gtid_%_pos"'
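
            The mysqlbinlog | grep pipeline at the end of the script can be reduced to a self-contained sketch of how the GTID is pulled out of each matched event header (the sample line is copied from the dumps below):

            ```shell
            # Extract the GTID from a mysqlbinlog event header line, as the
            # grep pipeline in the script above does across whole binlogs.
            line='#170808 22:06:31 server id 1  end_log_pos 1186  GTID 5-1-6 trans'
            gtid=$(echo "$line" | grep -oE 'GTID [0-9]+-[0-9]+-[0-9]+' | awk '{print $2}')
            echo "$gtid"
            # prints: 5-1-6
            ```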
            

            Please note that the GTID is initially correct in cluster2 for the first two inserts, executed on the two nodes of cluster1:

            GTID in m1:
            #170808 21:43:45 server id 1  end_log_pos 650   GTID 0-1-3 ddl
            create table test.t1(i int)
            #170808 21:43:45 server id 1  end_log_pos 778   GTID 0-1-4 trans
            ### INSERT INTO `test`.`t1`
            #170808 21:43:45 server id 2  end_log_pos 918   GTID 0-2-5 trans
            ### INSERT INTO `test`.`t1`
            

            Direct Slave:

            GTID in m4:
            #170808 22:06:30 server id 1  end_log_pos 650   GTID 5-1-3 ddl
            create table test.t1(i int)
            #170808 22:06:30 server id 1  end_log_pos 778   GTID 5-1-4 trans
            ### INSERT INTO `test`.`t1`
            #170808 22:06:30 server id 2  end_log_pos 918   GTID 5-2-5 trans
            ### INSERT INTO `test`.`t1`
            

            And the other node in cluster2:

            GTID in m5:
            #170808 22:06:30 server id 1  end_log_pos 634   GTID 5-1-3 ddl
            create table test.t1(i int)
            #170808 22:06:30 server id 1  end_log_pos 762   GTID 5-1-4 trans
            ### INSERT INTO `test`.`t1`
            #170808 22:06:30 server id 2  end_log_pos 902   GTID 5-2-5 trans
            

            So far so good: GTID works as expected (5-1-4 and 5-2-5 on both nodes in cluster2). But once we execute a transaction directly in cluster2, the other node in cluster2 shows a different GTID (5-1-6 vs. 5-1-7 below):

            Master from cluster 1 (m1):

            #170808 22:06:31 server id 1  end_log_pos 1058  GTID 0-1-6 trans
            ### INSERT INTO `test`.`t1`
            

            Direct Slave from cluster2 (m4):

            #170808 22:06:31 server id 5  end_log_pos 1058  GTID 5-5-6 ddl
            create table test.t2(a int)
            #170808 22:06:31 server id 1  end_log_pos 1186  GTID 5-1-6 trans
            ### INSERT INTO `test`.`t1`
            

            The other node from cluster2 (m5):

            #170808 22:06:31 server id 5  end_log_pos 1042  GTID 5-5-6 ddl
            create table test.t2(a int)
            #170808 22:06:31 server id 1  end_log_pos 1170  GTID 5-1-7 trans
            ### INSERT INTO `test`.`t1`
            

            And content of gtid pos variables at the end:

            m1: gtid_binlog_pos 0-1-6  gtid_current_pos 0-1-6  gtid_slave_pos
            m2: gtid_binlog_pos 0-1-6  gtid_current_pos        gtid_slave_pos
            m4: gtid_binlog_pos 5-1-6  gtid_current_pos 0-1-6  gtid_slave_pos 0-1-6
            m5: gtid_binlog_pos 5-1-7  gtid_current_pos        gtid_slave_pos
            

            przemek@mysqlmaniac.com Przemek added a comment -

            I can confirm this problem with MariaDB Server 10.1.25, also in a simpler case: one async master and a two-node Galera cluster, where one of the nodes is the async slave.
            I was using the same gtid_domain_id and wsrep_gtid_domain_id on all nodes, and tried both wsrep_gtid_mode=ON and OFF; the problem happens in both cases.

            Initial state: GTID-based async replication up and running from the MariaDB node to node0. node0 and node2 are part of the same Galera cluster:

            MariaDB [(none)]> show global variables like 'gtid_bi%';
            +-------------------+-------+
            | Variable_name     | Value |
            +-------------------+-------+
            | gtid_binlog_pos   |       |
            | gtid_binlog_state |       |
            +-------------------+-------+
            2 rows in set (0.00 sec)
             
            node0>  show global variables like 'gtid_b%';
            +-------------------+-------+
            | Variable_name     | Value |
            +-------------------+-------+
            | gtid_binlog_pos   |       |
            | gtid_binlog_state |       |
            +-------------------+-------+
            2 rows in set (0.00 sec)
             
            node2>  show global variables like '%gtid_b%';
            +-------------------+-------+
            | Variable_name     | Value |
            +-------------------+-------+
            | gtid_binlog_pos   |       |
            | gtid_binlog_state |       |
            +-------------------+-------+
            2 rows in set (0.01 sec)
            

            First insert on async master:

            MariaDB [(none)]> insert into test.t1 values (null);
            Query OK, 1 row affected (0.00 sec)
             
            MariaDB [(none)]> show global variables like 'gtid_bi%';
            +-------------------+---------+
            | Variable_name     | Value   |
            +-------------------+---------+
            | gtid_binlog_pos   | 0-100-1 |
            | gtid_binlog_state | 0-100-1 |
            +-------------------+---------+
            2 rows in set (0.00 sec)
             
            node0>  show global variables like 'gtid_b%';
            +-------------------+---------+
            | Variable_name     | Value   |
            +-------------------+---------+
            | gtid_binlog_pos   | 0-100-1 |
            | gtid_binlog_state | 0-100-1 |
            +-------------------+---------+
            2 rows in set (0.00 sec)
             
            node2> show variables like 'gtid_b%';
            +-------------------+---------+
            | Variable_name     | Value   |
            +-------------------+---------+
            | gtid_binlog_pos   | 0-100-1 |
            | gtid_binlog_state | 0-100-1 |
            +-------------------+---------+
            2 rows in set (0.00 sec)
            

            So far, so good - positions in sync.
            Then, insert on second cluster node:

            node2> insert into test.t1 values (null);
            Query OK, 1 row affected (0.01 sec)
             
            node2>  show global variables like '%gtid_b%';
            +-------------------+----------------+
            | Variable_name     | Value          |
            +-------------------+----------------+
            | gtid_binlog_pos   | 0-12-2         |
            | gtid_binlog_state | 0-100-1,0-12-2 |
            +-------------------+----------------+
            2 rows in set (0.00 sec)
             
            node0>  show global variables like 'gtid_b%';
            +-------------------+----------------+
            | Variable_name     | Value          |
            +-------------------+----------------+
            | gtid_binlog_pos   | 0-12-2         |
            | gtid_binlog_state | 0-100-1,0-12-2 |
            +-------------------+----------------+
            2 rows in set (0.00 sec)
            

            Again, so far it "looks" good; the GTID is consistent with the first node.
            But let's do another insert on the async master:

            MariaDB [(none)]> insert into test.t1 values (null);
            Query OK, 1 row affected (0.00 sec)
             
            MariaDB [(none)]> show global variables like 'gtid_bi%';
            +-------------------+---------+
            | Variable_name     | Value   |
            +-------------------+---------+
            | gtid_binlog_pos   | 0-100-2 |
            | gtid_binlog_state | 0-100-2 |
            +-------------------+---------+
            2 rows in set (0.00 sec)
             
            node0>  show global variables like 'gtid_b%';
            +-------------------+----------------+
            | Variable_name     | Value          |
            +-------------------+----------------+
            | gtid_binlog_pos   | 0-100-2        |
            | gtid_binlog_state | 0-12-2,0-100-2 |
            +-------------------+----------------+
            2 rows in set (0.00 sec)
             
            node2>  show global variables like '%gtid_b%';
            +-------------------+----------------+
            | Variable_name     | Value          |
            +-------------------+----------------+
            | gtid_binlog_pos   | 0-100-3        |
            | gtid_binlog_state | 0-12-2,0-100-3 |
            +-------------------+----------------+
            2 rows in set (0.00 sec)
            

            And now, for some reason, node2's seq_no for the async master's server_id jumped by two (0-100-1 to 0-100-3) after a single transaction, while the data is the same on both nodes:

            node0> select * from test.t1;
            +----+
            | id |
            +----+
            |  1 |
            |  2 |
            |  7 |
            +----+
            3 rows in set (0.00 sec)
             
            node2> select * from test.t1;
            +----+
            | id |
            +----+
            |  1 |
            |  2 |
            |  7 |
            +----+
            3 rows in set (0.00 sec)
            

            And the more updates we do inside the cluster, the more inconsistencies are introduced into the GTID sequences.
            For example, after a few more inserts:

            node0>  show global variables like 'gtid_b%';
            +-------------------+------------------------------+
            | Variable_name     | Value                        |
            +-------------------+------------------------------+
            | gtid_binlog_pos   | 0-10-9                       |
            | gtid_binlog_state | 0-100-2,0-12-2,0-11-7,0-10-9 |
            +-------------------+------------------------------+
            2 rows in set (0.00 sec)
             
            node2>  show global variables like '%gtid_b%';
            +-------------------+-------------------------------+
            | Variable_name     | Value                         |
            +-------------------+-------------------------------+
            | gtid_binlog_pos   | 0-10-10                       |
            | gtid_binlog_state | 0-100-3,0-12-2,0-11-8,0-10-10 |
            +-------------------+-------------------------------+
            2 rows in set (0.00 sec)
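
            The drift between the two nodes can be made explicit by diffing the two gtid_binlog_state values shown above (values copied from the output; a simple set-difference sketch in the shell):

            ```shell
            # Compare the gtid_binlog_state of node0 and node2 (values from above)
            # and print the entries that exist only on node2 -- the drifted ones.
            state_node0="0-100-2,0-12-2,0-11-7,0-10-9"
            state_node2="0-100-3,0-12-2,0-11-8,0-10-10"
            for g in ${state_node2//,/ }; do
              case ",$state_node0," in
                *",$g,"*) ;;                    # same entry on both nodes
                *) echo "node2 only: $g" ;;     # seq_no drifted on node2
              esac
            done
            # prints: node2 only: 0-100-3
            #         node2 only: 0-11-8
            #         node2 only: 0-10-10
            ```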
            

            So it seems that when an async replication channel is involved inside a Galera cluster, even for a brief moment, the GTID sequences inside the cluster become quite messy.


            sachin.setiya.007 Sachin Setiya (Inactive) added a comment

            Hi przemek@mysqlmaniac.com!

            Actually, 0-12-2,0-100-2 is also a wrong GTID, but the patch for 10715 will fix this issue.

            People

              sachin.setiya.007 Sachin Setiya (Inactive)
              GeoffMontee Geoff Montee (Inactive)