MariaDB Server / MDEV-13431

wsrep_gtid_mode uses wrong GTID for transaction committed by slave thread

    Details

      Description

      This is similar to MDEV-8458, but this issue also affects MariaDB 10.1 with wsrep_gtid_mode=ON.

      When wsrep_gtid_mode is enabled, transactions that are replicated within a cluster by Galera receive a GTID where the domain_id is specified by wsrep_gtid_domain_id, the server_id is specified by server_id, and the seq_no is incremented for each transaction that is committed in the domain.
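
      As a rough illustration of the three GTID components described above, the following sketch splits a GTID string into domain_id, server_id, and seq_no. The Gtid class and its parse method are hypothetical helpers written for this report, not part of MariaDB or of any client library.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    """A MariaDB GTID of the form domain_id-server_id-seq_no."""

    domain_id: int
    server_id: int
    seq_no: int

    @classmethod
    def parse(cls, text: str) -> "Gtid":
        # Split "1-1-1" into its three numeric components.
        domain_id, server_id, seq_no = (int(part) for part in text.strip().split("-"))
        return cls(domain_id, server_id, seq_no)

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


gtid = Gtid.parse("1-1-1")
print(gtid.domain_id, gtid.server_id, gtid.seq_no)
```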

      It does not seem to work this way for transactions that are replicated by an asynchronous slave thread within a Galera cluster.

      For example, let's say that we have two clusters and one cluster replicates from the other using GTID replication.

      On cluster1, we see the following:

      MariaDB [(none)]> show global variables like '%gtid%';
      +------------------------+-------+
      | Variable_name          | Value |
      +------------------------+-------+
      | gtid_binlog_pos        | 1-1-1 |
      | gtid_binlog_state      | 1-1-1 |
      | gtid_current_pos       | 1-1-1 |
      | gtid_domain_id         | 1     |
      | gtid_ignore_duplicates | OFF   |
      | gtid_slave_pos         |       |
      | gtid_strict_mode       | OFF   |
      | wsrep_gtid_domain_id   | 1     |
      | wsrep_gtid_mode        | ON    |
      +------------------------+-------+
      9 rows in set (0.00 sec)
      

      On cluster2, we see the following:

      MariaDB [(none)]> show global variables like '%gtid%';
      +------------------------+-------------+
      | Variable_name          | Value       |
      +------------------------+-------------+
      | gtid_binlog_pos        | 2-2-2       |
      | gtid_binlog_state      | 2-2-2       |
      | gtid_current_pos       | 1-1-1,2-2-2 |
      | gtid_domain_id         | 2           |
      | gtid_ignore_duplicates | OFF         |
      | gtid_slave_pos         | 1-1-1       |
      | gtid_strict_mode       | OFF         |
      | wsrep_gtid_domain_id   | 2           |
      | wsrep_gtid_mode        | ON          |
      +------------------------+-------------+
      9 rows in set (0.00 sec)
      

      One node in cluster2 is a slave of one node in cluster1:

      MariaDB [(none)]> show slave status\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 172.30.0.32
                        Master_User: repl
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000002
                Read_Master_Log_Pos: 428
                     Relay_Log_File: ip-172-30-0-46-relay-bin.000002
                      Relay_Log_Pos: 644
              Relay_Master_Log_File: mariadb-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB:
                Replicate_Ignore_DB:
                 Replicate_Do_Table:
             Replicate_Ignore_Table:
            Replicate_Wild_Do_Table:
        Replicate_Wild_Ignore_Table:
                         Last_Errno: 0
                         Last_Error:
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 428
                    Relay_Log_Space: 951
                    Until_Condition: None
                     Until_Log_File:
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File:
                 Master_SSL_CA_Path:
                    Master_SSL_Cert:
                  Master_SSL_Cipher:
                     Master_SSL_Key:
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error:
                     Last_SQL_Errno: 0
                     Last_SQL_Error:
        Replicate_Ignore_Server_Ids:
                   Master_Server_Id: 1
                     Master_SSL_Crl:
                 Master_SSL_Crlpath:
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 1-1-1
            Replicate_Do_Domain_Ids:
        Replicate_Ignore_Domain_Ids:
                      Parallel_Mode: conservative
      1 row in set (0.00 sec)
      

      If we commit a transaction on cluster1, we would expect it to receive the GTID 1-1-2 on cluster1, and either 2-1-3 or 2-2-3 on cluster2, depending on whether cluster2 keeps the server_id of the originating node or replaces it with its own. Does that actually happen?
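
      The expectation can be worked out mechanically from the gtid_binlog_pos values shown earlier. This is only a sketch of the rule as this report understands it (the next seq_no in a domain is the last one plus one); next_gtid is a hypothetical helper, not a server function.

```python
def next_gtid(binlog_pos: str, domain_id: int, server_id: int) -> str:
    """Return the GTID the next commit in `domain_id` would be expected to get.

    `binlog_pos` is a gtid_binlog_pos value, e.g. "2-2-2" or "1-1-1,2-2-2".
    """
    last_seq_no = 0
    for item in binlog_pos.split(","):
        item = item.strip()
        if not item:
            continue
        dom, _srv, seq = (int(part) for part in item.split("-"))
        if dom == domain_id:
            last_seq_no = max(last_seq_no, seq)
    return f"{domain_id}-{server_id}-{last_seq_no + 1}"


# cluster1: gtid_binlog_pos is 1-1-1, wsrep_gtid_domain_id is 1, server_id is 1.
cluster1_expected = next_gtid("1-1-1", domain_id=1, server_id=1)

# cluster2: gtid_binlog_pos is 2-2-2, wsrep_gtid_domain_id is 2. The server_id
# is either kept from the originating node (1) or replaced by cluster2's own (2).
cluster2_candidates = {
    next_gtid("2-2-2", domain_id=2, server_id=1),
    next_gtid("2-2-2", domain_id=2, server_id=2),
}

print(cluster1_expected, sorted(cluster2_candidates))
```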

      Let's say that we execute the following on cluster1:

      MariaDB [(none)]> insert into db1.tab values (1, 'str1');
      Query OK, 1 row affected (0.00 sec)
      

      What GTID does this transaction have on each cluster?

      Here's the binlog event on the node in cluster1 where the transaction originated:

      # at 428
      #170802 16:51:02 server id 1  end_log_pos 466   GTID 1-1-2 trans
      /*!100001 SET @@session.gtid_seq_no=2*//*!*/;
      BEGIN
      /*!*/;
      # at 466
      # at 523
      #170802 16:51:02 server id 1  end_log_pos 523   Annotate_rows:
      #Q> insert into db1.tab values (1, 'str1')
      #170802 16:51:02 server id 1  end_log_pos 567   Table_map: `db1`.`tab` mapped to number 18
      # at 567
      #170802 16:51:02 server id 1  end_log_pos 606   Write_rows: table id 18 flags: STMT_END_F
       
      BINLOG '
      NjuCWRMBAAAALAAAADcCAAAAABIAAAAAAAEAA2RiMQADdGFiAAIDDwKWAAI=
      NjuCWRcBAAAAJwAAAF4CAAAAABIAAAAAAAEAAv/8AQAAAARzdHIx
      '/*!*/;
      ### INSERT INTO `db1`.`tab`
      ### SET
      ###   @1=1
      ###   @2='str1'
      # at 606
      #170802 16:51:02 server id 1  end_log_pos 633   Xid = 10
      COMMIT/*!*/;
      

      And here's the binlog event on the node in cluster2 that is acting as a slave to cluster1:

      # at 617
      #170802 16:51:02 server id 1  end_log_pos 655   GTID 2-1-2 trans
      /*!100001 SET @@session.server_id=1*//*!*/;
      /*!100001 SET @@session.gtid_seq_no=2*//*!*/;
      BEGIN
      /*!*/;
      # at 655
      #170802 16:51:02 server id 1  end_log_pos 699   Table_map: `db1`.`tab` mapped to number 21
      # at 699
      #170802 16:51:02 server id 1  end_log_pos 738   Write_rows: table id 21 flags: STMT_END_F
       
      BINLOG '
      NjuCWRMBAAAALAAAALsCAAAAABUAAAAAAAEAA2RiMQADdGFiAAIDDwKWAAI=
      NjuCWRcBAAAAJwAAAOICAAAAABUAAAAAAAEAAgP8AQAAAARzdHIx
      '/*!*/;
      ### INSERT INTO `db1`.`tab`
      ### SET
      ###   @1=1
      ###   @2='str1'
      # at 738
      #170802 16:51:02 server id 1  end_log_pos 765   Xid = 3
      COMMIT/*!*/;
      

      And here's the binlog event on another node in cluster2:

      # at 576
      #170802 16:51:02 server id 1  end_log_pos 614   GTID 2-1-3 trans
      /*!100001 SET @@session.server_id=1*//*!*/;
      /*!100001 SET @@session.gtid_seq_no=3*//*!*/;
      BEGIN
      /*!*/;
      # at 614
      #170802 16:51:02 server id 1  end_log_pos 658   Table_map: `db1`.`tab` mapped to number 20
      # at 658
      #170802 16:51:02 server id 1  end_log_pos 697   Write_rows: table id 20 flags: STMT_END_F
       
      BINLOG '
      NjuCWRMBAAAALAAAAJICAAAAABQAAAAAAAEAA2RiMQADdGFiAAIDDwKWAAI=
      NjuCWRcBAAAAJwAAALkCAAAAABQAAAAAAAEAAgP8AQAAAARzdHIx
      '/*!*/;
      ### INSERT INTO `db1`.`tab`
      ### SET
      ###   @1=1
      ###   @2='str1'
      # at 697
      #170802 16:51:02 server id 1  end_log_pos 724   Xid = 3
      COMMIT/*!*/;
      

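      So the transaction has the expected GTID in cluster1 (1-1-2), and it has the expected GTID on the non-slave node in cluster2 (2-1-3), but it has an unexpected GTID on the slave node in cluster2 (2-1-2). The slave node's seq_no matches the master's 1-1-2 instead of advancing past cluster2's previous 2-2-2.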
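
      One way to spot this kind of divergence is to pull the GTID out of each node's mysqlbinlog event header and compare the results. The sketch below assumes the header format shown in the dumps above; first_gtid and distinct_gtids are hypothetical helper names, and the node labels are illustrative.

```python
import re

# Matches the GTID in a mysqlbinlog event header such as
# "#170802 16:51:02 server id 1  end_log_pos 655   GTID 2-1-2 trans".
GTID_HEADER = re.compile(r"\bGTID (\d+-\d+-\d+)\b")


def first_gtid(binlog_dump: str) -> str:
    """Return the first GTID found in mysqlbinlog text output."""
    match = GTID_HEADER.search(binlog_dump)
    if match is None:
        raise ValueError("no GTID event header found")
    return match.group(1)


def distinct_gtids(dumps_by_node: dict) -> set:
    """Collect the distinct GTIDs that nodes recorded for one transaction."""
    return {first_gtid(dump) for dump in dumps_by_node.values()}


# Header lines copied from the two cluster2 binlogs in this report.
cluster2_dumps = {
    "slave node": "#170802 16:51:02 server id 1  end_log_pos 655   GTID 2-1-2 trans",
    "other node": "#170802 16:51:02 server id 1  end_log_pos 614   GTID 2-1-3 trans",
}

gtids = distinct_gtids(cluster2_dumps)
if len(gtids) > 1:
    print("GTID mismatch within cluster2:", sorted(gtids))
```

      If every node in a cluster had recorded the transaction under the same GTID, the set would contain a single element.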

              People

              • Assignee: Sachin Setiya (sachin.setiya.007)
              • Reporter: Geoff Montee (GeoffMontee)