[MDEV-8458] Galera Cluster replication stream doesn't pass along MariaDB's GTID Created: 2015-07-13  Updated: 2017-12-25  Resolved: 2017-12-25

Status: Closed
Project: MariaDB Server
Component/s: Galera, Replication, wsrep
Affects Version/s: 10.0.20-galera
Fix Version/s: 10.0.34-galera

Type: Bug
Priority: Major
Reporter: Geoff Montee (Inactive)
Assignee: Sachin Setiya (Inactive)
Resolution: Won't Fix
Votes: 7
Labels: galera, replication

Issue Links:
Problem/Incident
is caused by MDEV-20720 Galera: Replicate MariaDB GTID to oth... Closed
Relates
relates to MDEV-13431 wsrep_gtid_mode uses wrong GTID for t... Closed

 Description   

If a MariaDB Galera Cluster 10.0 node is configured as a slave of a MariaDB 10.0 master using GTID, the master's GTIDs do not propagate through the WSREP replication stream. This makes it difficult to use master_use_gtid=current_pos if you ever need to move the slave connection to another node, since the non-slave nodes in the cluster are not aware of the master's GTIDs.

Let's say we have a replication topology like this:

| MariaDB 10.0 | --- master_use_gtid=current_pos ---> | MariaDB Galera Cluster 10.0 (galera1) | <--- WSREP ---> | MariaDB Galera Cluster 10.0 (galera2) |

On the master:

MariaDB [tmp]> show global variables like 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 2     |
+---------------+-------+
1 row in set (0.00 sec)
 
MariaDB [tmp]> show global variables like 'gtid_domain_id';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| gtid_domain_id | 22    |
+----------------+-------+
1 row in set (0.00 sec)
 
MariaDB [tmp]> show global variables like 'gtid_current_pos';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| gtid_current_pos |       |
+------------------+-------+
1 row in set (0.00 sec)

On galera1:

MariaDB [(none)]> show global variables like 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 1     |
+---------------+-------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> show global variables like 'gtid_domain_id';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| gtid_domain_id | 21    |
+----------------+-------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| gtid_current_pos | 21-1-3 |
+------------------+--------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.33
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000001
          Read_Master_Log_Pos: 314
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 603
        Relay_Master_Log_File: mariadb-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 314
              Relay_Log_Space: 904
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 21-1-3
1 row in set (0.00 sec)

On galera2:

MariaDB [(none)]> show global variables like 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 1     |
+---------------+-------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> show global variables like 'gtid_domain_id';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| gtid_domain_id | 21    |
+----------------+-------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+---------+
| Variable_name    | Value   |
+------------------+---------+
| gtid_current_pos | 21-1-10 |
+------------------+---------+
1 row in set (0.00 sec)

Now let's insert some data on the master to see how it affects gtid_current_pos:

MariaDB [tmp]> INSERT INTO tmp.test_table VALUES (1, 'str');
Query OK, 1 row affected (0.00 sec)
 
MariaDB [tmp]> show global variables like 'gtid_current_pos';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| gtid_current_pos | 22-2-1 |
+------------------+--------+

On galera1:

MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+---------------+
| Variable_name    | Value         |
+------------------+---------------+
| gtid_current_pos | 22-2-1,21-1-3 |
+------------------+---------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.33
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000001
          Read_Master_Log_Pos: 485
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 774
        Relay_Master_Log_File: mariadb-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 485
              Relay_Log_Space: 1075
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Current_Pos
                  Gtid_IO_Pos: 21-1-3,22-2-1
1 row in set (0.00 sec)

On galera2 the value is now empty:

MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| gtid_current_pos |       |
+------------------+-------+
1 row in set (0.00 sec)
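
The divergence shown above can also be checked mechanically. The following is a minimal Python sketch, assuming the gtid_current_pos strings have already been fetched from each node (for example with the `show global variables` queries used in this description). The helper names `parse_gtid_pos` and `missing_domains` are hypothetical and are not part of MariaDB or Galera.

```python
def parse_gtid_pos(pos):
    """Parse a MariaDB GTID position such as '22-2-1,21-1-3'.

    Returns a dict mapping domain_id -> (server_id, seq_no).
    An empty string yields an empty dict.
    """
    result = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue
        domain_id, server_id, seq_no = (int(part) for part in item.split("-"))
        result[domain_id] = (server_id, seq_no)
    return result


def missing_domains(reference_pos, other_pos):
    """Return the sorted domain IDs present in reference_pos but absent from other_pos."""
    reference = parse_gtid_pos(reference_pos)
    other = parse_gtid_pos(other_pos)
    return sorted(set(reference) - set(other))


# Values reported in this ticket after the INSERT on the master.
galera1_pos = "22-2-1,21-1-3"
galera2_pos = ""

# Domains that galera1 knows about but galera2 does not.
print(missing_domains(galera1_pos, galera2_pos))
```

A non-empty result for the master's domain (22 in this setup) would indicate that the other node could not take over as a slave with master_use_gtid=current_pos.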



 Comments   
Comment by Geoff Montee (Inactive) [ 2015-07-13 ]

Also, galera1 binlog:

#150713 15:54:49 server id 2  end_log_pos 382   GTID 22-2-1
/*!100001 SET @@session.gtid_domain_id=22*//*!*/;
/*!100001 SET @@session.server_id=2*//*!*/;
/*!100001 SET @@session.gtid_seq_no=1*//*!*/;
BEGIN
/*!*/;
# at 382
# at 433
#150713 15:54:49 server id 2  end_log_pos 433   Table_map: `tmp`.`test_table` mapped to number 70
#150713 15:54:49 server id 2  end_log_pos 471   Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `tmp`.`test_table`
### SET
###   @1=1
###   @2='str'
# at 471
#150713 15:54:49 server id 2  end_log_pos 498   Xid = 21
COMMIT/*!*/;

galera2 binlog:

#150713 15:54:49 server id 2  end_log_pos 398   GTID 21-2-11
/*!100001 SET @@session.gtid_domain_id=21*//*!*/;
/*!100001 SET @@session.server_id=2*//*!*/;
/*!100001 SET @@session.gtid_seq_no=11*//*!*/;
BEGIN
/*!*/;
# at 398
# at 449
#150713 15:54:49 server id 2  end_log_pos 449   Table_map: `tmp`.`test_table` mapped to number 70
#150713 15:54:49 server id 2  end_log_pos 487   Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `tmp`.`test_table`
### SET
###   @1=1
###   @2='str'
# at 487
#150713 15:54:49 server id 2  end_log_pos 514   Xid = 21
COMMIT/*!*/;

Comment by Geoff Montee (Inactive) [ 2015-07-14 ]

This might only happen if binlog_format on the MariaDB 10.0 master is set to STATEMENT.

If it's set to ROW (which is required for Galera Cluster anyway), the GTIDs in the replication events seem to get passed along, and gtid_current_pos is updated.

This might be working as expected.

Comment by Geoff Montee (Inactive) [ 2015-07-14 ]

This does also appear to happen in some instances when binlog_format is set to ROW.

Comment by Geoff Montee (Inactive) [ 2015-07-14 ]

It looks like the way to reproduce this with binlog_format set to ROW is by setting slave_parallel_threads on the slave. Is it required for Galera to set slave_parallel_threads to 0?

e.g. on the master:

MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| gtid_current_pos | 22-2-3 |
+------------------+--------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> INSERT INTO tmp.test_table VALUES (4, 'str');
Query OK, 1 row affected (0.05 sec)
 
MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| gtid_current_pos | 22-2-4 |
+------------------+--------+
1 row in set (0.00 sec)

On galera1:

MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+---------------+
| Variable_name    | Value         |
+------------------+---------------+
| gtid_current_pos | 22-2-4,21-1-3 |
+------------------+---------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> show global variables like 'slave_parallel_threads';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| slave_parallel_threads | 2     |
+------------------------+-------+
1 row in set (0.00 sec)

On galera2:

MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| gtid_current_pos |       |
+------------------+-------+
1 row in set (0.00 sec)

master binlog:

#150714 11:00:01 server id 2  end_log_pos 520   GTID 22-2-4
/*!100001 SET @@session.gtid_seq_no=4*//*!*/;
BEGIN
/*!*/;
# at 520
# at 571
#150714 11:00:01 server id 2  end_log_pos 571   Table_map: `tmp`.`test_table` mapped to number 70
#150714 11:00:01 server id 2  end_log_pos 609   Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `tmp`.`test_table`
### SET
###   @1=4
###   @2='str'
# at 609
#150714 11:00:01 server id 2  end_log_pos 636   Xid = 38
COMMIT/*!*/;

galera1 binlog:

#150714 11:00:01 server id 2  end_log_pos 382   GTID 22-2-4
/*!100001 SET @@session.gtid_domain_id=22*//*!*/;
/*!100001 SET @@session.server_id=2*//*!*/;
/*!100001 SET @@session.gtid_seq_no=4*//*!*/;
BEGIN
/*!*/;
# at 382
# at 433
#150714 11:00:01 server id 2  end_log_pos 433   Table_map: `tmp`.`test_table` mapped to number 70
#150714 11:00:01 server id 2  end_log_pos 471   Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `tmp`.`test_table`
### SET
###   @1=4
###   @2='str'
# at 471
#150714 11:00:01 server id 2  end_log_pos 498   Xid = 24
COMMIT/*!*/;

galera2 binlog:

#150714 11:00:01 server id 2  end_log_pos 398   GTID 21-2-14
/*!100001 SET @@session.gtid_domain_id=21*//*!*/;
/*!100001 SET @@session.server_id=2*//*!*/;
/*!100001 SET @@session.gtid_seq_no=14*//*!*/;
BEGIN
/*!*/;
# at 398
# at 449
#150714 11:00:01 server id 2  end_log_pos 449   Table_map: `tmp`.`test_table` mapped to number 70
#150714 11:00:01 server id 2  end_log_pos 487   Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `tmp`.`test_table`
### SET
###   @1=4
###   @2='str'
# at 487
#150714 11:00:01 server id 2  end_log_pos 514   Xid = 24
COMMIT/*!*/;
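
To compare binlog excerpts like the ones above without reading them by eye, the GTID header lines can be pulled out with a regular expression. This is only a sketch, assuming the text is mysqlbinlog output in exactly the format shown in this comment (other versions may print extra fields on the header line); `extract_gtids` is a hypothetical helper, and the excerpts are shortened copies of the galera1 and galera2 output above.

```python
import re

# Matches header lines such as:
# "#150714 11:00:01 server id 2  end_log_pos 382   GTID 22-2-4"
GTID_LINE = re.compile(
    r"^#\d{6}\s+\d{1,2}:\d{2}:\d{2}\s+server id\s+\d+"
    r"\s+end_log_pos\s+\d+\s+GTID\s+(\d+)-(\d+)-(\d+)"
)


def extract_gtids(binlog_text):
    """Return (domain_id, server_id, seq_no) tuples in order of appearance."""
    gtids = []
    for line in binlog_text.splitlines():
        match = GTID_LINE.match(line.strip())
        if match:
            gtids.append(tuple(int(group) for group in match.groups()))
    return gtids


galera1_excerpt = """\
#150714 11:00:01 server id 2  end_log_pos 382   GTID 22-2-4
BEGIN
#150714 11:00:01 server id 2  end_log_pos 498   Xid = 24
"""
galera2_excerpt = """\
#150714 11:00:01 server id 2  end_log_pos 398   GTID 21-2-14
BEGIN
#150714 11:00:01 server id 2  end_log_pos 514   Xid = 24
"""

# The same transaction (Xid = 24) carries a different GTID on each node.
print(extract_gtids(galera1_excerpt))
print(extract_gtids(galera2_excerpt))
```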

Comment by Geoff Montee (Inactive) [ 2015-07-14 ]

Same issue with:

binlog_format=ROW
slave_parallel_threads=0
wsrep_slave_threads=2

and:

binlog_format=ROW
slave_parallel_threads=0
wsrep_slave_threads=1

I was able to see the GTID propagate to galera2 yesterday with the second set of values, but I haven't been able to make that happen again today. There might be some other interaction going on here.

Comment by Geoff Montee (Inactive) [ 2015-07-14 ]

This problem might be unrelated to the replication thread.

Leaving out the master server completely, I inserted a value directly into galera1. Galera2 assigned the transaction a different GTID.

galera1:

MariaDB [(none)]> INSERT INTO tmp.test_table VALUES (12, 'str');
Query OK, 1 row affected (0.01 sec)
 
MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+----------------+
| Variable_name    | Value          |
+------------------+----------------+
| gtid_current_pos | 22-2-11,21-1-1 |
+------------------+----------------+
1 row in set (0.01 sec)

galera2:

MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| gtid_current_pos | 21-1-2 |
+------------------+--------+
1 row in set (0.00 sec)

galera1 binlog:

#150714 12:08:59 server id 1  end_log_pos 520   GTID 21-1-1
/*!100001 SET @@session.gtid_domain_id=21*//*!*/;
/*!100001 SET @@session.server_id=1*//*!*/;
/*!100001 SET @@session.gtid_seq_no=1*//*!*/;
BEGIN
/*!*/;
# at 520
# at 571
#150714 12:08:59 server id 1  end_log_pos 571   Table_map: `tmp`.`test_table` mapped to number 70
#150714 12:08:59 server id 1  end_log_pos 609   Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `tmp`.`test_table`
### SET
###   @1=12
###   @2='str'
# at 609
#150714 12:08:59 server id 1  end_log_pos 636   Xid = 31
COMMIT/*!*/;

galera2 binlog:

#150714 12:08:59 server id 1  end_log_pos 506   GTID 21-1-2
/*!100001 SET @@session.server_id=1*//*!*/;
/*!100001 SET @@session.gtid_seq_no=2*//*!*/;
BEGIN
/*!*/;
# at 506
# at 557
#150714 12:08:59 server id 1  end_log_pos 557   Table_map: `tmp`.`test_table` mapped to number 70
#150714 12:08:59 server id 1  end_log_pos 595   Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `tmp`.`test_table`
### SET
###   @1=12
###   @2='str'
# at 595
#150714 12:08:59 server id 1  end_log_pos 622   Xid = 31
COMMIT/*!*/;
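
In this case both nodes use domain 21 and server ID 1, yet the sequence numbers differ. A per-domain comparison makes that kind of mismatch explicit. The sketch below is illustrative only: it assumes the two gtid_current_pos strings reported in this comment, and `parse_gtid_pos` and `gtid_mismatches` are hypothetical helper names.

```python
def parse_gtid_pos(pos):
    """Map domain_id -> (server_id, seq_no) for a position like '22-2-11,21-1-1'."""
    parsed = {}
    for item in filter(None, (part.strip() for part in pos.split(","))):
        domain_id, server_id, seq_no = map(int, item.split("-"))
        parsed[domain_id] = (server_id, seq_no)
    return parsed


def gtid_mismatches(pos_a, pos_b):
    """Return {domain_id: (value_on_a, value_on_b)} for every domain that differs.

    A value of None means the domain is absent on that node.
    """
    a, b = parse_gtid_pos(pos_a), parse_gtid_pos(pos_b)
    return {
        domain: (a.get(domain), b.get(domain))
        for domain in sorted(set(a) | set(b))
        if a.get(domain) != b.get(domain)
    }


# Values reported in this comment after the INSERT on galera1.
galera1_pos = "22-2-11,21-1-1"
galera2_pos = "21-1-2"

for domain, (on_galera1, on_galera2) in gtid_mismatches(galera1_pos, galera2_pos).items():
    print(domain, on_galera1, on_galera2)
```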

Comment by Geoff Montee (Inactive) [ 2016-05-11 ]

Users run into this problem all the time. Should wsrep_gtid_mode (MDEV-6594) be backported to MariaDB Galera Cluster 10.0?

https://mariadb.com/kb/en/mariadb/galera-cluster-system-variables/#wsrep_gtid_mode

Or should this issue just be closed as Won't Fix for 10.0?

Comment by Sachin Setiya (Inactive) [ 2017-12-11 ]

Hi GeoffMontee!

I think this can be done without wsrep_gtid_mode (with some modification of the 10715 patch). But there are some precautions that the user should take to avoid double-applying events on a Galera node: I mean that use of Ignore_slave_ids is required for this patch to work.

Comment by Sachin Setiya (Inactive) [ 2017-12-25 ]

The 10715 patch in 10.1.31 will fix this. Won't fix for 10.0-galera.
