Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
10.1.25
Description
This is similar to MDEV-8458, but this issue also affects MariaDB 10.1 with wsrep_gtid_mode=ON.
When wsrep_gtid_mode is enabled, transactions that are replicated within a cluster by Galera receive a GTID where the domain_id is specified by wsrep_gtid_domain_id, the server_id is specified by server_id, and the seq_no is incremented for each transaction that is committed in the domain.
It does not seem to work this way for transactions that are replicated by an asynchronous slave thread within a Galera cluster.
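To make the GTID layout described above concrete, here is a minimal Python sketch that splits a GTID string into its domain_id, server_id and seq_no parts. The names `Gtid`, `parse_gtid` and `parse_gtid_list` are hypothetical helpers for illustration only; they are not part of MariaDB or Galera.

```python
from typing import Dict, NamedTuple


class Gtid(NamedTuple):
    """A MariaDB GTID in its textual form: domain_id-server_id-seq_no."""
    domain_id: int
    server_id: int
    seq_no: int

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


def parse_gtid(text: str) -> Gtid:
    """Split a string such as '1-1-2' into its three numeric parts."""
    domain_id, server_id, seq_no = (int(part) for part in text.strip().split("-"))
    return Gtid(domain_id, server_id, seq_no)


def parse_gtid_list(text: str) -> Dict[int, Gtid]:
    """Parse a comma-separated position such as '1-1-1,2-2-2', keyed by domain_id.

    An empty string (for example an unset gtid_slave_pos) yields an empty dict.
    """
    return {g.domain_id: g for g in map(parse_gtid, filter(None, text.split(",")))}
```

With these helpers, a value such as gtid_current_pos can be inspected per domain, for example `parse_gtid_list("1-1-1,2-2-2")[2].seq_no`.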
For example, let's say that we have two clusters and one cluster replicates from the other using GTID replication.
On cluster1, we see the following:
MariaDB [(none)]> show global variables like '%gtid%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| gtid_binlog_pos        | 1-1-1 |
| gtid_binlog_state      | 1-1-1 |
| gtid_current_pos       | 1-1-1 |
| gtid_domain_id         | 1     |
| gtid_ignore_duplicates | OFF   |
| gtid_slave_pos         |       |
| gtid_strict_mode       | OFF   |
| wsrep_gtid_domain_id   | 1     |
| wsrep_gtid_mode        | ON    |
+------------------------+-------+
9 rows in set (0.00 sec)
On cluster2, we see the following:
MariaDB [(none)]> show global variables like '%gtid%';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| gtid_binlog_pos        | 2-2-2       |
| gtid_binlog_state      | 2-2-2       |
| gtid_current_pos       | 1-1-1,2-2-2 |
| gtid_domain_id         | 2           |
| gtid_ignore_duplicates | OFF         |
| gtid_slave_pos         | 1-1-1       |
| gtid_strict_mode       | OFF         |
| wsrep_gtid_domain_id   | 2           |
| wsrep_gtid_mode        | ON          |
+------------------------+-------------+
9 rows in set (0.00 sec)
One node in cluster2 is a slave of one node in cluster1:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.30.0.32
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000002
Read_Master_Log_Pos: 428
Relay_Log_File: ip-172-30-0-46-relay-bin.000002
Relay_Log_Pos: 644
Relay_Master_Log_File: mariadb-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 428
Relay_Log_Space: 951
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 1-1-1
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
1 row in set (0.00 sec)
If we commit a transaction on cluster1, we would expect it to have the GTID 1-1-2 on cluster1. On cluster2, where the last GTID in domain 2 is 2-2-2, we would expect it to have either 2-1-3 or 2-2-3, depending on whether cluster2 keeps the server_id of the originating cluster or replaces it with its own. Does that actually happen?
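The expectation above can be written down as a small Python sketch. It only models the reasoning in this report (next seq_no in the wsrep_gtid_domain_id domain, with one of two possible server_id values); it is not how the server computes GTIDs. The function name `expected_gtids` is hypothetical, and the local server_id of 2 is an assumption inferred from cluster2's gtid_binlog_pos of 2-2-2.

```python
from typing import Set


def expected_gtids(last_seq_no: int,
                   wsrep_gtid_domain_id: int,
                   origin_server_id: int,
                   local_server_id: int) -> Set[str]:
    """Return the GTIDs this report considers acceptable for the next
    replicated transaction: the domain is wsrep_gtid_domain_id, the seq_no
    is one past the last seq_no in that domain, and the server_id is either
    the originating server's or the local one."""
    next_seq_no = last_seq_no + 1
    return {
        f"{wsrep_gtid_domain_id}-{server_id}-{next_seq_no}"
        for server_id in (origin_server_id, local_server_id)
    }


# cluster2 state from the report: gtid_binlog_pos = 2-2-2, wsrep_gtid_domain_id = 2.
# origin_server_id = 1 is cluster1's server_id; local_server_id = 2 is assumed.
candidates = expected_gtids(last_seq_no=2, wsrep_gtid_domain_id=2,
                            origin_server_id=1, local_server_id=2)
```

Here `candidates` contains the two GTIDs named above, 2-1-3 and 2-2-3.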
Let's say that we execute the following on cluster1:
MariaDB [(none)]> insert into db1.tab values (1, 'str1');
Query OK, 1 row affected (0.00 sec)
What GTID does this transaction have on each cluster?
Here's the binlog event on the node in cluster1 where the transaction originated:
# at 428
#170802 16:51:02 server id 1 end_log_pos 466 GTID 1-1-2 trans
/*!100001 SET @@session.gtid_seq_no=2*//*!*/;
BEGIN
/*!*/;
# at 466
# at 523
#170802 16:51:02 server id 1 end_log_pos 523 Annotate_rows:
#Q> insert into db1.tab values (1, 'str1')
#170802 16:51:02 server id 1 end_log_pos 567 Table_map: `db1`.`tab` mapped to number 18
# at 567
#170802 16:51:02 server id 1 end_log_pos 606 Write_rows: table id 18 flags: STMT_END_F

BINLOG '
NjuCWRMBAAAALAAAADcCAAAAABIAAAAAAAEAA2RiMQADdGFiAAIDDwKWAAI=
NjuCWRcBAAAAJwAAAF4CAAAAABIAAAAAAAEAAv/8AQAAAARzdHIx
'/*!*/;
### INSERT INTO `db1`.`tab`
### SET
### @1=1
### @2='str1'
# at 606
#170802 16:51:02 server id 1 end_log_pos 633 Xid = 10
COMMIT/*!*/;
And here's the binlog event on the node in cluster2 that is acting as a slave to cluster1:
# at 617
#170802 16:51:02 server id 1 end_log_pos 655 GTID 2-1-2 trans
/*!100001 SET @@session.server_id=1*//*!*/;
/*!100001 SET @@session.gtid_seq_no=2*//*!*/;
BEGIN
/*!*/;
# at 655
#170802 16:51:02 server id 1 end_log_pos 699 Table_map: `db1`.`tab` mapped to number 21
# at 699
#170802 16:51:02 server id 1 end_log_pos 738 Write_rows: table id 21 flags: STMT_END_F

BINLOG '
NjuCWRMBAAAALAAAALsCAAAAABUAAAAAAAEAA2RiMQADdGFiAAIDDwKWAAI=
NjuCWRcBAAAAJwAAAOICAAAAABUAAAAAAAEAAgP8AQAAAARzdHIx
'/*!*/;
### INSERT INTO `db1`.`tab`
### SET
### @1=1
### @2='str1'
# at 738
#170802 16:51:02 server id 1 end_log_pos 765 Xid = 3
COMMIT/*!*/;
And here's the binlog event on another node in cluster2:
# at 576
#170802 16:51:02 server id 1 end_log_pos 614 GTID 2-1-3 trans
/*!100001 SET @@session.server_id=1*//*!*/;
/*!100001 SET @@session.gtid_seq_no=3*//*!*/;
BEGIN
/*!*/;
# at 614
#170802 16:51:02 server id 1 end_log_pos 658 Table_map: `db1`.`tab` mapped to number 20
# at 658
#170802 16:51:02 server id 1 end_log_pos 697 Write_rows: table id 20 flags: STMT_END_F

BINLOG '
NjuCWRMBAAAALAAAAJICAAAAABQAAAAAAAEAA2RiMQADdGFiAAIDDwKWAAI=
NjuCWRcBAAAAJwAAALkCAAAAABQAAAAAAAEAAgP8AQAAAARzdHIx
'/*!*/;
### INSERT INTO `db1`.`tab`
### SET
### @1=1
### @2='str1'
# at 697
#170802 16:51:02 server id 1 end_log_pos 724 Xid = 3
COMMIT/*!*/;
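So the transaction has the expected GTID in cluster1 (1-1-2), and it has an expected GTID on the non-slave node in cluster2 (2-1-3), but it has an unexpected GTID on the slave node in cluster2 (2-1-2). On the slave node, the seq_no matches the seq_no of the originating transaction in cluster1 instead of the next seq_no in domain 2, so the same transaction ends up with different GTIDs on different nodes of cluster2.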
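A comparison like the one above can be automated with a short Python sketch that pulls the GTID out of mysqlbinlog-style event headers and checks whether the nodes of one cluster agree. The regular expression is based only on the header lines shown in this report, and the function names and node labels are hypothetical.

```python
import re
from typing import Dict, List

# Matches the GTID in a header such as:
#   #170802 16:51:02 server id 1 end_log_pos 655 GTID 2-1-2 trans
GTID_EVENT = re.compile(r"\bGTID (\d+)-(\d+)-(\d+)\b")


def gtids_in_binlog_dump(dump: str) -> List[str]:
    """Return every GTID found in GTID event headers, in order of appearance."""
    return ["-".join(match.groups()) for match in GTID_EVENT.finditer(dump)]


def nodes_disagree(gtid_by_node: Dict[str, str]) -> bool:
    """True if the same transaction carries different GTIDs on different nodes."""
    return len(set(gtid_by_node.values())) > 1


# Header lines copied from the two cluster2 binlog excerpts in this report.
slave_node = "#170802 16:51:02 server id 1 end_log_pos 655 GTID 2-1-2 trans"
other_node = "#170802 16:51:02 server id 1 end_log_pos 614 GTID 2-1-3 trans"

observed = {
    "cluster2-slave": gtids_in_binlog_dump(slave_node)[0],  # hypothetical label
    "cluster2-other": gtids_in_binlog_dump(other_node)[0],  # hypothetical label
}
mismatch = nodes_disagree(observed)
```

For the excerpts in this report, `mismatch` is true, which is the inconsistency being reported.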
Issue Links
- causes: MDEV-10227 MariaDB Galera cluster gtid's falling out of sync inspite of setting wsrep_gtid_mode=ON (Closed)
- is caused by: MDEV-20720 Galera: Replicate MariaDB GTID to other nodes in the cluster (Closed)
- relates to: MDEV-8458 Galera Cluster replication stream doesn't pass along MariaDB's GTID (Closed)