[MDEV-27665] MariaDB bi-directional replication cluster to cluster fails with inconsistent GTIDs Created: 2022-01-28  Updated: 2022-01-28

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.5.13
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Ulrich Moser (Inactive) Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: galera, gtid, replication
Environment: Ubuntu Server 20.04 LTS, virtual servers



 Description   

We have a PoC installation consisting of two clusters with two nodes each. A new datadir was initialized on node11 and a demo DB was loaded there. Then node12 was started with an empty datadir and joined via SST. Next, a backup was taken on node11 and imported on node21. After bootstrapping the second cluster from node21 and starting node22, we see the following GTID state.
node11:
{{
+-------------------------+-----------------+
| Variable_name           | Value           |
+-------------------------+-----------------+
| gtid_binlog_pos         | 10-1-17,11-1-41 |
| gtid_binlog_state       | 10-1-17,11-1-41 |
| gtid_cleanup_batch_size | 64              |
| gtid_current_pos        | 10-1-17,11-1-41 |
| gtid_domain_id          | 11              |
| gtid_ignore_duplicates  | OFF             |
| gtid_pos_auto_engines   |                 |
| gtid_slave_pos          |                 |
| gtid_strict_mode        | OFF             |
| wsrep_gtid_domain_id    | 10              |
| wsrep_gtid_mode         | ON              |
+-------------------------+-----------------+
}}

node12 is identical.

node21:
{{
+-------------------------+----------------+
| Variable_name           | Value          |
+-------------------------+----------------+
| gtid_binlog_pos         | 10-1-22,20-2-5 |
| gtid_binlog_state       | 10-1-22,20-2-5 |
| gtid_cleanup_batch_size | 64             |
| gtid_current_pos        | 20-2-5         |
| gtid_domain_id          | 21             |
| gtid_ignore_duplicates  | OFF            |
| gtid_pos_auto_engines   |                |
| gtid_slave_pos          |                |
| gtid_strict_mode        | OFF            |
| wsrep_gtid_domain_id    | 20             |
| wsrep_gtid_mode         | ON             |
+-------------------------+----------------+
}}

node22 is again identical.

First: how can node21 and node22 have a higher GTID for cluster 1 (wsrep_gtid_domain_id = 10) than node11, and how can they have GTIDs from cluster 2 (domain 20) although nothing has been executed on that cluster at all?
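
To make the per-domain differences explicit, the positions can be compared with a small script. This is only a minimal sketch and not part of the original setup: the function names parse_gtid_pos and diff_gtid_pos are hypothetical, and it assumes each position string holds at most one domain-server-sequence triple per domain, as gtid_binlog_pos does in the output above.

```python
def parse_gtid_pos(pos):
    """Parse a MariaDB GTID position such as '10-1-17,11-1-41'.

    Returns {domain_id: (server_id, sequence_number)}.
    Assumption: at most one GTID per domain in the string.
    """
    result = {}
    for gtid in filter(None, (part.strip() for part in pos.split(","))):
        domain, server, seq = (int(x) for x in gtid.split("-"))
        result[domain] = (server, seq)
    return result


def diff_gtid_pos(a, b):
    """Return {domain_id: (entry_in_a, entry_in_b)} for domains that differ.

    An entry is a (server_id, sequence_number) tuple, or None if the
    domain is missing on that side.
    """
    pa, pb = parse_gtid_pos(a), parse_gtid_pos(b)
    return {
        d: (pa.get(d), pb.get(d))
        for d in sorted(pa.keys() | pb.keys())
        if pa.get(d) != pb.get(d)
    }


# gtid_binlog_pos values copied from the report
node11 = "10-1-17,11-1-41"
node21 = "10-1-22,20-2-5"

for domain, (left, right) in diff_gtid_pos(node11, node21).items():
    print(f"domain {domain}: node11={left} node21={right}")
```

For the values above this lists three differing domains: domain 10 is at sequence 17 on node11 but 22 on node21, domain 11 exists only on node11, and domain 20 exists only on node21.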

We then tried to set up replication with node21 as a slave of node11 and node12 as a slave of node22. For the first replication (node11 -> node21), gtid_slave_pos on node21 always reflected the GTID position of node11, but the data was not stored in cluster 2. The second replication (node22 -> node12) never worked, whatever settings we chose.

