[MDEV-20720] Galera: Replicate MariaDB GTID to other nodes in the cluster Created: 2016-08-31  Updated: 2023-11-27  Resolved: 2020-01-29

Status: Closed
Project: MariaDB Server
Component/s: Galera, Replication
Fix Version/s: 10.5.1

Type: Task Priority: Critical
Reporter: Nirbhay Choubey (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 28
Labels: None

Issue Links:
Blocks
is blocked by MDEV-9856 wsrep_gtid_mode requires nodes to hav... Closed
PartOf
includes MDEV-14323 Example mtr test for 2 clusters Closed
Problem/Incident
causes MDEV-6866 Ensure consistency of sequence number... Closed
causes MDEV-8458 Galera Cluster replication stream doe... Closed
causes MDEV-10227 MariaDB Galera cluster gtid's falling... Closed
causes MDEV-13431 wsrep_gtid_mode uses wrong GTID for t... Closed
Relates
relates to MDEV-7984 Galera doesn't ignore duplicates prop... Closed
relates to MDEV-9107 GTID Slave Pos of untrack domain ids ... Closed
relates to MDEV-9315 Xid is only shown for RBR events Closed
relates to MDEV-14153 Implicitly dropped temporary tables c... Closed
relates to MDEV-20769 MASTER_GTID_WAIT to work for replicat... Open
relates to MXS-4819 Failover for async replication betwee... Open
relates to MDEV-9855 log_slave_updates is required for wsr... Closed
relates to MDEV-20715 Implement system variable to disallow... Closed
relates to MXS-2787 Incorrect implementation of Galera re... Closed
Sprint: 10.2.4-4, 10.2.4-1, 10.3.1-2, 10.2.10, 10.2.11, 10.1.30, 10.2.12

 Description   

The MariaDB GTID is currently not transferred to other nodes in the cluster. As a result,
receiving nodes simply use the current gtid_domain_id (or wsrep_gtid_domain_id in 10.1)
and server id to tag incoming transactions, along with the Galera-assigned sequence number.
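
As an illustration of the resulting divergence (a sketch only; the positions shown are hypothetical and depend on the deployment):

```sql
-- Node 1 (server_id=1, gtid_domain_id=0) applies a write set and tags it locally:
SELECT @@gtid_binlog_pos;   -- e.g. 0-1-42
-- Node 2 (server_id=2) applies the *same* write set but tags it with its own server id:
SELECT @@gtid_binlog_pos;   -- e.g. 0-2-42
```

The same transaction thus ends up with different GTIDs on different nodes, so an async slave cannot be re-pointed from one node to another by GTID.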



 Comments   
Comment by Tim Soderstrom [ 2016-10-19 ]

This sounds similar to what I ran into, but it seemed a tad vague. I am running MariaDB 10.1.18. I have a 3-node Galera cluster and an async slave. GTIDs are enabled, all nodes have 'log-bin' and 'log-slave-updates', and all are using 0 for the domain (the default).

What I found was that all Galera nodes seem to be writing all data to their binary logs, but their GTIDs do not match. I can find things by the transaction-id across all the logs, but if I try to find things by GTID, results are inconsistent. This means I cannot merely re-point the slave server to another node, because that node does not have the same GTID information as the current master and, thus, the slave does not know where to begin.

Does my issue fall under this bug? I see the target is 10.2. If so, it would be ideal if this were reflected in the KB documentation for 10.1.

Comment by Andrew Garner [ 2016-10-19 ]

Tim, I think you may be running into MDEV-10944 (a 10.1.18 regression). That said, there are myriad other ways to get MariaDB GTIDs out of sync in a Galera cluster, even without that regression.

Comment by Tim Soderstrom [ 2016-10-19 ]

Doh you are right, that sounds exactly like our problem. Bug search fail on my part - thank you for providing that!

Comment by Michaël de groot [ 2016-12-28 ]

Hi!

I also noticed that the initial GTID is not passed on with all SST methods. If I remember correctly, only rsync SST will sync it.

I think this implementation should make fixing that in SST unneeded, could you please confirm that?

Thanks,
Michaël

Comment by Michaël de groot [ 2017-03-13 ]

sachin.setiya.007, can you please tell me why this issue is stalled? This is a very important issue to fix. Right now it is very inconvenient to replicate one Galera cluster to another, and with circular replication between two Galera clusters it becomes a real pain.

Comment by Geoff Montee (Inactive) [ 2017-08-04 ]

I assume that "Fix Version/s: 10.2" is not accurate anymore. Since 10.2 is already GA, I assume this would go into MariaDB 10.3 at the earliest. Is that correct?

Comment by Sachin Setiya (Inactive) [ 2017-09-11 ]

I have been occupied with Galera merges and Galera bugs, so I have not had time to work on this. I hope to start on it again next week.

Comment by Sachin Setiya (Inactive) [ 2017-10-10 ]

http://lists.askmonty.org/pipermail/commits/2017-October/011552.html

Comment by Michaël de groot [ 2017-10-10 ]

Cool, very nice that this issue is finally getting done! Thank you sachin.setiya.007.

In the tests, please consider circular asynchronous replication between 2 or more Galera clusters:

Cluster 1: A <-> B <-> C
Cluster 2: D <-> E <-> F

All nodes have log_slave_updates enabled. Bidirectional asynchronous replication is between node A and node D. Writes originate from, for example, node B.
Node D goes down. With this change, we should now be able to change the streams easily:
On node A: STOP SLAVE; CHANGE MASTER TO MASTER_HOST='e'; START SLAVE;
On node E: CHANGE MASTER TO MASTER_HOST='a', MASTER_USER='repl', MASTER_PASSWORD='insecure', MASTER_USE_GTID=slave_pos; START SLAVE;

Can you please make sure this scenario is tested?

sachin.setiya.007, maybe the implementation done here is not enough for this use case. How does node E recognize transactions that originated from node D? Maybe we need to set up an ignored domain id on the asynchronous replication stream?
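
One possible way to express that filtering (a sketch, not from this ticket; it assumes each cluster binlogs under its own gtid_domain_id, and the domain id 1 for cluster 1 is hypothetical):

```sql
-- On node A, which pulls the async stream from cluster 2:
-- drop transactions that originated in cluster 1's own domain (hypothetical id 1),
-- so they are not re-applied after travelling around the circle.
STOP SLAVE;
CHANGE MASTER TO IGNORE_DOMAIN_IDS=(1), MASTER_USE_GTID=slave_pos;
START SLAVE;
```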

Comment by Sachin Setiya (Inactive) [ 2017-10-12 ]

Hi michaeldg,

Writing an mtr test case for this situation is a bit difficult, but I will try to simulate it on VMs.

Regards
sachin

Comment by Andrei Elkin [ 2017-10-24 ]

Sachin, hello.

Please check out a review mail I sent out.

Cheers,

Andrei.

Comment by Sachin Setiya (Inactive) [ 2017-11-28 ]

Status Update:-

A test case for a 2x3-node Galera cluster has been created, but it fails because of an
issue with rpl_slave_state::hash.
More information on this bug: Problem

Comment by Sachin Setiya (Inactive) [ 2017-11-28 ]

Branch Buildbot

Comment by Sachin Setiya (Inactive) [ 2017-12-04 ]

Status update: all issues solved.

The problem was as follows. Suppose a setup like this:

A <-> B <-> C (Galera Cluster 1)

(circular plain, non-Galera, replication between A <-> D)

D <-> E <-> F (Galera Cluster 2)

Event groups arriving from B and C were applied twice on A (and similarly, event groups from E and F were applied twice on D).
The reason is that a Galera event group does not contain a GTID_LOG_EVENT, so when A receives an event group from B, its rpl_slave_state::hash (gtid_slave_pos) is not updated. When A later gets the same event group back from D (because of the circular replication), it applies it again. Setting ignore_server_ids when configuring the circular replication solves this:
A ignores the server ids of B and C, and D ignores the server ids of E and F. replicate-same-server-id should be turned off.
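
That workaround might be configured like this (a sketch; the server ids 2 and 3 for B and C are hypothetical):

```sql
-- On node A: skip event groups that originated on B (server_id=2) or C (server_id=3)
-- when they arrive back via the circular link from D.
STOP SLAVE;
CHANGE MASTER TO IGNORE_SERVER_IDS=(2,3);
START SLAVE;
```

(replicate-same-server-id stays at its default of off.)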

Comment by Sachin Setiya (Inactive) [ 2017-12-11 ]

There is one more constraint. In the case of master-slave replication to a GTID cluster, or GTID cluster to GTID cluster (async or possibly circular replication), the cluster should have a domain id different from that of its master or slave.
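
For example, the constraint could be satisfied like this (a sketch; the domain id values are hypothetical, and in practice they belong in the config file rather than SET GLOBAL):

```sql
-- Every node of the first Galera cluster:
SET GLOBAL wsrep_gtid_domain_id = 1;
-- Every node of the second Galera cluster:
SET GLOBAL wsrep_gtid_domain_id = 2;
-- A plain async master or slave outside either cluster:
SET GLOBAL gtid_domain_id = 3;
```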

Comment by Sachin Setiya (Inactive) [ 2017-12-25 ]

http://lists.askmonty.org/pipermail/commits/2017-December/011761.html

Comment by Mark Stoute [ 2018-04-20 ]

Thank you for this fix.
I upgraded my production cluster to 10.1.32 via rolling-restart, and found GTIDs out of sync, and wsrep_provider_version is still behind (25.3.18(r3632)). Prod cluster was initially bootstrapped as v 10.1.18.

In a dev cluster where I bootstrapped the cluster from 10.1.32, GTIDs are in sync and wsrep_provider_version is higher (25.3.23(r3789)).

Is it true that in order to have my production cluster have GTIDs in sync, I will need to bootstrap with 10.1.32?

Comment by Sachin Setiya (Inactive) [ 2018-05-03 ]

There has been some confusion: the GTID is transferred between nodes only if the cluster is an async slave. If we want to transfer the GTID inside the write set, that will be a bigger change, and will involve changing the Galera code
so that the Galera GTID uses the same GTID format as MariaDB's and is used at commit instead of generating a separate GTID.

Comment by Arjen Lentz [ 2018-07-09 ]

sachin.setiya.007 it would be ok if Galera just passed the MariaDB GTID around as-is (as an extra arbitrary field as part of a commit), so it will be stored in each binlog. That would not require Galera to start using MariaDB GTIDs. Just see them as separate: Galera GTID and MariaDB GTID.
The issue is that right now, what's happening with say MDEV-14153 is just horrendous.

Comment by Sachin Setiya (Inactive) [ 2018-07-11 ]

Hi arjen

Actually, that won't work, because the MariaDB server would somehow have to generate GTIDs in sync. Say we have a 3-node cluster where each node is at GTID 1-1-1, and we do simultaneous writes on node 1 and node 2: both will generate GTID 1-1-2, which is a wrong sequence. So we need Galera to manage the GTID, since it is the transaction coordinator, not MariaDB.

Comment by Arjen Lentz [ 2018-07-11 ]

I'm sorry, sachin.setiya.007, but that's just not correct. Remember that GTID also works in async replication and master-master configurations.
The format is S-D-#, where S is the server-id (which should be unique in the cluster or replication environment), D is the replication domain (see the MariaDB docs; it tends to be 0 by default unless the application sets it to something else), and # is the sequence number going up within that.
So for your example, you'd actually see something like 1-0-1 and 2-0-1 on the two different servers, which is a perfectly correct flow of things, and the next transactions written on those servers will be something like 1-0-2 and 2-0-2.
Hope this clarifies.

Comment by Sachin Setiya (Inactive) [ 2018-07-11 ]

When we have different domain ids, the user ensures that the binlog events don't conflict with each other. But this is not the case with Galera: Galera can handle conflicts, so I think within one cluster we should have one domain id. This is what Galera internally does; it has one UUID per cluster.

Comment by Daniel Black [ 2018-07-11 ]

Format is D-S-# and I'm fairly sure arjen is talking about different server IDs on each galera node (despite a little dyslexia).

Comment by Sachin Setiya (Inactive) [ 2018-07-11 ]

danblack, right, the format is D-S-X. Actually, my earlier comment was slightly wrong: each node will have GTID 1 (constant)-X (node server id)-Y (seqno). The server id will be different on each node, but the seqno is still relative to the domain id.

Comment by Arjen Lentz [ 2018-07-11 ]

yes thanks Dan - I had it right in a blogpost the other day.

sachin.setiya.007 The seq# component on its own is not unique; it's the GTID as a whole that needs to be unique.
The UUID you're referring to is the Galera cluster identifier, which is indeed a single unique ID across the entire cluster - it never changes; this is how a node can see whether it belongs in a cluster or not. If you bootstrap a new cluster, a new UUID is generated.

Comment by Sachin Setiya (Inactive) [ 2018-07-11 ]

arjen, I never said that the sequence number is unique on its own; it is unique with respect to the domain id. For example, 1-1-1 and 1-2-1 are conflicting GTIDs, whereas 1-1-1 and 2-1-1 are perfectly okay. https://mariadb.com/kb/en/library/gtid/#the-domain-id

Comment by Daniel Black [ 2018-07-12 ]

GTIDs need to pass through the cluster. Consider this requirement:

  • A DB connection occurs through a DB load balancer; at the end of a transaction updating a user's profile, the GTID is selected by the application.
  • The GTID is placed in the web session information for that user.
  • On the user's next request, a new web page is fetched, going through the load balancer to a different cluster member (or even an async slave, for that matter).
  • Because Galera transactions or async slave events aren't applied immediately, a query of the user's profile may retrieve an out-of-date version. To prevent this, the DB application should be able to

    SELECT master_gtid_wait(@gtid, 0.1)

    to ensure it has the latest data that the user previously updated (it can deal with the timeout).
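
The read-your-writes flow above might look like this end to end (a sketch; the GTID value and the user_profile table are hypothetical):

```sql
-- On the connection that committed the profile update:
SELECT @@last_gtid;                         -- e.g. '0-1-1234'; store it in the web session
-- Later, on a connection to a different node (or async slave):
SELECT MASTER_GTID_WAIT('0-1-1234', 0.1);   -- returns 0 when the position is reached, -1 on timeout
SELECT * FROM user_profile WHERE user_id = 42;
```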

I'm sure I'm not the only one of the 19 voters and 31 watchers wanting this.

There should be no need for the application to consider that a *G*TID is anything but a global identifier.

Galera needs to ensure sequential visibility when applying each D-S pair (i.e. 0-1-33 isn't visible while 0-1-22 isn't), so of course the GTID needs to be transferred in the write set.
Galera should handle that 0-1-33 and 0-2-33 are unique transactions from different servers, no matter what replication path delivered them.

Each server has its own server-id and can be responsible for GTID generation without coordination. If certification fails, the server simply skips a GTID value. The Galera GTID has a different purpose, so it needs to stay independent.

If that server is part of the cluster, then the Galera mechanism can ensure the transaction is applied without conflict; however, this is independent of what the GTID actually is.

Comment by Valerii Kravchuk [ 2018-08-10 ]

We should also consider the case of an ALTER running node by node in RSU mode. We should end up with consistent GTIDs in the cluster afterwards, or invent some workaround (do not generate local GTIDs while in RSU mode, or require that everything be done with sql_log_bin=0?).

Comment by Kristian Nielsen [ 2018-08-10 ]

RSU=rolling schema upgrade, perhaps?

If you want the ALTERs to replicate to async slaves not part of the cluster, the GTID way is to binlog the ALTER in a separate domain id (SET SESSION gtid_domain_id=xxx). This will make them independent of the normal binlog stream. Grab the @@last_gtid from the first node, and use it to set server_id / gtid_seq_no on the other nodes to get the same GTID on all nodes for the ALTER.

If you do not want the ALTERs to replicate async to slaves, SET SESSION sql_log_bin=0 is the way.
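
That procedure might be scripted like this (a sketch; the domain id 9, the table name, and the GTID value are hypothetical):

```sql
-- On the first node: binlog the ALTER under a separate replication domain.
SET SESSION gtid_domain_id = 9;
ALTER TABLE t1 ADD COLUMN c2 INT;
SELECT @@last_gtid;                          -- e.g. '9-1-5'
-- On each remaining node: force the identical GTID for the same ALTER.
SET SESSION gtid_domain_id = 9, server_id = 1, gtid_seq_no = 5;
ALTER TABLE t1 ADD COLUMN c2 INT;
```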

Comment by Sylvain ARBAUDIE [ 2019-07-12 ]

Would there be any issue with using the Galera seqno as the last part of the MariaDB GTID? Apart from RSU DDL, I mean?

Comment by Teemu Ollakka [ 2019-07-15 ]

There are at least two major issues which need to be resolved in order to use Galera seqno as part of the MariaDB GTID:

  1. Occasionally a Galera seqno is generated for a write set that does not commit a transaction; these include (but are not limited to) write sets that fail certification and intermediate streaming-replication fragments. To keep GTID sequences continuous, all of these events would have to be logged in the binlog as dummy events, which could cause excessive clutter under certain workloads.
  2. Master-slave topology where the Galera cluster acts as a slave: the original GTID from the master must be preserved in binlog events. However, as Galera will generate a write set/seqno for the applied transaction, there will be two GTIDs to persist in the binlog for each transaction. It is not clear how this could be handled while preserving compatibility with async master/slave replication.

Comment by Geoff Montee (Inactive) [ 2019-10-02 ]

A feature like MDEV-20715 could also improve Galera's support for MariaDB GTIDs. Specifically, it could prevent each node from generating GTIDs for local transactions, which could make it easier for replication slaves to use any cluster node as master, without risking inconsistent GTIDs.

Comment by Sachin Setiya (Inactive) [ 2020-01-21 ]

Okay to push

Comment by Ian Gilfillan [ 2020-03-29 ]

The pull request linked with this issue is still marked as open, although the task has been closed.

Generated at Thu Feb 08 09:01:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.