The MariaDB GTID is currently not transferred to the other nodes in the cluster. As a result,
receiving nodes simply use their current gtid_domain_id (or wsrep_gtid_domain_id in 10.1)
and server ID, together with the Galera-assigned sequence number, to tag the incoming transactions.
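A minimal diagnostic sketch, assuming a 10.1 node where wsrep_gtid_mode and wsrep_gtid_domain_id exist (variable names differ in older versions), showing the variables involved and the per-node GTID positions that can diverge:

    -- Run on each cluster node after the same writes have replicated.
    SHOW GLOBAL VARIABLES LIKE 'wsrep_gtid%';     -- wsrep_gtid_mode, wsrep_gtid_domain_id
    SHOW GLOBAL VARIABLES LIKE 'gtid_domain_id';
    SELECT @@global.server_id,
           @@global.gtid_binlog_pos  AS binlog_gtid_pos,
           @@global.gtid_current_pos AS current_gtid_pos;
    -- With the current behaviour, binlog_gtid_pos can differ between nodes
    -- even though all nodes have applied the same Galera write sets.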
Issue Links
causes:
MDEV-6866 Ensure consistency of sequence number in Galera GTID Replication (Closed)
MDEV-8458 Galera Cluster replication stream doesn't pass along MariaDB's GTID (Closed)
MDEV-10227 MariaDB Galera cluster gtid's falling out of sync inspite of setting wsrep_gtid_mode=ON (Closed)
MDEV-13431 wsrep_gtid_mode uses wrong GTID for transaction committed by slave thread
This sounds similar to what I ran into, but it seemed a tad vague. I am running MariaDB 10.1.18. I have a 3-node Galera cluster and an async slave. GTIDs are enabled, and all nodes have 'log-bin' and 'log-slave-updates' and use 0 for the domain (the default).
What I found was that all Galera nodes seem to be writing all data to their binary logs, but their GTIDs do not match. I can find things by transaction ID across all the logs, but if I try to find things by GTID, the results are inconsistent. This means I cannot simply re-point the slave server to another node, because that node does not have the same GTID information as the current master and thus the slave does not know where to begin.
It sounds like this applies to this bug? I see the target is 10.2. If so, it would be ideal if this were reflected in the KB documentation for 10.1.
Tim Soderstrom
added a comment
Tim, I think you may be running into MDEV-10944 (a 10.1.18 regression), although there are a myriad of other ways to get MariaDB GTIDs out of sync in a Galera cluster even without that regression.
Andrew Garner
added a comment
I also noticed that the initial GTID is not passed on with all SST methods. If I remember correctly, only rsync SST will sync it.
I think this implementation should make fixing that in SST unnecessary; could you please confirm that?
Thanks,
Michaël
Michaël de groot
added a comment
sachin.setiya.007, can you please tell me why this issue is stalled? This is a very important issue to fix. Right now it is very inconvenient to replicate one Galera cluster to another, and with circular replication between two Galera clusters it becomes a real pain.
Michaël de groot
added a comment
I assume that "Fix Version/s: 10.2" is not accurate anymore. Since 10.2 is already GA, I assume this would go into MariaDB 10.3 at the earliest. Is that correct?
Geoff Montee (Inactive)
added a comment
I am occupied with Galera merges and Galera bugs, so I have not had time to work on this. I will hopefully start working on it again next week.
Sachin Setiya (Inactive)
added a comment
Cool, very nice that this issue is finally getting done! Thank you sachin.setiya.007.
In the tests, please consider circular asynchronous replication between 2 or more Galera clusters:
Cluster 1: A <> B <> C
Cluster 2: D <> E <> F
All nodes have log_slave_updates enabled. Bidirectional asynchronous replication is between node A and node D. Writes originate from, for example, node B.
Node D goes down. With this change, we should now be able to change the streams easily:
On node A: STOP SLAVE; CHANGE MASTER TO MASTER_HOST='e'; START SLAVE;
On node E: CHANGE MASTER TO MASTER_HOST='a', MASTER_USER='repl', MASTER_PASSWORD='insecure', MASTER_USE_GTID=slave_pos; START SLAVE;
Can you please make sure this scenario is tested?
sachin.setiya.007, maybe the implementation done here is not enough for this use case. How does node E recognize transactions that originated from node D? Maybe we need to set up domain ID filtering on the asynchronous replication stream (see the sketch below this comment)?
Michaël de groot
added a comment - edited
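A hedged sketch of the re-pointing above, combined with the domain-filtering idea from the last paragraph. It assumes cluster 1 binlogs its own writes under gtid_domain_id=1 and cluster 2 under gtid_domain_id=2; the hosts and credentials are the placeholders from the example commands:

    -- On node A, replacing the failed D with E as async master; ignore events
    -- that carry cluster 1's own domain (assumed domain 1) so they are not
    -- applied a second time when they come back around:
    STOP SLAVE;
    CHANGE MASTER TO
        MASTER_HOST='e', MASTER_USER='repl', MASTER_PASSWORD='insecure',
        MASTER_USE_GTID=slave_pos,
        IGNORE_DOMAIN_IDS=(1);
    START SLAVE;

    -- On node E, pointing the reverse stream at node A; ignore cluster 2's
    -- own domain (assumed domain 2) for the same reason:
    CHANGE MASTER TO
        MASTER_HOST='a', MASTER_USER='repl', MASTER_PASSWORD='insecure',
        MASTER_USE_GTID=slave_pos,
        IGNORE_DOMAIN_IDS=(2);
    START SLAVE;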
Hi michaeldg,
Writing an mtr test case for this situation is a bit difficult, but I will try to simulate it on VMs.
Regards
sachin
Sachin Setiya (Inactive)
added a comment
Status update: a test case for a 2x3-node Galera cluster has been created, but this test fails because of
an issue with rpl_slave_state::hash.
More information on this bug: Problem
Sachin Setiya (Inactive)
added a comment
Status update: all issues solved.
The problem was as follows. Suppose a setup like this:
A <-> B <-> C (Galera Cluster 1)
Circular normal (non-Galera) replication between A <-> D
D <-> E <-> F (Galera Cluster 2)
The event groups arriving from B and C were applied twice on A (and similarly the event groups from E and F on D).
The reason is that a Galera event group does not contain a GTID_LOG_EVENT, so when A receives an event group from B, its rpl_slave_state::hash (gtid_slave_pos) is not updated. When A then gets the same event group from D (because of the circular replication), it applies the event again. Setting ignore_server_ids when configuring the circular replication solves this problem (a sketch follows below):
A ignores the server IDs of B and C, and D ignores the server IDs of E and F. replicate-same-server-id should be turned off.
Sachin Setiya (Inactive)
added a comment - edited
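A minimal sketch of that workaround with placeholder server IDs (assuming B=12, C=13, E=22, F=23; the real IDs depend on the deployment):

    -- On node A (async slave of D): skip event groups that originated on B and C,
    -- so they are not applied a second time when they come back via cluster 2.
    STOP SLAVE;
    CHANGE MASTER TO IGNORE_SERVER_IDS = (12, 13);
    START SLAVE;

    -- On node D (async slave of A): likewise skip events from E and F.
    STOP SLAVE;
    CHANGE MASTER TO IGNORE_SERVER_IDS = (22, 23);
    START SLAVE;

    -- replicate_same_server_id must stay at its default of OFF on both A and D.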
There is one more constraint: in the case of master-slave replication to a GTID cluster, or GTID cluster to GTID cluster (async or possibly circular replication), the cluster should have a different domain ID from its master or slave (see the sketch below).
Sachin Setiya (Inactive)
added a comment
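A hedged illustration of that constraint with placeholder domain values; the only point is that the domain used by the cluster differs from the domain of the async master or slave:

    -- Async master outside the cluster (my.cnf):   gtid_domain_id = 1
    -- Every node of the Galera cluster (my.cnf):   gtid_domain_id = 2
    --                                              wsrep_gtid_domain_id = 2  (10.1+, with wsrep_gtid_mode=ON)
    -- Verify on a cluster node:
    SELECT @@gtid_domain_id, @@wsrep_gtid_domain_id, @@wsrep_gtid_mode;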
Thank you for this fix.
I upgraded my production cluster to 10.1.32 via rolling restart and found the GTIDs out of sync, and wsrep_provider_version is still behind (25.3.18(r3632)). The production cluster was initially bootstrapped as 10.1.18.
In a dev cluster that I bootstrapped from 10.1.32, the GTIDs are in sync and wsrep_provider_version is higher (25.3.23(r3789)).
Is it true that in order to get my production cluster's GTIDs in sync, I will need to bootstrap with 10.1.32?
Mark Stoute
added a comment
There has been some confusion: the GTID is transferred between nodes only if the cluster is an async slave. If we want to transfer the GTID inside the write set, that will be a bigger change and will involve changing the Galera code:
changing the Galera GTID to use the same GTID format as MariaDB and using that GTID at commit instead of generating one.
Sachin Setiya (Inactive)
added a comment
sachin.setiya.007, it would be OK if Galera just passed the MariaDB GTID around as-is (as an extra arbitrary field as part of a commit), so that it is stored in each binlog. That would not require Galera to start using MariaDB GTIDs. Just see them as separate: Galera GTID and MariaDB GTID.
The issue is that right now, what's happening with, say, MDEV-14153 is just horrendous.
Arjen Lentz
added a comment
Hi arjen,
Actually that won't work, because the MariaDB server would somehow have to generate GTIDs in sync. Say we have a 3-node cluster with each node at GTID 1-1-1 and we do simultaneous writes on node 1 and node 2: both will generate GTID 1-1-2, and this will be a wrong sequence. So we need Galera to manage the GTID, since Galera is the transaction coordinator, not MariaDB.
Sachin Setiya (Inactive)
added a comment
I'm sorry sachin.setiya.007, but that's just not correct. Remember that GTID also works in async replication and master-master configurations.
The format is S-D-#, where S is the server ID (which should be unique in the cluster or replication environment), D is the replication domain (see the MariaDB docs; it tends to be 0 by default unless the application sets it to something else), and # is the number going up within that.
So for your example, you'd actually see something like 1-0-1 and 2-0-1 on the two different servers, which is a perfectly correct flow of things, and the next transactions written on the servers after that will be something like 1-0-2 and 2-0-2.
Hope this clarifies.
Arjen Lentz
added a comment
When we have different domain IDs, the user ensures that the binlog events don't conflict with each other, but this is not the case with Galera: Galera can handle conflicts. So I think within one cluster we should have one domain ID, and this is what Galera internally does; it has one UUID for one cluster.
Sachin Setiya (Inactive)
added a comment
Format is D-S-# and I'm fairly sure arjen is talking about different server IDs on each galera node (despite a little dyslexia).
Daniel Black
added a comment
danblack, right, the format is D-S-X. Actually my first comment is slightly wrong: each node will have GTID 1(constant)-X(node server ID)-Y(seq no). The server ID will be different on each node, but the seq no is still with respect to the domain ID.
Sachin Setiya (Inactive)
added a comment
Yes, thanks Dan - I had it right in a blog post the other day.
sachin.setiya.007: the seq# component on its own is not unique; it's the GTID as a whole that needs to be unique.
The UUID you're referring to is the Galera cluster identifier, which is indeed a single unique ID across the entire cluster - it never changes; this is how a node can see whether it belongs in a cluster or not. If you bootstrap a new cluster, a new UUID is generated.
Arjen Lentz
added a comment
arjen, I never said that the sequence number is unique on its own; it is unique with respect to the domain ID. For example, 1-1-1 and 1-2-1 are conflicting GTIDs, whereas 1-1-1 and 2-1-1 are perfectly okay GTIDs: https://mariadb.com/kb/en/library/gtid/#the-domain-id
Sachin Setiya (Inactive)
added a comment - edited
GTIDs need to pass through the cluster. Consider this requirement:
A DB connection occurs through a DB load balancer; at the end of the transaction updating a user's profile, the GTID is selected by the application.
The GTID is placed in the web session information for that user.
On the next web request, the user fetches a new page, going through the load balancer to a different cluster member (or even an async slave, for that matter).
Because Galera transactions or async slaves aren't applied immediately, a query of the user's profile may retrieve an out-of-date version. To prevent this, the DB application should be able to run
SELECT master_gtid_wait(@gtid, 0.1)
to ensure it has the latest data that the user previously updated (it can deal with the timeout). See the sketch after this comment.
I'm sure I'm not the only one of the 19 voters and 31 watchers wanting this.
There should be no need for the application to consider that a *G*TID is anything but a global identifier.
Galera needs to ensure sequential visibility when applying each D-S pair (i.e. 0-1-33 isn't visible when 0-1-22 isn't), so of course the GTID needs to be transferred in the write set.
Galera should handle that 0-1-33 and 0-2-33 are unique transactions from different servers, no matter what replication path was taken to deliver them.
Each server has its own server ID and can be responsible for GTID generation without coordination. If certification fails, then the server skips a GTID value. The Galera GTID has a different purpose, so it needs to stay independent.
If the server is part of the cluster, then the Galera mechanism can ensure the transaction can be applied without conflict, but this is independent of what the GTID actually is.
Daniel Black
added a comment
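A minimal sketch of that flow in SQL, assuming the fix makes GTIDs consistent across nodes; @@last_gtid and MASTER_GTID_WAIT() are existing MariaDB features, while the table and the stored GTID value are hypothetical:

    -- On the connection that performed the write, after COMMIT:
    SELECT @@last_gtid;                        -- e.g. '0-1-12345'; store it in the web session

    -- On a later connection, possibly to a different node or an async slave,
    -- before reading the profile back (returns -1 if the 0.1 s timeout expires):
    SELECT MASTER_GTID_WAIT('0-1-12345', 0.1);
    SELECT * FROM user_profiles WHERE user_id = 42;   -- hypothetical table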
We should also consider the case of ALTERs run node by node in RSU mode. We should end up with consistent GTIDs in the cluster after this, or invent some workaround (do not generate local GTIDs while in RSU mode, or request that everything be done with sql_log_bin=0?).
Valerii Kravchuk
added a comment - edited
RSU = rolling schema upgrade, perhaps?
If you want the ALTERs to replicate to async slaves that are not part of the cluster, the GTID way is to binlog the ALTER in a separate domain ID (SET SESSION gtid_domain_id=xxx). This makes them independent of the normal binlog stream. Grab the @@last_gtid from the first node, and use it to set server_id / gtid_seq_no on the other nodes to get the same GTID on all nodes for the ALTER (see the sketch after this comment).
If you do not want the ALTERs to replicate to async slaves, SET SESSION sql_log_bin=0 is the way.
Kristian Nielsen
added a comment
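A hedged sketch of that procedure with placeholder values (domain 9, server_id 101, a made-up ALTER); the GTID forced on the other nodes is whatever @@last_gtid returned on the first node:

    -- On the first node: run the ALTER in its own replication domain.
    SET SESSION gtid_domain_id = 9;
    ALTER TABLE t1 ADD COLUMN c2 INT;        -- placeholder statement
    SELECT @@last_gtid;                      -- e.g. '9-101-7'

    -- On each other node: force the same GTID for the same statement,
    -- so all binlogs end up with an identical GTID for the ALTER.
    SET SESSION gtid_domain_id = 9;
    SET SESSION server_id = 101;             -- requires SUPER
    SET SESSION gtid_seq_no = 7;
    ALTER TABLE t1 ADD COLUMN c2 INT;

    -- If the ALTER should not reach async slaves at all:
    SET SESSION sql_log_bin = 0;
    ALTER TABLE t1 ADD COLUMN c2 INT;
    SET SESSION sql_log_bin = 1;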
There are at least two major issues which need to be resolved in order to use the Galera seqno as part of the MariaDB GTID:
Occasionally a Galera seqno is generated for a write set which does not commit a transaction; these include (but are not limited to) write sets that fail certification and intermediate streaming-replication fragments. In order to keep GTID sequences continuous, all of these events would have to be logged in the binlog as dummy events, which could cause excessive clutter under certain workloads.
Master-slave topology where the Galera cluster acts as a slave: it is required that the original GTID from the master be preserved in the binlog events. However, as Galera will generate a write set/seqno for the applied transaction, there will be two GTIDs which should be persisted in the binlog for each transaction. It is not clear how this could be handled while preserving compatibility with async master/slave replication.
Teemu Ollakka
added a comment
A feature like MDEV-20715 could also improve Galera's support for MariaDB GTIDs. Specifically, it could prevent each node from generating GTIDs for local transactions, which could make it easier for replication slaves to use any cluster node as master, without risking inconsistent GTIDs.
Geoff Montee (Inactive)
added a comment