[MDEV-20720] Galera: Replicate MariaDB GTID to other nodes in the cluster Created: 2016-08-31 Updated: 2023-11-27 Resolved: 2020-01-29 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Replication |
| Fix Version/s: | 10.5.1 |
| Type: | Task | Priority: | Critical |
| Reporter: | Nirbhay Choubey (Inactive) | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 28 |
| Labels: | None | ||
| Issue Links: |
|
| Sprint: | 10.2.4-4, 10.2.4-1, 10.3.1-2, 10.2.10, 10.2.11, 10.1.30, 10.2.12 |
| Description |
|
The MariaDB GTID is currently not transferred to other nodes in the cluster. As a result, each node generates its own GTID sequence for the write sets it applies, the nodes' binary logs diverge, and an async slave cannot be re-pointed from one cluster node to another using GTID-based failover. |
| Comments |
| Comment by Tim Soderstrom [ 2016-10-19 ] | |
|
This sounds similar to what I ran into, but it seemed a tad vague. I am running MariaDB 10.1.18. I have a 3-node Galera cluster and an async slave. GTIDs are enabled, all nodes have 'log-bin' and 'log-slave-updates', and all are using 0 for the domain (the default). What I found was that all Galera nodes seem to be writing all data to their binary logs, but their GTIDs do not match. I can find things by the transaction ID across all the logs, but if I try to find things by GTID, the results are inconsistent. This means I cannot merely re-point the slave server to another node, because that node does not have the same GTID information as the current master and, thus, the slave does not know where to begin. It sounds like this matches this bug? I see the target is 10.2. If so, it would be ideal if this were reflected in the KB documentation for 10.1. | |
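For reference, the re-pointing step that fails here would look roughly like this (hostnames and credentials are placeholders); with diverged GTIDs, the new master cannot resolve the slave's recorded position:

```sql
-- On the async slave: check the recorded GTID position
SELECT @@gtid_slave_pos;

-- Re-point to another Galera node. MASTER_USE_GTID = slave_pos asks the
-- new master to resume from the slave's recorded GTID position; this is
-- the step that breaks when the nodes' binlog GTIDs have diverged.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST     = 'galera-node-2',  -- placeholder hostname
  MASTER_USER     = 'repl',           -- placeholder credentials
  MASTER_PASSWORD = 'secret',
  MASTER_USE_GTID = slave_pos;
START SLAVE;
```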
| Comment by Andrew Garner [ 2016-10-19 ] | |
|
Tim, I think you may be running into | |
| Comment by Tim Soderstrom [ 2016-10-19 ] | |
|
Doh you are right, that sounds exactly like our problem. Bug search fail on my part - thank you for providing that! | |
| Comment by Michaël de groot [ 2016-12-28 ] | |
|
Hi! I also noticed that the initial GTID is not passed on by all SST methods. If I remember correctly, only rsync SST will sync it. I think this implementation should make fixing that in SST unnecessary; could you please confirm that? Thanks, | |
| Comment by Michaël de groot [ 2017-03-13 ] | |
|
sachin.setiya.007, can you please tell me why this issue is stalled? This is a very important issue to fix. Right now it is very inconvenient to replicate one Galera cluster to another, and with circular replication between two Galera clusters it becomes a real pain. | |
| Comment by Geoff Montee (Inactive) [ 2017-08-04 ] | |
|
I assume that "Fix Version/s: 10.2" is not accurate anymore. Since 10.2 is already GA, I assume this would go into MariaDB 10.3 at the earliest. Is that correct? | |
| Comment by Sachin Setiya (Inactive) [ 2017-09-11 ] | |
|
I am occupied with Galera merges and Galera bugs, so I have not had time to work on this. I will start working on it again, hopefully next week. | |
| Comment by Sachin Setiya (Inactive) [ 2017-10-10 ] | |
|
http://lists.askmonty.org/pipermail/commits/2017-October/011552.html | |
| Comment by Michaël de groot [ 2017-10-10 ] | |
|
Cool, very nice that this issue is finally getting done! Thank you sachin.setiya.007. In the tests, please consider circular asynchronous replication between two or more Galera clusters. Cluster 1: A <> B <> C. All nodes have log_slave_updates enabled. Bidirectional asynchronous replication runs between node A and node D (in the second cluster). Writes originate from, for example, node B. Can you please make sure this scenario is tested? sachin.setiya.007, maybe the implementation done here is not enough for this use case. How does node E recognize transactions that originated from node D? Maybe we need to set up an ignore-domain-ID rule on the asynchronous replication stream? | |
| Comment by Sachin Setiya (Inactive) [ 2017-10-12 ] | |
|
Hi michaeldg, Writing an mtr test case for this situation is a bit difficult, but I will try to simulate it on VMs. Regards | |
| Comment by Andrei Elkin [ 2017-10-24 ] | |
|
Sachin, hello. Please check out a review mail I sent out. Cheers, Andrei. | |
| Comment by Sachin Setiya (Inactive) [ 2017-11-28 ] | |
|
Status update: a test case for a 2x3-node Galera cluster has been created, but this test fails because of | |
| Comment by Sachin Setiya (Inactive) [ 2017-11-28 ] | |
|
Branch Buildbot | |
| Comment by Sachin Setiya (Inactive) [ 2017-12-04 ] | |
|
Status update: all issues solved. The problem was as follows: suppose two clusters, one with nodes A, B, C and the other with nodes D, E, F, joined by circular replication between A and D. The event groups arriving from B and C were applied twice on A (and similarly the event groups from E and F were applied twice on D). | |
| Comment by Sachin Setiya (Inactive) [ 2017-12-11 ] | |
|
There is one more constraint: in the case of master-slave replication to a GTID cluster, or GTID cluster to GTID cluster (async or perhaps circular replication), the cluster should have a different domain ID from its master or slave. | |
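A sketch of that constraint: give the cluster one domain ID and the external master or slave a different one (the domain ID values here are arbitrary):

```sql
-- On every node of the Galera cluster (also put gtid_domain_id=1 in
-- my.cnf so the setting survives restarts):
SET GLOBAL gtid_domain_id = 1;

-- On the external master/slave (or the other cluster), use a
-- different domain so the two GTID streams cannot collide:
SET GLOBAL gtid_domain_id = 2;
```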
| Comment by Sachin Setiya (Inactive) [ 2017-12-25 ] | |
|
http://lists.askmonty.org/pipermail/commits/2017-December/011761.html | |
| Comment by Mark Stoute [ 2018-04-20 ] | |
|
Thank you for this fix. In a dev cluster where I bootstrapped the cluster from 10.1.32, GTIDs are in sync and wsrep_provider_version is higher (25.3.23(r3789)). Is it true that in order to have my production cluster have GTIDs in sync, I will need to bootstrap with 10.1.32? | |
| Comment by Sachin Setiya (Inactive) [ 2018-05-03 ] | |
|
There has been some confusion: the GTID is transferred between nodes only if the cluster is an async slave. If we want to transfer the GTID inside the write set, that will be a bigger change and will involve changing the Galera code. | |
| Comment by Arjen Lentz [ 2018-07-09 ] | |
|
sachin.setiya.007 it would be ok if Galera just passed the MariaDB GTID around as-is (as an extra arbitrary field as part of a commit), so it will be stored in each binlog. That would not require Galera to start using MariaDB GTIDs. Just see them as separate: Galera GTID and MariaDB GTID. | |
| Comment by Sachin Setiya (Inactive) [ 2018-07-11 ] | |
|
Hi arjen, actually that won't work, because the MariaDB server would somehow have to generate GTIDs in sync. Let's say we have a 3-node cluster where each node's GTID is 1-1-1, and then we do simultaneous writes on node 1 and node 2: both will generate GTID 1-1-2, which is a wrong sequence. So we need Galera to manage the GTID, since it is the transaction coordinator, not MariaDB. | |
| Comment by Arjen Lentz [ 2018-07-11 ] | |
|
I'm sorry sachin.setiya.007 but that's just not correct. Remember that GTID also works in an async replication and master-master configuration. | |
| Comment by Sachin Setiya (Inactive) [ 2018-07-11 ] | |
|
When we have different domain IDs, the user ensures that the binlog events don't conflict with each other, but this is not the case with Galera: Galera can handle conflicts. So I think within one cluster we should have one domain ID, and this is what Galera internally does; it has one UUID per cluster. | |
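That cluster-wide identity can be checked on any node; the value is the same across the whole cluster:

```sql
-- Galera's single per-cluster UUID, shared by all nodes
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_state_uuid';
```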
| Comment by Daniel Black [ 2018-07-11 ] | |
|
Format is D-S-# and I'm fairly sure arjen is talking about different server IDs on each galera node (despite a little dyslexia). | |
| Comment by Sachin Setiya (Inactive) [ 2018-07-11 ] | |
|
danblack, right, the format is D-S-X. Actually my earlier comment is slightly wrong: each node will have GTID 1 (constant domain)-X (node server ID)-Y (seq no). The server ID will be different on each node, but the seq no is still relative to the domain ID. | |
| Comment by Arjen Lentz [ 2018-07-11 ] | |
|
Yes, thanks Dan - I had it right in a blog post the other day. sachin.setiya.007, the seq# component is not unique on its own; it's the GTID as a whole that needs to be unique. | |
| Comment by Sachin Setiya (Inactive) [ 2018-07-11 ] | |
|
arjen, I never said that the sequence number is unique on its own; it is unique with respect to the domain ID. For example, 1-1-1 and 1-2-1 are conflicting GTIDs, while 1-1-1 and 2-1-1 are perfectly fine GTIDs. https://mariadb.com/kb/en/library/gtid/#the-domain-id | |
| Comment by Daniel Black [ 2018-07-12 ] | |
|
GTIDs need to pass through the cluster. Consider this requirement:
I'm sure I'm not the only one of the 19 voters and 31 watchers wanting this. There should be no need for the application to consider that a *G*TID is anything but a global identifier. Galera needs to ensure sequential visibility when applying each D-S pair (i.e. 0-1-33 isn't visible while 0-1-22 isn't), so of course the GTID needs to be transferred in the write set. Each server has its own server-id and can be responsible for GTID generation without coordination; if certification fails, the server simply skips a GTID value. The Galera GTID has a different purpose, so it needs to stay independent. If a server is part of the cluster, the Galera mechanism can ensure the write set is applied without conflict, but this is independent of what the GTID actually is. | |
| Comment by Valerii Kravchuk [ 2018-08-10 ] | |
|
We should also consider the case of ALTER running node by node in RSU mode. We should end up with consistent GTIDs in the cluster after this, or invent some workaround (do not generate local GTIDs while in RSU mode, or require doing everything with sql_log_bin=0?). | |
| Comment by Kristian Nielsen [ 2018-08-10 ] | |
|
RSU=rolling schema upgrade, perhaps? If you want the ALTERs to replicate to async slaves not part of the cluster, the GTID way is to binlog the ALTER in a separate domain id (SET SESSION gtid_domain_id=xxx). This will make them independent of the normal binlog stream. Grab the @@last_gtid from the first node, and use it to set server_id / gtid_seq_no on the other nodes to get the same GTID on all nodes for the ALTER. If you do not want the ALTERs to replicate async to slaves, SET SESSION sql_log_bin=0 is the way. | |
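The workaround described above can be sketched as follows; the domain ID 9999, the table, and the GTID components are placeholders:

```sql
-- Node 1: binlog the ALTER in its own replication domain
SET SESSION gtid_domain_id = 9999;
ALTER TABLE t1 ADD COLUMN c INT;
-- Grab the GTID just written, e.g. '9999-1-5'
SELECT @@last_gtid;

-- Nodes 2..N: reuse the server ID and sequence number from node 1's
-- @@last_gtid so every node binlogs the identical GTID for this ALTER
SET SESSION gtid_domain_id = 9999;
SET SESSION server_id      = 1;   -- server-ID part of node 1's GTID
SET SESSION gtid_seq_no    = 5;   -- seq-no part of node 1's GTID
ALTER TABLE t1 ADD COLUMN c INT;
```

Setting server_id and gtid_seq_no at session level requires SUPER; the separate domain keeps the ALTER independent of the normal binlog stream, as described above.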
| Comment by Sylvain ARBAUDIE [ 2019-07-12 ] | |
|
Would there be any issue with using the Galera seqno as the last part of the MariaDB GTID? Apart from RSU DDL, I mean. | |
| Comment by Teemu Ollakka [ 2019-07-15 ] | |
|
There are at least two major issues which need to be resolved in order to use Galera seqno as part of the MariaDB GTID:
| |
| Comment by Geoff Montee (Inactive) [ 2019-10-02 ] | |
|
A feature like | |
| Comment by Sachin Setiya (Inactive) [ 2020-01-21 ] | |
|
Okay to push | |
| Comment by Ian Gilfillan [ 2020-03-29 ] | |
|
The pull request linked with this issue is still marked as open, although the task has been closed. |