[MDEV-16834] GTID current_pos easily breaks replication Created: 2018-07-27 Updated: 2020-08-25 Resolved: 2018-09-15 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Documentation, Replication |
| Affects Version/s: | 10.2.15 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Claudio Nanni | Assignee: | Kenneth Dyer (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | GTID | ||
| Description |
|
Replication using GTID with MASTER_USE_GTID=current_pos easily breaks when a transaction is generated on the Slave and replication is then restarted. So you can have: 1-100-1000. I don't know if this is on purpose, but it does not seem consistent to me. My first impression is that each server should have its own transaction numbering with no holes. |
| Comments |
| Comment by Geoff Montee (Inactive) [ 2018-07-27 ] |
GTID sequences are per-domain, not per-server. If you want each server to have its own independent GTID sequences, then you can set gtid_domain_id to a different value on each server. |
| Comment by Claudio Nanni [ 2018-07-27 ] |
|
I don't care whether it's per domain or per server (although in my opinion a GTID sequence per server is more elegant). The problem here is consistency: as you can see, the number is also repeated within the same domain: 1-100-1000
When a GTID is generated on the Slave, it continues from the next value. When a GTID is again generated on the Master, which has no feedback mechanism, the Master continues where it left off, and that means there will be duplicate sequence numbers. This KB says a few words about it: https://mariadb.com/kb/en/library/gtid/
To get consistent behaviour, one should either introduce a feedback mechanism so that the Master can read the latest sequence number from the Slaves or, much more easily, use a dedicated counter per server. |
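The duplicate described above can be reproduced with a toy model. This is only a sketch of the counting behaviour as described in this comment, not MariaDB code: the `ToyServer` class, its methods, and the slave's server ID of 200 are hypothetical names and values chosen for illustration.

```python
from typing import Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


class ToyServer:
    """Toy model of one server's GTID counter for a single domain (hypothetical)."""

    def __init__(self, domain_id: int, server_id: int, last_seq_no: int = 0) -> None:
        self.domain_id = domain_id
        self.server_id = server_id
        self.last_seq_no = last_seq_no

    def commit_local(self) -> Gtid:
        """Generate a GTID for a transaction written directly on this server."""
        self.last_seq_no += 1
        return (self.domain_id, self.server_id, self.last_seq_no)

    def apply_replicated(self, gtid: Gtid) -> None:
        """Apply a replicated transaction; the local counter follows its seq_no."""
        self.last_seq_no = max(self.last_seq_no, gtid[2])


def demo() -> Tuple[Gtid, Gtid, Gtid]:
    # Both servers share domain 1; server_id 200 for the slave is an assumption.
    master = ToyServer(domain_id=1, server_id=100, last_seq_no=999)
    slave = ToyServer(domain_id=1, server_id=200, last_seq_no=999)

    replicated = master.commit_local()   # master writes 1-100-1000
    slave.apply_replicated(replicated)   # slave applies it

    on_slave = slave.commit_local()      # slave continues from the next value
    on_master = master.commit_local()    # master has no feedback, continues on its own
    return replicated, on_slave, on_master
```

In this model the slave's local transaction and the master's next transaction end up with the same sequence number in the same domain, differing only in server ID.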
| Comment by Geoff Montee (Inactive) [ 2018-07-27 ] |
As far as I understand it, it's not generally a good idea to write to the same domain on two different servers concurrently because of this exact problem. If two servers are writing concurrently, then I believe that it is usually appropriate to use different gtid_domain_id values on each server.
I believe that setting up circular replication would be the "feedback mechanism" you're looking for, but I think you'd still have to watch out for duplicates that could happen due to slave lag. In general, I think the best option is probably to set gtid_domain_id to different values on servers that will be writing concurrently. |
| Comment by Claudio Nanni [ 2018-07-27 ] |
To me it is too fuzzy. Is the sequence number unique per domain id?
Not really. What I have in mind is just a call to the Slaves to ask for the latest locally generated GTID, which of course is not really applicable. To be really global, GTID should use a central GTID dispatcher, so that whenever a server needs to generate a transaction it gets a really unique GTID.
Back to the main bug in this report: it seems quite bad that STOP/START SLAVE will break replication. The explanation is in how gtid_current_pos is set: "For each replication domain, if the server ID of the corresponding GTID in @@gtid_binlog_pos is equal to the servers own server_id, and the sequence number is higher than the corresponding GTID in @@gtid_slave_pos, then the GTID from @@gtid_binlog_pos will be used. Otherwise the GTID from @@gtid_slave_pos will be used for that domain." |
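The quoted rule can be written out as a small function. This is a sketch of the rule exactly as quoted, not the server's implementation: `parse_gtid_list` and `current_pos` are hypothetical helper names, the slave's server ID of 200 in the usage note is an assumption, and a domain that has no entry in the slave position and does not satisfy the binlog condition is simply omitted, which is also an assumption.

```python
from typing import Dict, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid_list(text: str) -> Dict[int, Gtid]:
    """Parse a list such as '0-1-100,1-2-5' into {domain_id: Gtid}."""
    result: Dict[int, Gtid] = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        domain_id, server_id, seq_no = (int(x) for x in part.split("-"))
        result[domain_id] = Gtid(domain_id, server_id, seq_no)
    return result


def current_pos(binlog_pos: str, slave_pos: str, own_server_id: int) -> str:
    """Apply the quoted per-domain rule and return the resulting GTID list."""
    binlog = parse_gtid_list(binlog_pos)
    slave = parse_gtid_list(slave_pos)
    chosen = []
    for domain in sorted(set(binlog) | set(slave)):
        b = binlog.get(domain)
        s = slave.get(domain)
        # Use the binlog GTID only if this server wrote it and it is ahead
        # of the slave position for the same domain.
        if b is not None and b.server_id == own_server_id and (
            s is None or b.seq_no > s.seq_no
        ):
            chosen.append(b)
        elif s is not None:
            chosen.append(s)
    return ",".join(f"{g.domain_id}-{g.server_id}-{g.seq_no}" for g in chosen)
```

For example, on a slave with a server ID of 200 that has replicated up to 1-100-1000 and then committed one local transaction, `current_pos("1-200-1001", "1-100-1000", 200)` yields `1-200-1001`, a position the master never generated. Without the local write, `current_pos("1-100-1000", "1-100-1000", 200)` stays at `1-100-1000`.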
| Comment by Elena Stepanova [ 2018-07-29 ] |
|
If I understand correctly, the reported problem is the behavior explicitly documented for current_pos:
https://mariadb.com/kb/en/library/gtid/ |
| Comment by Claudio Nanni [ 2018-07-30 ] |
|
Hello Elena, |
| Comment by Elena Stepanova [ 2018-07-30 ] |
|
As long as it's about documentation, I leave it to documentation experts to decide how to do it best. |
| Comment by Kenneth Dyer (Inactive) [ 2018-09-15 ] |
|
Updated the GTID page: added a new section, built from existing text, covering current_pos and slave_pos to better emphasize the issue. |
| Comment by Geoff Montee (Inactive) [ 2019-07-22 ] |
|
The slave's I/O thread currently only checks for inconsistent GTIDs when it initializes its local value of gtid_current_pos, which happens when the slave threads are first started. I don't think this is ideal. If a slave has MASTER_USE_GTID=current_pos set, then I think the slave's I/O thread should periodically compare its local value of gtid_current_pos to the slave's global value of gtid_binlog_pos. This would allow the slave to warn users that its position has become inconsistent, even if the slave threads don't get restarted. See |