[MDEV-11969] Can't remove GTIDs for a stale GTID Domain ID Created: 2017-02-01 Updated: 2017-12-12 Resolved: 2017-12-12 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.1.19, 10.1.20 |
| Fix Version/s: | 10.1.30 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jan Kunzmann (Inactive) | Assignee: | Andrei Elkin |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
I've simplified my situation for this bug description: We have two independent MariaDB clusters, each a regular master-slave setup (let's call the masters A and B). There's also a MariaDB instance serving as a data warehouse, using multi-source replication from both masters. All replication channels were created in the pre-GTID era using binlog_file and binlog_pos. Of course, both masters had already generated GTIDs for the default domain id 0.
When we migrated to GTID-based replication, we configured master A with domain id 1 and B with domain id 2. All slaves in group A now have two GTIDs in gtid_slave_pos: one for domain 1 with an increasing sequence counter, and one with a static sequence counter for the former default domain 0. Master A also keeps track of this domain 0 GTID via gtid_binlog_pos (and gtid_binlog_state). The same applies to master B and its slaves for domains 2 and 0, respectively. So far this is not a problem.
However, it's not possible to introduce GTID-based replication on the warehouse. The last statement written in the pre-GTID era for the default domain id 0 originated from master B and has a lower sequence number than the GTID for domain 0 on master A. Therefore, when executing
on the warehouse to allow its replication to use GTID, A attempts to scan the binlog not only for domain 1, but also for domain 0 (despite do_domain_id). Because the sequence number for domain 0 is lower than the one in A's gtid_binlog_pos, A refuses the connection with
There's no way to discard the knowledge of domain 0 on masters A and B except by setting gtid_binlog_state, which would entail a RESET MASTER and therefore isn't applicable in live operation. I assume that this issue is similar to MDEV-9108, which (as far as I understand) basically asks that do_domain_id also tell the master to ignore all other domains when scanning the binlogs for the starting position. This would solve my issue. Still, I believe it would be easier to allow altering gtid_binlog_pos on the master (not directly or via gtid_binlog_state, but through a function call) to forget GTIDs for a specific domain id without issuing RESET MASTER. |
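To make the requested behaviour concrete, here is a minimal Python sketch of what "forgetting" one domain means for a GTID list. It is only a model of the text format (comma-separated `domain_id-server_id-seq_no` triples), not a server API: the function names `parse_gtid_list` and `forget_domain` and the example values are hypothetical.

```python
from typing import List, Tuple

# A GTID is written "domain_id-server_id-seq_no"; a position or state
# is a comma-separated list of such triples.
Gtid = Tuple[int, int, int]


def parse_gtid_list(text: str) -> List[Gtid]:
    """Parse a comma-separated GTID list into (domain, server, seq) tuples."""
    gtids: List[Gtid] = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and stray commas
        domain, server, seq = (int(part) for part in item.split("-"))
        gtids.append((domain, server, seq))
    return gtids


def forget_domain(state: str, domain_id: int) -> str:
    """Return the GTID list with every entry of `domain_id` removed."""
    kept = [g for g in parse_gtid_list(state) if g[0] != domain_id]
    return ",".join("-".join(str(part) for part in g) for g in kept)


# Hypothetical state on master A: a stale domain 0 entry plus live domain 1.
example_state = "0-10-4711,1-10-98765"
print(forget_domain(example_state, 0))
```

With the hypothetical state above, only the domain 1 entry remains, which is the effect the description asks for without a RESET MASTER.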
| Comments |
| Comment by Elena Stepanova [ 2017-02-01 ] |
|
plinux, I'll leave it to you to choose and set the 'Fix version'. |
| Comment by Michael Gmelin [ 2017-03-04 ] |
|
We're facing this issue in various similar replication setups as well. After changing a server's gtid_domain_id, it's not possible to remove the last GTID of the previous domain from gtid_binlog_state on the master without using "RESET MASTER", and the slaves get stuck with fatal error 1236. do_domain_id doesn't help, as the slaves always check gtid_binlog_state and try to look up all GTIDs in the master's binary log. |
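As an illustration of the point about do_domain_id, the following Python sketch lists the domains in a GTID position that fall outside a given domain filter. According to the comment above, entries for such domains are still looked up on the master. The function name `domains_outside_filter` and the example values are hypothetical, and the sketch only inspects the text format; it does not reproduce the server's lookup logic.

```python
from typing import List, Set


def domains_outside_filter(gtid_pos: str, do_domain_ids: Set[int]) -> List[int]:
    """Return the sorted domain ids present in gtid_pos but not in the filter."""
    outside = set()
    for item in gtid_pos.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and stray commas
        domain = int(item.split("-")[0])  # first field is the domain id
        if domain not in do_domain_ids:
            outside.add(domain)
    return sorted(outside)


# Hypothetical slave position: stale domain 0 entry plus live domain 1.
print(domains_outside_filter("0-20-300,1-10-98765", {1}))
```

An empty result would mean the position contains nothing beyond the filtered domains; any id that is returned is a candidate for the stale-domain problem described in this ticket.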
| Comment by Andrei Elkin [ 2017-09-06 ] |
|
Lixun, hello. Let me take this ticket from you, since I am implementing the requested measure in MDEV-12012. Andrei |
| Comment by Andrei Elkin [ 2017-12-12 ] |
|
|