[MDEV-4474] Replication aborts on starting a circular replication using GTID Created: 2013-05-03 Updated: 2013-05-22 Resolved: 2013-05-22 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.0.2 |
| Fix Version/s: | 10.0.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Elena Stepanova | Assignee: | Kristian Nielsen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
I set replication 1->2 to use GTID, start it, execute some events on server 1 and server 2, then set replication 2->1 to use GTID too, and attempt to start it. The output of the test case is provided below.
Test case:
cnf file:
bzr version-info
|
| Comments |
| Comment by Kristian Nielsen [ 2013-05-03 ] |
|
Thanks for testing this! You are in uncharted territory, I did consider circular topologies in the
There is one problem with your test. You have two masters active at the same
It does not help that you stopped the direction 2->1. What matters is that you
Concretely, we have S2: create table t2 ...
On S2, "create table t2" will be binlogged before "insert into t1". But on S1,
So when S1 connects with GTID as slave to S2, it asks to start from the
There are two ways to do this correctly:
1. Either configure different domain_ids for S1 and S2.
2. Or alternatively, use a single domain and make sure that everything is
So I would like to handle this in two stages.
First, we should make sure that (1) and (2) actually work correctly (they
Second, while what the test does is fundamentally incorrect and cannot work,
"First" we should do now. "Second" I would like to revisit later, when the
What do you think? |
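To make the skipped-event problem concrete, here is a minimal Python sketch under a deliberately simplified model (an assumption, not MariaDB's implementation): a binlog is an ordered list of (domain_id, server_id, seq_no) GTIDs, the connecting slave asks for the last GTID per domain found in its own binlog, and the master sends every event after that GTID in its binlog, or the whole domain when the slave has no GTID for it. The names `Gtid`, `binlog_pos` and `events_to_send` are hypothetical.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain: int  # domain_id
    server: int  # server_id
    seq: int     # seq_no


def binlog_pos(binlog):
    """Last GTID per domain in a binlog: the position this model's slave asks for."""
    pos = {}
    for gtid, _stmt in binlog:
        pos[gtid.domain] = gtid
    return pos


def events_to_send(master_binlog, slave_pos):
    """Per domain, return statements after the slave's GTID in the master binlog.

    A domain absent from slave_pos is sent from the start of the binlog.
    """
    start = {}
    for i, (gtid, _stmt) in enumerate(master_binlog):
        if slave_pos.get(gtid.domain) == gtid:
            start[gtid.domain] = i
    missing = set(slave_pos) - set(start)
    if missing:
        raise ValueError(f"slave GTID not in master binlog for domains {missing}")
    return [stmt for i, (gtid, stmt) in enumerate(master_binlog)
            if i > start.get(gtid.domain, -1)]


# Single domain 0 with two active masters: both hand out seq_no 2, and
# S2 binlogs its own "create table t2" before S1's later insert arrives.
s1_single = [(Gtid(0, 1, 1), "create table t1"),
             (Gtid(0, 1, 2), "insert into t1 values (2)")]
s2_single = [(Gtid(0, 1, 1), "create table t1"),
             (Gtid(0, 2, 2), "create table t2"),
             (Gtid(0, 1, 2), "insert into t1 values (2)")]

# Separate domains: S1 writes to domain 1, S2 writes to domain 2.
s1_split = [(Gtid(1, 1, 1), "create table t1"),
            (Gtid(1, 1, 2), "insert into t1 values (2)")]
s2_split = [(Gtid(1, 1, 1), "create table t1"),
            (Gtid(2, 2, 1), "create table t2"),
            (Gtid(1, 1, 2), "insert into t1 values (2)")]

print(events_to_send(s2_single, binlog_pos(s1_single)))  # []
print(events_to_send(s2_split, binlog_pos(s1_split)))    # ['create table t2']
```

With one shared domain, S1's own insert is the last GTID it knows, so S2 resumes after it and "create table t2" is never sent. With different domain_ids (option 1), S1 has no position in S2's domain and receives that domain in full.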
| Comment by Kristian Nielsen [ 2013-05-03 ] |
|
BTW, I've now started writing some docs! I didn't manage to finish them today, but they should be ready early next week, and they should explain this issue, among others. |
| Comment by Elena Stepanova [ 2013-05-03 ] |
|
>> There is one problem with your test.
I see, thanks for the explanation.
reason A:
reason B:
>> So I would like to handle this in two stages. |
| Comment by Elena Stepanova [ 2013-05-05 ] |
|
Decreasing the priority since it works as designed, but leaving it open for the second stage described above. Not removing 10.0.3 from the 'Fix version' list, since for minor bugs it is updated automatically upon release anyway. |
| Comment by Kristian Nielsen [ 2013-05-08 ] |
|
I had a good discussion on this with Elena. I think the root problem here is that when we do "insert into t1 values (2)" on server 1, this changes the replication position of server 1, so that it now skips earlier transactions on server 2. This is highly unexpected. The original reason for this was to make CHANGE MASTER TO master_use_gtid=1 work automatically, regardless of whether, before the CHANGE MASTER, the server was already a slave or was in fact the old master of the replication hierarchy. However, I now think this magic causes more problems than it solves. I have an idea for doing this better, in a way that avoids the surprising magic but still makes it easy to connect a server with GTID, whether it was previously a master or a slave. |
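The difference between the two ways of deriving a position can be sketched in Python under a simplified model (hypothetical names `Server`, `binlog_based_pos` and `slave_based_pos`; not MariaDB's implementation). A position taken from everything in the server's binlog moves when a local transaction such as "insert into t1 values (2)" commits, while a position taken only from transactions applied as a slave does not. The model assumes replicated transactions are also written to the binlog, as with log_slave_updates.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain: int
    server: int
    seq: int


def last_per_domain(gtids):
    """Last GTID seen per domain, in list order."""
    pos = {}
    for gtid in gtids:
        pos[gtid.domain] = gtid
    return pos


class Server:
    def __init__(self, server_id):
        self.server_id = server_id
        self.binlog = []         # everything this server binlogs (model assumption)
        self.slave_applied = []  # only transactions applied as a slave

    def local_commit(self, domain, seq):
        # A transaction executed directly on this server.
        self.binlog.append(Gtid(domain, self.server_id, seq))

    def apply_replicated(self, gtid):
        # A transaction received from a master and applied here.
        self.binlog.append(gtid)
        self.slave_applied.append(gtid)

    def binlog_based_pos(self):
        return last_per_domain(self.binlog)

    def slave_based_pos(self):
        return last_per_domain(self.slave_applied)


s1 = Server(1)
s1.apply_replicated(Gtid(0, 2, 1))  # earlier transaction from S2
s1.local_commit(0, 2)               # "insert into t1 values (2)" run on S1

print(s1.binlog_based_pos())  # {0: Gtid(domain=0, server=1, seq=2)}
print(s1.slave_based_pos())   # {0: Gtid(domain=0, server=2, seq=1)}
```

Asking S2 to resume after 0-1-2 skips any S2 transaction binlogged before S1's insert reached S2, whereas resuming after 0-2-1 does not. Loosely, this is the distinction that master_use_gtid=slave_pos relies on.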
| Comment by Kristian Nielsen [ 2013-05-22 ] |
|
After implementing CHANGE MASTER TO master_use_gtid=slave_pos (and using it in the test case instead of master_use_gtid=1), the test passes without problems. So this is promising. |
| Comment by Kristian Nielsen [ 2013-05-22 ] |
|
This is now fixed and pushed to 10.0-base. The fix is the new user interface with separate master_use_gtid=slave_pos and |