[MDEV-16025] gtid_ignore_duplicates(rpl_gtid.cc) does not contemplate replicate_* filters Created: 2018-04-25  Updated: 2020-08-25  Resolved: 2018-07-17

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0, 10.1, 10.0.34, 10.2, 10.3
Fix Version/s: 10.0.35

Type: Bug Priority: Major
Reporter: Claudio Nanni Assignee: Andrei Elkin
Resolution: Not a Bug Votes: 3
Labels: gtid, replication
Environment: Linux

Attachments: Text File 3_slap_tests.txt     JPEG File replica-setup.jpg    

 Description   

When the same GTID reaches a slave twice, for example via a multi-source setup, one of the two copies is discarded if you have gtid_ignore_duplicates=ON.
This is fine.
Also, there is no guarantee as to which multi-source channel delivers a given GTID first or which copy is processed first: each is written to its per-source relay log and processed asynchronously. This is also fine.
If you have a good replication setup, you have probably put replication filters in place so that duplicate transactions are not applied twice on the same slave (they are still copied from the master into the relay logs). With GTID, the gtid_ignore_duplicates option can save you, but with traditional binlog coordinates you risk applying the same transactions twice.
For this reason, a setup in which the same transaction can reach the same multi-source slave should have a `replicate_wild_do_table` or `replicate_wild_ignore_table` filter to avoid this (unless you are implementing some sort of custom highly available replication setup).
The problem happens exactly when you set up such a filter. The piece of code that checks for duplicates apparently (at least from my limited understanding) is not aware that GTIDs coming from the channel with the filter won't actually be applied. So if a specific GTID arrives second on the channel without the filter, it is considered a duplicate and not applied, and the result is missing transactions.
I attach a picture of the setup and the log of some tests I ran after amending sql/rpl_gtid.cc to add some extra debug prints.
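
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the logic in sql/rpl_gtid.cc: the names `Event`, `apply_channel` and `run` are hypothetical, the duplicate check is reduced to a shared set of already-seen GTIDs, and the B2C filter is approximated as "schema name starts with dummy".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    gtid: str   # "domain-server_id-seq_no", e.g. "0-1-1"
    table: str  # "schema.table" touched by the transaction


def apply_channel(events, seen_gtids, applied, table_filter=None):
    """Toy SQL thread for one channel (hypothetical, not MariaDB code)."""
    for ev in events:
        if ev.gtid in seen_gtids:
            continue  # treated as a duplicate and skipped
        # Assumption: the GTID is recorded as seen even if the
        # replicate filter below ends up discarding the transaction.
        seen_gtids.add(ev.gtid)
        if table_filter is not None and not table_filter(ev.table):
            continue  # filtered out, never applied
        applied.append(ev.gtid)


def run(order):
    """Process the channels in the given order and return applied GTIDs."""
    # Transactions originating on A; they reach C directly and via B.
    from_a = [Event("0-1-1", "app.t1"), Event("0-1-2", "app.t1")]
    channels = {
        "A2C": (from_a, None),
        # Stand-in for a filter like 'dummy%.%' on the B2C channel.
        "B2C": (from_a, lambda table: table.startswith("dummy")),
    }
    seen_gtids, applied = set(), []
    for name in order:
        events, table_filter = channels[name]
        apply_channel(events, seen_gtids, applied, table_filter)
    return applied


print(run(["A2C", "B2C"]))  # ['0-1-1', '0-1-2']
print(run(["B2C", "A2C"]))  # []
```

In this model the outcome depends only on which channel processes a GTID first: when the filtered B2C channel wins, the unfiltered A2C copy is skipped as a duplicate and nothing is applied.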

For reference: https://jira.mariadb.org/browse/MDEV-5804



 Comments   
Comment by Elena Stepanova [ 2018-06-12 ]

This is a strange setup, unless I'm missing something. The jpg says, "REPLICATION FILTER B2C such as no transactions coming from A via B have to be applied on C, also because they're supposed to reach C directly via the A2C replication channel. e.g. B2C - replicate_wild_do_tables='dummy%.%'."

But doesn't it actually ensure that no transactions coming from B are applied on C at all, either originated from A or from B? And if so, then what's the point in this complicated setup – if you don't want to apply any transactions from B on C, then why have B->C replication at all?

But the strange setup aside, the effect itself is of course easily reproducible. It doesn't even require a concurrent test; an MTR test will do: just let B2C replication work first, and only then start A2C, and events from A won't be replicated. I don't know if it was really designed to be so or just happened to be; either way, it doesn't seem to contradict any existing documentation. I'll leave it to Elkin to decide whether it's a bug or not and, if it is to be fixed, in which versions. Feel free to adjust Fix Version/s accordingly.

Comment by Claudio Nanni [ 2018-06-12 ]

C must replicate whatever is executed on B and on A (directly): plain multi-source.
B replicates from A.
C must not replicate transactions coming from A that go via B.
Binlog filters on B cannot be used for several reasons (apart from the fact that they are never a good idea).
So, to prevent C from replicating the same data twice, a (wild) replicate filter is set up on the B2C channel.
One may also set up a mutually exclusive filter on A2C, but this is not relevant.
Since replicate filters don't prevent the transaction from reaching the slave's relay logs (they only prevent it from being applied), when the same GTID arrives on the multi-source slave (in different relay logs, one per source) the sql/rpl_gtid.cc code will always ignore the second one (if gtid_ignore_duplicates is on). But if the first to arrive came from the channel with the exclusion filter, that transaction won't be applied at all.

Comment by Andrei Elkin [ 2018-07-11 ]

> C must not replicate transactions coming from A that go via B.

I've just spoken with Claudio to make sure I understood the immediate technical requirements well.
The aim is to regard replication events coming via a proxy (B) as duplicates and
those coming over the direct link (A) as originals, and to ignore the duplicates.

For that purpose there is always IGNORE_SERVER_IDS = (server_id_list), which I suggested
instead of meddling with @@replicate*. B2C should therefore be configured with that CHANGE MASTER option.
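
A toy model of why a server-id filter fits this aim better than a table filter. It is only a sketch and rests on assumptions: events whose originating server id is listed in IGNORE_SERVER_IDS are taken to be dropped before they reach the relay log, so they never take part in the duplicate check; A has server id 1 and B has server id 2; B's own writes use a separate GTID domain; the names `Event`, `io_thread`, `sql_apply` and `replicate` are hypothetical and do not correspond to server code.

```python
from typing import NamedTuple


class Event(NamedTuple):
    gtid: str       # "domain-server_id-seq_no"
    server_id: int  # server where the transaction originated


def io_thread(incoming, ignore_server_ids=()):
    """Toy I/O thread: drop ignored server ids before the relay log."""
    return [ev for ev in incoming if ev.server_id not in ignore_server_ids]


def sql_apply(relay_log, seen_gtids, applied):
    """Toy SQL thread with a simplified duplicate-GTID check."""
    for ev in relay_log:
        if ev.gtid in seen_gtids:
            continue  # duplicate, skipped
        seen_gtids.add(ev.gtid)
        applied.append(ev.gtid)


def replicate(order):
    """Process the channels in the given order; return sorted applied GTIDs."""
    from_a = [Event("0-1-1", 1), Event("0-1-2", 1)]
    # B forwards A's transactions and adds one of its own.
    from_b = from_a + [Event("1-2-1", 2)]
    relay_logs = {
        "A2C": io_thread(from_a),
        # Assumed B2C setup: ignore everything that originated on A.
        "B2C": io_thread(from_b, ignore_server_ids={1}),
    }
    seen_gtids, applied = set(), []
    for name in order:
        sql_apply(relay_logs[name], seen_gtids, applied)
    return sorted(applied)


print(replicate(["A2C", "B2C"]))  # ['0-1-1', '0-1-2', '1-2-1']
print(replicate(["B2C", "A2C"]))  # ['0-1-1', '0-1-2', '1-2-1']
```

Under these assumptions the result no longer depends on channel order: A's transactions are applied only via A2C, and B's own transactions still arrive via B2C.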

As to how @@replicate* combines with MSR and --gtid-ignore-duplicates, as exemplified by the provided tests, I don't actually see a solid basis for complaint. The semantics of the combination do not match the aim, as elenst pointed out.

Comment by Andrei Elkin [ 2018-07-11 ]

The status is set back to Unconfirmed, to move to Closed once Claudio has confirmed that
the server-id-to-ignore filter works for his case. As to the @@replicate* and --gtid-ignore-duplicates combination, I have already said that a consistent outcome from the two is possible only
when the former is deployed on all the channels.

Comment by Andrei Elkin [ 2018-07-17 ]

Set straight to Not-a-Bug, since ideas were provided on how to achieve the intended filtering.
