MariaDB Server / MDEV-16025

gtid_ignore_duplicates(rpl_gtid.cc) does not contemplate replicate_* filters

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 10.0.34, 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL)
    • Fix Version/s: 10.0.35
    • Component/s: Replication
    • Linux

    Description

      When the same GTID reaches a slave twice, for example in a multi-source setup, one of the two copies is discarded if gtid_ignore_duplicates=ON.
      This is fine.
      There is also no guarantee about which multi-source channel delivers a given GTID first, or which copy is processed first: each copy is written to its per-source relay log and processed asynchronously. This is also fine.
      A well-designed replication setup has probably put replication filters in place so that duplicate transactions are not applied twice on the same slave (they are still copied from the master into the relay logs). With GTID, the gtid_ignore_duplicates option can save you, but with traditional binlog coordinates you risk applying the same transaction twice.
      For this reason, a setup in which the same transaction can reach the same multi-source slave should have a `replicate_wild_do_table` or `replicate_wild_ignore_table` filter to avoid this (unless you are implementing some sort of custom highly available replication setup).
      The problem appears exactly when you set up such a filter. The code that checks for duplicates apparently (at least from my limited understanding) is not aware that GTIDs arriving on the filtered channel will not actually be applied. So if the unfiltered channel is second to deliver a specific GTID, its copy is considered a duplicate and is not applied, and the transaction goes missing.
      I attach a picture of the setup and the log of some testing I ran after amending sql/rpl_gtid.cc to add some extra debug prints.
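
The failure mode described above can be illustrated with a toy model. This is only a sketch of the reported behaviour, not the logic in sql/rpl_gtid.cc: the names `Event` and `run_slave`, the GTID value and the table name are all hypothetical, and the `dummy%.%` wildcard filter is approximated with a simple prefix test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    gtid: str   # MariaDB-style "domain-server_id-sequence", value is made up
    table: str  # fully qualified table the transaction touches


def run_slave(arrivals, filters):
    """Toy multi-source slave with gtid_ignore_duplicates=ON.

    arrivals: list of (channel, Event) in the order the copies are processed.
    filters:  dict channel -> predicate(table); a missing entry means "apply all".
    Returns the GTIDs whose changes were actually applied.
    """
    seen = set()   # GTIDs already claimed by some channel
    applied = []
    for channel, ev in arrivals:
        if ev.gtid in seen:
            continue  # duplicate check comes first and ignores the filters
        seen.add(ev.gtid)
        allowed = filters.get(channel, lambda table: True)
        if allowed(ev.table):
            applied.append(ev.gtid)
    return applied


ev = Event(gtid="0-1-100", table="app.orders")
# Stand-in for a do-filter of 'dummy%.%' on the B2C channel only.
filters = {"B2C": lambda table: table.startswith("dummy")}

a_first = run_slave([("A2C", ev), ("B2C", ev)], filters)
b_first = run_slave([("B2C", ev), ("A2C", ev)], filters)
print(a_first)  # the copy from A arrives first and is applied
print(b_first)  # the filtered copy arrives first; the transaction is lost
```

In this model the outcome depends only on arrival order, which matches the report: when the filtered B2C copy is processed first, the later A2C copy is treated as a duplicate.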

      For reference: https://jira.mariadb.org/browse/MDEV-5804

      Attachments

        Activity

          elenst Elena Stepanova added a comment -

          This is a strange setup, unless I'm missing something. The jpg says, "REPLICATION FILTER B2C such as no transactions coming from A via B have to be applied on C, also because they're supposed to reach C directly via the A2C replication channel. e.g. B2C - replicate_wild_do_tables='dummy%.%'."

          But doesn't it actually ensure that no transactions coming from B are applied on C at all, whether they originated from A or from B? And if so, what's the point of this complicated setup? If you don't want to apply any transactions from B on C, why have B->C replication at all?

          The strange setup aside, the effect itself is of course easily reproducible. It doesn't even require a concurrent test; an MTR test will do: just let B2C replication work first, and only then start A2C. Events from A won't be replicated. I don't know if it was really designed to be so or just happened to be; either way, it doesn't seem to contradict any existing documentation. I'll leave it to Elkin to decide whether it's a bug or not and, if it's to be fixed, in which versions. Feel free to adjust Fix Version/s accordingly.

          claudio.nanni Claudio Nanni added a comment -

          C must replicate whatever is executed on B and on A (directly): plain multi-source.
          B replicates from A.
          C must not replicate transactions coming from A that go via B.
          Binlog filters on B cannot be used, for several reasons (apart from the fact that they are never a good idea).
          So, to avoid C replicating the same data twice, a (wild) replicate filter is set up on the B2C channel.
          One may also set up a mutually exclusive filter on A2C, but this is not relevant.
          Replicate filters do not stop a transaction from reaching the slave's relay logs; they only prevent it from being applied. So when the same GTID arrives on the multi-source slave (in different relay logs, one per source), the sql/rpl_gtid.cc code always ignores the second copy (if gtid_ignore_duplicates is on). If the first copy to arrive was the one from the channel with the exclusion filter, that transaction is not applied at all.

          Elkin Andrei Elkin added a comment -

          > C must not replicate transactions coming from A that go via B.

          I've just spoken with Claudio to make sure I understood the immediate technical requirements well.
          The aim is to regard replication events coming via a proxy (B) as duplicates and
          those coming over the direct link (A) as originals, and to ignore the duplicates.

          For that purpose we already have IGNORE_SERVER_IDS = (server_id_list), which I offered
          instead of meddling with @@replicate*. B2C should therefore be built on that CHANGE MASTER option.
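
A toy model may help show why a server-id filter on B2C avoids the order dependence. It is a sketch under stated assumptions, not MariaDB code: it assumes that events from ignored server ids are discarded when the channel receives them, before any duplicate check, and the function name, channel names, server ids (1 for A, 2 for B) and GTID values are all hypothetical.

```python
def run_slave_ignoring_server_ids(arrivals, ignore_server_ids):
    """Toy multi-source slave with a per-channel list of ignored server ids.

    arrivals:          list of (channel, gtid), gtid as "domain-server_id-seq".
    ignore_server_ids: dict channel -> set of server ids to drop on receipt.
    Returns the GTIDs that were applied, in processing order.
    """
    seen = set()
    applied = []
    for channel, gtid in arrivals:
        _domain_id, server_id, _seq_no = (int(p) for p in gtid.split("-"))
        if server_id in ignore_server_ids.get(channel, set()):
            continue  # assumed: dropped on receipt, never reaches the duplicate check
        if gtid in seen:
            continue  # ordinary duplicate, skipped as with gtid_ignore_duplicates=ON
        seen.add(gtid)
        applied.append(gtid)
    return applied


from_a = "0-1-100"  # transaction that originated on A (hypothetical server_id 1)
from_b = "0-2-7"    # transaction that originated on B (hypothetical server_id 2)
ignore = {"B2C": {1}}  # B2C ignores everything that originated on A

arrivals = [("B2C", from_a), ("B2C", from_b), ("A2C", from_a)]
result = run_slave_ignoring_server_ids(arrivals, ignore)
reverse = run_slave_ignoring_server_ids(list(reversed(arrivals)), ignore)
print(sorted(result))
print(sorted(reverse))
```

Under that assumption both orderings apply the same set of transactions: A's transaction arrives only through A2C, and B's own transactions still arrive through B2C.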

          As to how @@replicate* combines with MSR and --gtid-ignore-duplicates, as exemplified by the provided tests, I don't actually see a solid basis for complaint. The semantics of the combination do not match the aim, as elenst pointed out.

          Elkin Andrei Elkin added a comment -

          The status is set back to Unconfirmed, to move to Closed once Claudio has confirmed that
          the server-id-to-ignore filter works for his case. As to the combination of @@replicate* and --gtid-ignore-duplicates, I have already said that a consistent outcome from the two is possible only
          when the former is deployed on all the channels.

          Elkin Andrei Elkin added a comment -

          Set straight to Not-a-Bug, as ideas were provided on how to achieve the intended filtering.


          People

            Elkin Andrei Elkin
            claudio.nanni Claudio Nanni