MariaDB Server / MDEV-11969

Can't remove GTIDs for a stale GTID Domain ID


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.1.19, 10.1.20
    • Fix Version/s: 10.1.30
    • Component/s: Replication
    • Labels:
      None

      Description

      I've simplified my situation for this bug description:

      We have two independent MariaDB clusters, each a regular master-slave setup (let's call the masters A and B). There is also a MariaDB server acting as a data warehouse, which uses multi-source replication from both masters. All replication channels were created in the pre-GTID era using binlog_file and binlog_pos. Of course, both masters had already generated GTIDs for the default domain ID 0.

      When we migrated to GTID-based replication, we configured master A with domain ID 1 and master B with domain ID 2. All slaves in group A now have two GTIDs in gtid_slave_pos: one for domain 1 with an increasing sequence number, and one for the former default domain 0 with a static sequence number. Master A also keeps track of this domain 0 GTID in gtid_binlog_pos (and gtid_binlog_state).

      The same applies to master B and its slaves, with domains 2 and 0. So far, this is not a problem.

      However, it is not possible to switch the warehouse to GTID-based replication. The warehouse's gtid_slave_pos keeps only one GTID per domain, and the last event replicated for the default domain 0 in the pre-GTID era originated from master B. That GTID has a lower sequence number than master A's GTID for domain 0.
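To make the domain 0 conflict concrete, here is a small Python sketch that parses GTID positions in MariaDB's `domain_id-server_id-seq_no` text form and compares the warehouse's domain 0 GTID with master A's. The server IDs (10 for A, 20 for B) and all sequence numbers are hypothetical; only the per-domain layout reflects the situation described above.

```python
from typing import Dict, NamedTuple


class Gtid(NamedTuple):
    """One GTID in MariaDB's domain_id-server_id-seq_no form."""
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid_pos(text: str) -> Dict[int, Gtid]:
    """Parse a comma-separated GTID position (as shown by gtid_slave_pos or
    gtid_binlog_pos) into a dict keyed by domain ID. Such a position holds
    at most one GTID per domain."""
    pos: Dict[int, Gtid] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        domain, server, seq = (int(part) for part in item.split("-"))
        if domain in pos:
            raise ValueError(f"domain {domain} appears twice in {text!r}")
        pos[domain] = Gtid(domain, server, seq)
    return pos


# Hypothetical values: server_id 10 = master A, server_id 20 = master B.
master_a_binlog_pos = parse_gtid_pos("0-10-5000,1-10-1234")
warehouse_slave_pos = parse_gtid_pos("0-20-3000,1-10-1234,2-20-777")

a0 = master_a_binlog_pos[0]
w0 = warehouse_slave_pos[0]
print(f"master A, domain 0:  {a0}")
print(f"warehouse, domain 0: {w0}")
# The warehouse's domain 0 GTID came from B and is behind A's.
print(w0.server_id != a0.server_id and w0.seq_no < a0.seq_no)  # True
```

The model only captures the bookkeeping: a single domain 0 entry on the warehouse can match at most one of the two masters.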

      Therefore, when executing

      CHANGE MASTER 'A' TO master_use_gtid = slave_pos, do_domain_ids = (1), ignore_domain_ids = ();
      

      on the warehouse to switch its replication from A to GTID, master A scans its binlog not only for the domain 1 start position but also for the domain 0 one (despite DO_DOMAIN_IDS). Because the warehouse's domain 0 GTID has a lower sequence number than the one in A's gtid_binlog_pos and was never written to A's binlog, A refuses the connection with:

      Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-XXXX-YYYY, which is not in the master's binlog'
      

      There is no way to make the masters of A and B forget domain 0 except by setting gtid_binlog_state, which implies a RESET MASTER and is therefore not applicable in live operation.

      I assume this issue is similar to MDEV-9108, which (as far as I understand) asks that DO_DOMAIN_IDS also tell the master to ignore all other domains when scanning its binlogs for the starting position. That would solve my issue.
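The observed rejection and the master-side filtering requested in MDEV-9108 can be contrasted with the deliberately simplified Python model below. It is not MariaDB's binlog-dump code: it assumes a requested start GTID is accepted only if its domain never occurs in the master's binlog or that exact GTID is present there. The function name, the server IDs and the sequence numbers are all hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# A GTID is modelled as (domain_id, server_id, seq_no).
Gtid = Tuple[int, int, int]


def check_start_position(
    requested: Iterable[Gtid],
    master_binlog: Iterable[Gtid],
    do_domain_ids: Optional[Set[int]] = None,
) -> List[str]:
    """Return error messages for requested start GTIDs the master cannot serve.

    Simplified model (an assumption, not MariaDB's real algorithm): a requested
    GTID passes if its domain never occurs in the master's binlog, or if that
    exact GTID is in the binlog. When do_domain_ids is given, the master skips
    every other domain, which is the behaviour MDEV-9108 asks for.
    """
    binlog = set(master_binlog)
    domains_in_binlog = {domain for domain, _, _ in binlog}
    errors: List[str] = []
    for gtid in requested:
        domain = gtid[0]
        if do_domain_ids is not None and domain not in do_domain_ids:
            continue  # proposed behaviour: filtered-out domains are not checked
        if domain in domains_in_binlog and gtid not in binlog:
            errors.append(
                "connecting slave requested to start from GTID "
                f"{'-'.join(map(str, gtid))}, which is not in the master's binlog"
            )
    return errors


# Hypothetical data: server_id 10 = master A, 20 = master B.
master_a_binlog = [(0, 10, 5000), (1, 10, 1233), (1, 10, 1234)]
warehouse_request = [(0, 20, 3000), (1, 10, 1234), (2, 20, 777)]

# Observed behaviour: domain 0 is checked even though only domain 1 is wanted.
print(check_start_position(warehouse_request, master_a_binlog))
# Behaviour requested in MDEV-9108: only domain 1 is checked.
print(check_start_position(warehouse_request, master_a_binlog, {1}))  # []
```

In this model domain 2 passes because A's binlog has never seen it; only the stale domain 0 entry, which both masters know about, triggers the error.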

      Nevertheless, I believe it would be easier to allow altering gtid_binlog_pos on the master (not directly or via gtid_binlog_state, but through a function call) so that it forgets the GTIDs of a specific domain ID without issuing RESET MASTER.
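The proposed "forget a domain" operation can be sketched on a toy model of gtid_binlog_state. This is only an illustration of the idea: `forget_domain` and `binlog_pos` are hypothetical names, not an existing MariaDB API, and deriving gtid_binlog_pos by taking the highest sequence number per domain is a simplification (the real variable shows the last GTID written per domain).

```python
from typing import Dict, Tuple

# gtid_binlog_state modelled as {(domain_id, server_id): seq_no}, i.e. the
# last GTID for every domain/server combination.
BinlogState = Dict[Tuple[int, int], int]


def forget_domain(state: BinlogState, domain_id: int) -> BinlogState:
    """Hypothetical operation proposed in this report: drop every GTID of one
    domain from the binlog state without a RESET MASTER. Returns a new dict
    and leaves the input unchanged."""
    return {key: seq for key, seq in state.items() if key[0] != domain_id}


def binlog_pos(state: BinlogState) -> str:
    """Render one GTID per domain, picking the highest seq_no (a
    simplification of how gtid_binlog_pos relates to gtid_binlog_state)."""
    best: Dict[int, Tuple[int, int]] = {}
    for (domain, server), seq in state.items():
        if domain not in best or seq > best[domain][1]:
            best[domain] = (server, seq)
    return ",".join(f"{d}-{s}-{q}" for d, (s, q) in sorted(best.items()))


# Hypothetical master A state: server_id 10, invented sequence numbers.
state_a: BinlogState = {(0, 10): 5000, (1, 10): 1234}

print(binlog_pos(state_a))                    # 0-10-5000,1-10-1234
print(binlog_pos(forget_domain(state_a, 0)))  # 1-10-1234
```

Once domain 0 is gone from the master's state, a slave whose position still mentions domain 0 would no longer be compared against a conflicting master GTID for that domain.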

              People

              Assignee:
              Andrei Elkin (Elkin)
              Reporter:
              Jan Kunzmann (DrMurx)
              Votes:
              1
              Watchers:
              4
