MariaDB MaxScale / MXS-4411

Loading a SQL dump sends queries to the replica, breaking it (GTID under strict mode)

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Incomplete
    • 6.4.3
    • N/A
    • N/A
    • None

    Description

      I've been chasing a weird thing (bug?) which happens from time to time (once every 3-6 months) and looks like this:

      • A SQL dump from a local stand-alone MariaDB DB (taken with HeidiSQL) is loaded onto a master/slave cluster via MaxScale (presumably with the same tool).
      • MaxScale starts feeding the dump to the master (as expected).
      • At some point, MaxScale makes a connection to the slave and tries to execute something there.
      • The replica runs with GTID strict mode enabled and stops the replication:

      An attempt was made to binlog GTID 0-11-170271865 which would create an out-of-order sequence number with existing GTID 0-12-170271865, and gtid strict mode is enabled
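For readers unfamiliar with the error above: a MariaDB GTID has the form domain-server-sequence, and with GTID strict mode enabled a server refuses to binlog an event whose sequence number is not greater than the last one already in its binlog for the same domain. The following is a minimal Python sketch of that rule, not MariaDB's actual implementation; the names `Gtid`, `parse_gtid` and `violates_strict_order` are made up for illustration.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Split a 'domain-server-sequence' string into its three integers."""
    domain, server, seq = (int(part) for part in text.strip().split("-"))
    return Gtid(domain, server, seq)


def violates_strict_order(existing: Gtid, incoming: Gtid) -> bool:
    """Simplified strict-mode rule: within one domain, the incoming
    sequence number must be strictly greater than the existing one."""
    return (incoming.domain_id == existing.domain_id
            and incoming.seq_no <= existing.seq_no)


# The two GTIDs quoted in the error message.
existing = parse_gtid("0-12-170271865")
incoming = parse_gtid("0-11-170271865")
conflict = violates_strict_order(existing, incoming)
```

Both GTIDs carry the same sequence number in domain 0 but different server IDs, so the check reports a conflict. This is consistent with two different servers each having generated an event at that position.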

      The binlog on the master and the relay log are identical, as expected; the binlog on the replica shows that a GTID with the local server ID was created (but with no data), after which replication stops.
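One way to look for such locally generated events is to scan the text output of mysqlbinlog (run against the replica's binlog) for GTIDs that carry the replica's own server ID. The sketch below assumes that GTID events appear in that output as `GTID <domain>-<server>-<sequence>`; the function name `local_gtids` and the choice of which server ID to pass in are illustrative assumptions, not part of any MariaDB or MaxScale tool.

```python
import re
from typing import Iterable, List

# Assumed shape of a GTID event in mysqlbinlog text output,
# e.g. "... GTID 0-12-170271865 ..."
GTID_RE = re.compile(r"\bGTID (\d+)-(\d+)-(\d+)")


def local_gtids(lines: Iterable[str], replica_server_id: int) -> List[str]:
    """Return every GTID whose server ID equals the replica's own ID,
    i.e. events that were written on the replica rather than replicated."""
    hits = []
    for line in lines:
        match = GTID_RE.search(line)
        if match and int(match.group(2)) == replica_server_id:
            hits.append("-".join(match.groups()))
    return hits
```

Passing an open file handle of the decoded binlog together with the replica's server ID returns the list of suspect GTIDs; an empty list means no local writes were found in that file.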

      This has happened enough times to rule out any accidental cause, Mercury retrograde etc. Both the master and the slave are firewalled off, so there is no way any connection could have been made directly to them, which means it must have come from MaxScale. It happens on both loaded and test systems, so load does not seem to play any role.

      The MaxScale config is pretty straightforward, with a simple read-write split. Causal reads tracking is now set to "global", but the same issue was present before this option existed; all in all, this has happened on all MaxScale mainlines from 2.3 to 6. We do have "use_sql_variables_in=master", but I don't see how this would cause the observed effect.

      The most frustrating thing is that this happens rarely and far from every data load ends like this; however, it only ever happens on dump loads, so it must somehow be related.

      I'm attaching the relevant parts (stripped of the repetitive INSERT statements) of the master log, relay log and slave log from a breakage that happened a few days ago, plus the header of a HeidiSQL dump that caused a similar breakage a few months ago. That earlier time it broke literally after the first CREATE TABLE statement, on the USE statement shown; this time it broke halfway through the dump, which I'm still waiting to receive.

      I understand this is hardly enough and I cannot give a way to reproduce it reliably, but maybe somebody has seen or heard of something similar? Does this ring a bell?

      Attachments

        1. HeidiSQL- dump.sql
          4 kB
        2. master.log
          2 kB
        3. relay.log
          2 kB
        4. slave.log
          2 kB

          People

            Assignee: Unassigned
            Reporter: Assen Totin (assen.totin, Inactive)