MariaDB Server / MDEV-26945

GTID gets out of sync between Galera cluster nodes by executing 2 transactions under the same GTID on the restarted node!

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.6.4
    • Fix Version/s: 10.6
    • Component/s: Galera
    • Labels:
    • Environment:
      Ubuntu 20.04
      10.6.4-MariaDB-1:10.6.4+maria~focal-log - mariadb.org binary distribution

      Description

      I have a 3-node Galera cluster configured as described in https://mariadb.com/kb/en/using-mariadb-gtids-with-mariadb-galera-cluster/
      It ran fine for one month. Then, on 29 October, I noticed that two nodes had a GTID one higher than that of the third node, and I started to investigate. It seems that MariaDB on node2 restarted by itself because the server ran out of RAM. After the restart, the first new transaction was executed and logged in the binary log with the same GTID as the last transaction before the restart, so node2 executed two transactions (the last before the restart and the first after it) under the same GTID!
      I am attaching combined screenshots that show the difference between the node2 and node1 binary logs: green lines mark where everything is still fine, red lines mark what has gone wrong.
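
      As a rough illustration of the symptom, the following Python sketch looks for a GTID that appears more than once in a sequence of binary log events. It only relies on the MariaDB GTID format `domain_id-server_id-sequence`. The function names and the sample GTIDs are hypothetical and are not taken from the attached logs; the GTID strings are assumed to have been extracted from the binary log beforehand.

```python
from collections import Counter


def parse_gtid(gtid):
    """Split a MariaDB GTID 'domain-server-seq' into a tuple of three integers."""
    domain, server, seq = (int(part) for part in gtid.strip().split("-"))
    return domain, server, seq


def find_reused_gtids(gtids):
    """Return the GTIDs that occur more than once, in first-seen order."""
    counts = Counter(parse_gtid(g) for g in gtids)
    return [gtid for gtid, n in counts.items() if n > 1]


# Hypothetical event order on the restarted node: the last transaction before
# the restart and the first one after it carry the same GTID.
node2_events = ["10-2-1001", "10-2-1002", "10-2-1002", "10-2-1003"]
print(find_reused_gtids(node2_events))
```

      An empty result would mean every GTID in the sequence is unique; any entry in the result points at a sequence number that was assigned twice.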

      I am also attaching the config file and the error logs from node1 and node2. I hope this helps to find the cause.
      gtid_domain_id is different on each server: 1 on node1, 2 on node2 and 3 on node3, as recommended in the MariaDB docs linked above.

      This situation also causes a problem for my replica server. The replica (slave) is currently attached to node2. All three nodes (node1, node2 and node3) have binary logging enabled. Before the GTID problem appeared, I could switch the replica to any cluster node and it synced fine. Now that the GTIDs differ, this is no longer possible.

      No data loss has been detected. Only the GTID numbering is inconsistent, which also affects the replica server: right now it cannot be attached to any cluster node other than node2.
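
      The off-by-one difference between the nodes could be checked with a small script like the one below. It is only a sketch: it assumes the `gtid_binlog_pos` value of each node has been collected by hand, and the position strings, the domain ID 10 and the function names are hypothetical, not values from my cluster.

```python
def parse_gtid_pos(pos):
    """Map domain_id -> (server_id, seq_no) for a comma-separated GTID position."""
    result = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue
        domain, server, seq = (int(part) for part in item.split("-"))
        result[domain] = (server, seq)
    return result


def seq_lag(positions, domain):
    """For one domain, return how far each node's seq_no is behind the highest one."""
    seqs = {node: parse_gtid_pos(pos)[domain][1] for node, pos in positions.items()}
    top = max(seqs.values())
    return {node: top - seq for node, seq in seqs.items()}


# Hypothetical positions: node2 reused one sequence number, so it is one behind.
positions = {
    "node1": "10-1-5001",
    "node2": "10-2-5000",
    "node3": "10-1-5001",
}
print(seq_lag(positions, 10))
```

      A lag of 0 on every node would mean the sequence numbers agree for that domain. The sketch raises a KeyError if a node has no entry for the requested domain.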

        Attachments

        1. mariadb.override.cnf
          2 kB
        2. node1.mariadb.err.log
          10 kB
        3. node2.mariadb.err.log
          17 kB
        4. ONE-GTID-2-queries.png
          329 kB
        5. out-of-sync-all.PNG
          28 kB
        6. Servers.PNG
          34 kB

            People

            Assignee:
            mkaruza Mario Karuza
            Reporter:
            normunds.puzo@gmail.com Normunds Puzo
            Votes:
            1
            Watchers:
            3
