MariaDB Server / MDEV-26945

GTID gets out of sync between Galera cluster nodes by executing 2 transactions under the same GTID on the restarted node!

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.6.4
    • Fix Version/s: 10.6
    • Component/s: Galera
    • Environment: Ubuntu 20.04
      10.6.4-MariaDB-1:10.6.4+maria~focal-log - mariadb.org binary distribution

    Description

      I have a 3-node Galera cluster configured as described in: https://mariadb.com/kb/en/using-mariadb-gtids-with-mariadb-galera-cluster/
      It ran fine for a month. Then, on 29 October, I noticed that two nodes had a GTID one higher than the third node, and I started investigating. It seems that MariaDB on node2 restarted itself because the server ran out of RAM. After the restart, the first new transaction was executed and written to the binary log with the same GTID as the last transaction before the restart. So node2 executed two transactions (the last before the restart and the first after it) under the same GTID!
      I am attaching combined screenshots that show the difference between the node2 and node1 binary logs. The green lines mark where everything is still fine; the red lines mark where things went wrong.
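
A minimal sketch of how the reuse described above could be spotted without comparing screenshots by hand. It assumes the binary log has been dumped to text with mysqlbinlog and that each transaction's GTID event line carries a token of the form `GTID <domain>-<server>-<seq>`. The function name `find_reused_gtids` and the sample lines are hypothetical and are not taken from the attached logs.

```python
import re
from collections import defaultdict

# Matches the GTID token on an event line of a text binlog dump,
# e.g. "GTID 2-2-1001" (assumed format: domain-server-sequence).
GTID_RE = re.compile(r"\bGTID (\d+)-(\d+)-(\d+)\b")


def find_reused_gtids(binlog_text):
    """Return {gtid: count} for every GTID seen on more than one event line."""
    counts = defaultdict(int)
    for line in binlog_text.splitlines():
        match = GTID_RE.search(line)
        if match:
            counts["-".join(match.groups())] += 1
    return {gtid: n for gtid, n in counts.items() if n > 1}


# Hypothetical excerpt for illustration only (not real log data).
SAMPLE = """\
# server id 2  end_log_pos 385  GTID 2-2-1000 trans
# server id 2  end_log_pos 912  GTID 2-2-1001 trans
# server id 2  end_log_pos 385  GTID 2-2-1001 trans
# server id 2  end_log_pos 870  GTID 2-2-1002 trans
"""

if __name__ == "__main__":
    print(find_reused_gtids(SAMPLE))
```

Running this over the dumped logs from before and after the restart would list any GTID that was written more than once; an empty result means no reuse was found in that dump.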

      I am also attaching the config file and the error logs from node1 and node2. I hope this helps to find the cause.
      gtid_domain_id is different on each server: 1 on node1, 2 on node2 and 3 on node3, as recommended in the MariaDB docs linked above.

      This situation also causes a problem for my replica server. The replica (slave) is currently attached to node2. All the nodes (node1, node2 and node3) have binary logging enabled. Before the GTID problem appeared, I could switch the replica to any cluster node and it synced fine. Now that the GTIDs differ, this is no longer possible.

      No data loss has been detected; only the GTID numbering is affected. The consequence is that the replica cannot currently be attached to any cluster node other than node2.

      Attachments

        1. mariadb.override.cnf
          2 kB
        2. node1.mariadb.err.log
          10 kB
        3. node2.mariadb.err.log
          17 kB
        4. ONE-GTID-2-queries.png
          329 kB
        5. out-of-sync-all.PNG
          28 kB
        6. Servers.PNG
          34 kB

        Activity

          mkaruza Mario Karuza (Inactive) added a comment -

          Hi,

          What is the `gtid_domain_id` on each server? Do all tables use the InnoDB storage engine?

          normunds.puzo@gmail.com Normunds Puzo added a comment -

          gtid_domain_id is different on each server: 1 on node1, 2 on node2 and 3 on node3.
          Yes, all databases are InnoDB.
          If some of the databases were not InnoDB, the GTID would contain an additional internal ID after a comma.

          normunds.puzo@gmail.com Normunds Puzo added a comment -

          Is there a way to update the GTID on a node manually so that it matches the GTID of the other nodes?

          normunds.puzo@gmail.com Normunds Puzo added a comment - - edited

          Additional info: MariaDB on node2 restarted itself because the node ran out of RAM for a short period of time. No swap was enabled on the server, which has 8 GB of RAM in total.

          normunds.puzo@gmail.com Normunds Puzo added a comment -

          Now node1 is behind node2 by 10 sequence numbers and node3 by 9. Nothing appears in /var/lib/mysql/mariadb.err on any of the nodes for the period in which the GTIDs diverged again...
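
A gap like the one reported above can be measured by parsing each node's GTID position string. The following is only a sketch: it assumes the strings come from something like `SELECT @@gtid_binlog_pos` on each node and follow the comma-separated `domain-server-seq` format. The helper names and the sample positions are hypothetical and do not reproduce the values seen on this cluster.

```python
def parse_gtid_pos(pos):
    """Parse a position such as '0-1-500,5-2-40' into {domain_id: seq_no}."""
    result = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue
        domain, _server, seq = (int(part) for part in item.split("-"))
        result[domain] = seq
    return result


def seq_gaps(reference, other):
    """Per-domain seq_no difference (reference minus other).

    A domain missing from one side is treated as seq_no 0.
    """
    ref, oth = parse_gtid_pos(reference), parse_gtid_pos(other)
    return {d: ref.get(d, 0) - oth.get(d, 0) for d in sorted(set(ref) | set(oth))}


# Hypothetical positions for two nodes (illustration only).
NODE_A = "0-1-500,5-2-40"
NODE_B = "0-1-497,5-2-40"

if __name__ == "__main__":
    print(seq_gaps(NODE_A, NODE_B))
```

A positive value means the first node is ahead in that domain, a negative value means it is behind, and zero means the two nodes agree.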


          People

            seppo Seppo Jaakola
            normunds.puzo@gmail.com Normunds Puzo
            Votes: 1
            Watchers: 3
