[MDEV-26945] GTID gets out of sync between Galera cluster nodes by executing 2 transactions under the same GTID on the restarted node! - Jira

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.6.4
Fix Version/s: 10.6
Component/s: Galera
Labels:
- GTID
- galera
Environment:
Ubuntu 20.04
10.6.4-MariaDB-1:10.6.4+maria~focal-log - mariadb.org binary distribution

Description

I have a 3-node Galera cluster configured as required by https://mariadb.com/kb/en/using-mariadb-gtids-with-mariadb-galera-cluster/
It ran fine for a month. Then, on 29 October, I noticed that two nodes had a GTID one higher than the third node, so I started to investigate. It seems that MariaDB on node2 restarted itself because the server ran out of RAM. After the restart, the first new transaction was executed and written to the binary log with the same GTID as the last transaction before the restart, so node2 executed two transactions (the last before the restart and the first after it) under the same GTID.
I am attaching combined screenshots that show the difference between the node2 and node1 binary logs. Green lines mark where everything is still fine; red lines mark where things went wrong.
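
The duplicate described above can also be checked outside of screenshots. The following is a minimal sketch, not part of the original report: it scans text output of `mysqlbinlog` and lists every GTID that appears on more than one event header. It assumes that each transaction header line starts with `#` and contains `GTID <domain>-<server_id>-<seq_no>`; the function name `find_reused_gtids` is hypothetical.

```python
import re
from collections import Counter

# Assumed shape of a transaction header line in `mysqlbinlog` text output:
#   "#211029 10:15:02 server id 2  end_log_pos 385 ... GTID 1-2-1001 trans"
# The pattern is case-sensitive, so "Gtid list [...]" lines are not counted.
GTID_RE = re.compile(r"\bGTID (\d+)-(\d+)-(\d+)\b")


def find_reused_gtids(lines):
    """Return sorted (domain, server_id, seq_no) tuples seen on more than one header."""
    counts = Counter()
    for line in lines:
        # Event headers are comment lines; skip SQL and row data.
        if not line.startswith("#"):
            continue
        match = GTID_RE.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return sorted(gtid for gtid, n in counts.items() if n > 1)
```

Passing the decoded binary log of node2 line by line to `find_reused_gtids` should return the GTID shared by the last transaction before the restart and the first one after it, if the log matches the assumed format.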

I am also attaching the config file and the error logs from node1 and node2. I hope this helps to find the cause.
gtid_domain_id is different on each server: 1 on node1, 2 on node2 and 3 on node3, as recommended in the MariaDB docs linked above.

This also causes a problem for my replica server. The replica (slave) is currently attached to node2. All nodes (node1, node2 and node3) have binary logging enabled. Before the GTID problem appeared, I could switch the replica to any cluster node and it synced fine. Now that the GTIDs differ, this is no longer possible.

No data loss has been detected; only the GTID numbering is messed up, which also means the replica cannot currently be attached to any cluster node other than node2.
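
The mismatch between nodes can be quantified by comparing the `gtid_binlog_pos` value reported by each node. The sketch below is an illustration under stated assumptions, not something taken from the report: it assumes each position is a comma-separated list of `domain-server_id-seq_no` entries, and the helper names `parse_gtid_pos` and `seq_gaps` are hypothetical.

```python
def parse_gtid_pos(pos):
    """Parse a GTID position such as '1-2-1001,3-1-7' into {domain_id: seq_no}."""
    result = {}
    for part in pos.split(","):
        part = part.strip()
        if not part:
            continue
        domain, _server_id, seq_no = (int(x) for x in part.split("-"))
        result[domain] = seq_no
    return result


def seq_gaps(positions):
    """For each domain, report how far each node's seq_no is behind the highest one.

    `positions` maps a node name to its GTID position string.
    A node that has no entry for a domain is treated as being at seq_no 0.
    """
    parsed = {node: parse_gtid_pos(pos) for node, pos in positions.items()}
    domains = set().union(*parsed.values())
    gaps = {}
    for domain in sorted(domains):
        highest = max(p.get(domain, 0) for p in parsed.values())
        gaps[domain] = {node: highest - p.get(domain, 0) for node, p in parsed.items()}
    return gaps
```

A gap of 0 for every node in every domain means the positions agree; any non-zero value identifies a node whose GTID sequence has drifted and to which a replica cannot be switched cleanly.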

Attachments

- mariadb.override.cnf (2 kB, 2021-10-30 21:55)
- node1.mariadb.err.log (10 kB, 2021-11-01 06:40)
- node2.mariadb.err.log (17 kB, 2021-11-01 06:40)
- ONE-GTID-2-queries.png (329 kB, 2021-10-30 21:55)
- out-of-sync-all.PNG (28 kB, 2021-11-11 21:03)
- Servers.PNG (34 kB, 2021-10-30 22:01)

Activity

Mario Karuza (Inactive) added a comment - 2021-11-02 07:39

Hi,

What is your `gtid_domain_id` on the servers? Do all tables use the InnoDB storage engine?

Normunds Puzo added a comment - 2021-11-02 08:01

gtid_domain_id is different on each server: 1 on node1, 2 on node2 and 3 on node3.
Yes, all databases are InnoDB.
If any of the databases were not InnoDB, the GTID would contain an additional internal ID after a comma.

Normunds Puzo added a comment - 2021-11-06 23:05

Is there a way to update the GTID on a node manually so that it matches the GTID of the other nodes?

Normunds Puzo added a comment - 2021-11-09 06:44 - edited

Additional info: MariaDB on node2 restarted itself because the node ran out of RAM for a short period of time. No swap was enabled on the server, which has 8 GB of RAM in total.

Normunds Puzo added a comment - 2021-11-11 21:04

Now node1 is 10 sequence numbers behind node2, and node3 is 9 behind. Nothing appeared in the /var/lib/mysql/mariadb.err log on any of the nodes during the period in which the GTIDs changed again.

People

Assignee: Seppo Jaakola

Reporter: Normunds Puzo

Votes: 1

Watchers: 3

Dates

Created: 2021-10-30 21:56

Updated: 2023-02-01 08:06