Details
Bug
Status: Closed
Major
Resolution: Won't Fix
10.0.30-galera
Description
We have a three-node MariaDB Galera Cluster running 10.0.30.
Today one of the nodes (node 2) disconnected from the group.
After reviewing the error log, I found this error:
170418 18:11:48 [ERROR] WSREP: Local state seqno (3472060319) is greater than group seqno (3472057835): states diverged. Aborting to avoid potential data loss. Remove '/home/mysql//grastate.dat' file and restart if you wish to continue. (FATAL).
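To see how far ahead of the group the node believed it was, the two seqno values can be pulled out of the message and subtracted. This is only a small sketch: the regular expression is written against the exact wording of the line quoted above, and the function name `seqno_gap` is made up for illustration.

```python
import re

# Matches the seqno pair in the WSREP "states diverged" message quoted above.
# The pattern is an assumption based on that one log line only.
SEQNO_RE = re.compile(
    r"Local state seqno \((\d+)\) is greater than group seqno \((\d+)\)"
)


def seqno_gap(log_line):
    """Return (local, group, local - group), or None if the line does not match."""
    match = SEQNO_RE.search(log_line)
    if match is None:
        return None
    local, group = int(match.group(1)), int(match.group(2))
    return local, group, local - group


line = (
    "170418 18:11:48 [ERROR] WSREP: Local state seqno (3472060319) is greater "
    "than group seqno (3472057835): states diverged."
)
print(seqno_gap(line))  # (3472060319, 3472057835, 2484)
```

For this report the difference is 2484, i.e. the locally recorded position was 2484 sequence numbers beyond what the rest of the group had reached.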
After reviewing the logs of all three servers, I think that between 18:06:58 and 18:11:50 the network connection of node 2 was unstable, and that in the meantime (while it was non-Primary) several queries may have been executed and committed.
I've attached my error logs for reference.
Node 1 IP : ..*.35
Node 2 IP : ..*.60 (Impacted)
Node 3 IP : ..*.206
Sorry for my bad English.
I experience the same on MariaDB 10.3, and I have seen it before on other versions.
I think this has to do with network instability and frequent IST on clusters that are actively used. Somehow, if the write node gets disconnected a lot, it ends up in this state, believing it is inconsistent. Each time I ran into it I tried to find out whether it really was inconsistent, but the few transactions I checked were correct: the modified row was identical across the entire cluster and present on every node.
My guess is that IST succeeded but that the success was not recorded in grastate.dat and/or in the InnoDB transactional Galera state.
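One way to check that guess on an affected node would be to read the position saved in grastate.dat and compare it with the group seqno from the error message. The sketch below assumes the usual `key: value` layout of that file, with `#` comment lines and `seqno: -1` meaning that no position was saved. The sample content, the all-zero UUID, and the function names are illustrative and not taken from the attached logs.

```python
def parse_grastate(text):
    """Parse 'key: value' lines from grastate.dat content, skipping comments."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def saved_seqno(text):
    """Return the saved seqno as an int, or None when it is -1 (not saved)."""
    seqno = int(parse_grastate(text)["seqno"])
    return None if seqno == -1 else seqno


# Illustrative file content; the UUID is a placeholder, not a real cluster UUID.
sample = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   3472060319
"""

group_seqno = 3472057835  # group seqno from the error message in the description
local = saved_seqno(sample)
if local is not None and local > group_seqno:
    print(f"saved seqno {local} is ahead of group seqno {group_seqno}")
```

If the saved seqno really is ahead of the group while the data itself matches on all nodes, that would be consistent with the state file, rather than the data, being out of step.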