Details

    Description

      We have a three-node MariaDB Galera Cluster, version 10.0.30.
      Today one of the nodes (node 2) disconnected from the group.
      After reviewing its error log, I found this error:
      170418 18:11:48 [ERROR] WSREP: Local state seqno (3472060319) is greater than group seqno (3472057835): states diverged. Aborting to avoid potential data loss. Remove '/home/mysql//grastate.dat' file and restart if you wish to continue. (FATAL).
      After reviewing the logs of all three servers, I think that the network connection of node 2 was unstable between 18:06:58 and 18:11:50, and that during this interval (while the node was non-primary) several queries may have been executed and committed on it.
      I have attached the error logs from all three nodes.
      Node 1 IP : ..*.35
      Node 2 IP : ..*.60 (Impacted)
      Node 3 IP : ..*.206
      Sorry for my bad English.
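
      To make the size of the divergence in the error above explicit, here is a minimal sketch that extracts both seqno values from the log line and computes how far the local state is ahead of the group. The function name `seqno_gap` and the regular expression are illustrative assumptions, not part of MariaDB or Galera tooling.

```python
import re

# Pattern for the WSREP divergence message quoted in this report.
# (Hypothetical helper: not part of any MariaDB or Galera tool.)
SEQNO_RE = re.compile(
    r"Local state seqno \((\d+)\) is greater than group seqno \((\d+)\)"
)


def seqno_gap(log_line):
    """Return (local, group, gap), or None if the line is not a divergence error."""
    match = SEQNO_RE.search(log_line)
    if match is None:
        return None
    local, group = int(match.group(1)), int(match.group(2))
    return local, group, local - group


# The error line from node 2, shortened after "states diverged."
line = (
    "170418 18:11:48 [ERROR] WSREP: Local state seqno (3472060319) is greater "
    "than group seqno (3472057835): states diverged."
)
print(seqno_gap(line))  # (3472060319, 3472057835, 2484)
```

      For the line quoted above, the local state is 2484 seqnos ahead of the group.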

      Attachments

        1. node1.log
          21 kB
        2. node2.log
          27 kB
        3. node3.log
          22 kB

        Activity

          HamoonDBA Hamoon Mohammadian Pour created issue -
          elenst Elena Stepanova made changes -
          Field Original Value New Value
          Assignee Sachin Setiya [ sachin.setiya.007 ]
          elenst Elena Stepanova made changes -
          Fix Version/s 10.0-galera [ 21901 ]
          serg Sergei Golubchik made changes -
          Assignee Sachin Setiya [ sachin.setiya.007 ] Jan Lindström [ jplindst ]

          jplindst Jan Lindström (Inactive) added a comment - Support of 10.0-galera has ended.
          jplindst Jan Lindström (Inactive) made changes -
          Fix Version/s N/A [ 14700 ]
          Fix Version/s 10.0-galera [ 21901 ]
          Resolution Won't Fix [ 2 ]
          Status Open [ 1 ] Closed [ 6 ]

          michaeldg Michaël de groot added a comment -

          I see the same behaviour on MariaDB 10.3, and I have seen it before on other versions.

          I think this is related to network instability and frequent IST on clusters that are actively used. If the write node gets disconnected often, it somehow ends up in this state, believing it is inconsistent. Each time I ran into it I tried to find out whether it really was inconsistent, but the few transactions I checked were correct: the modified row was present on all cluster nodes and identical across the cluster.
          My guess is that IST succeeded, but the success was not recorded in grastate.dat and/or in the transactional Galera status stored in InnoDB.
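
          As a starting point for checking what a node has recorded in grastate.dat, here is a minimal sketch that parses the file's `key: value` lines into a dictionary. The sample content is made up for illustration (it is not taken from the attached logs), the field layout is assumed from typical Galera state files, and `parse_grastate` is a hypothetical helper name.

```python
def parse_grastate(text):
    """Parse 'key: value' lines from grastate.dat content, skipping comments."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; the rest of the line is the value.
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


# Illustrative sample only: the values are placeholders, not real cluster data.
SAMPLE = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
"""

state = parse_grastate(SAMPLE)
print(state["uuid"], state["seqno"])
```

          Running this against the file on each node and comparing the `uuid` and `seqno` fields would show whether the saved state matches what the rest of the cluster reports. It does not replace the row-level checks described above.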
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 80373 ] MariaDB v4 [ 151970 ]

          People

            jplindst Jan Lindström (Inactive)
            HamoonDBA Hamoon Mohammadian Pour