MariaDB Server / MDEV-20439

WSREP_CLUSTER_SIZE at 0 after rolling upgrade of a node

Details

    Description

      Hello,

      I'm currently doing a rolling upgrade of my MariaDB Galera cluster from 10.3.17 to 10.4.7.
      I upgraded just one node, following the instructions.
      Everything seems to be working fine (the last commit is the same, etc.). WSREP_CLUSTER_SIZE is 3 on my two 10.3.17 nodes, but it is 0 on my 10.4.7 node.
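      (For reference, the value above comes from the usual status query, run on each node:

          SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

      a healthy three-node cluster should report 3.)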

      I have this warning in the log, but I don't know if it's linked to my problem:
      "WSREP: View recovered from stable storage was empty. If the server is doing rolling upgrade from previous version which does not support storing view info into stable storage, this is ok. Otherwise this may be a sign of malfunction."

      I'm a little afraid to continue the rolling upgrade on another node...

      Thanks in advance.

Activity

            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            I presume it may be the same as MDEV-22723, which can probably be fixed by using a newer Galera 4 library, such as 26.4.4(rae24803); see the explanations there by Yurchenko.
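            (To cross-check which Galera provider library a node is actually running, the standard queries are, e.g.:

                SHOW GLOBAL STATUS LIKE 'wsrep_provider_version';  -- e.g. 26.4.4(rae24803)
                SHOW GLOBAL VARIABLES LIKE 'wsrep_provider';       -- path of the loaded library

            shown here only as a quick way to confirm the library version on each node.)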
            cmcgrail Chris McGrail added a comment -

            We were/are using 26.4.4.

            If I read that bug report correctly, the fix is in the 4.5 release.

            "In any case this bug (and many other) is fixed in 4.5 release tag. All MariaDB 10.4 users should switch to it. It will solve a lot of trouble."

            It doesn't look like 4.5 is GA yet. We seem to be good now and will of course apply dot-release updates as they become available.

            In any event, it is good to see something published about the issue we saw. It is unsettling to hit an error that has no matches in a web search.

            Shi Yan Shi Yan added a comment - - edited

            We are having the same issue when doing a rolling upgrade from 10.3 to 10.4/10.5. The Galera version is 26.4.6.
            After the rolling upgrade, the strange values are shown on the first (upgraded) node, but the cluster sync looks fine and the info from the other not-yet-upgraded nodes is still good.

            wsrep_cluster_size 0
            wsrep_local_index 18446744073709551615
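            (For context, 18446744073709551615 is 2^64 - 1, i.e. a 64-bit -1 read as unsigned, which suggests the node is reporting an "undefined" local index rather than a real position. This is easy to verify:

                SELECT CAST(-1 AS UNSIGNED);  -- returns 18446744073709551615

            so the two odd values together just mean the node is not publishing real membership info yet.)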

            [update]
            We found that when another node's MariaDB stops running, these values are refreshed and become correct. For example, we upgrade the 1st node to 10.5; the wrong values then appear on the 1st node, but the cluster still looks synced and the values are good on our 2nd and 3rd nodes. When we then stop MariaDB on the 2nd node, the 1st node gets correct values. But the same thing happens on the 2nd node after it gets upgraded.
            Also, once all three nodes are upgraded, the values are good.
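            (A quick per-node cross-check of the actual sync state, using the standard wsrep status variables; the WHERE form of SHOW STATUS is plain MariaDB syntax:

                SHOW GLOBAL STATUS WHERE Variable_name IN
                    ('wsrep_cluster_status',       -- expect 'Primary'
                     'wsrep_local_state_comment',  -- expect 'Synced'
                     'wsrep_cluster_size',
                     'wsrep_local_index');

            if wsrep_cluster_status is Primary and the state is Synced on every node, the cluster itself is healthy even while the two values above look wrong.)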


            stepan.patryshev Stepan Patryshev (Inactive) added a comment - - edited

            julien.fritsch Shi Yan reported that they do not suffer from these wrong values and everything is good, so I suppose there is no strong reason to worry here. But in the customer ticket I have not seen a fresh reply from the customer on whether this is still a problem for them or not. Waiting...

            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            Closing as not reproduced, since the customer is not experiencing this anymore.

People

  Assignee: stepan.patryshev Stepan Patryshev (Inactive)
  Reporter: slevieux Levieux Stéphane
  Votes: 3
  Watchers: 11

