MariaDB Server / MDEV-30541

IST always fails -- wsrep_sst_mariabackup does not handle "secret" correctly when doing an IST

Details

    Description

      I am rewriting this description completely, having learned a lot more. I have also changed the title of this report.

      I have a Galera cluster using [sst] encrypt=3 with the mariabackup SST method. Whenever a node shuts down gracefully and then comes back up, it fails to rejoin the cluster. Retrying (which by default happens automatically because of systemd) eventually succeeds, but only after a full SST.

      That is, IST always fails. Trying again results in a full SST, which succeeds.

      My database is big enough that this is not really acceptable, and it doesn't seem to be the intended behavior. I narrowed it down to this error in syslog: "Donor does not know my secret!"

      Sure enough, in wsrep_sst_mariabackup, when we are NOT bypassing (that is, in full SST mode), there is the following:

      if [ -n "$WSREP_SST_OPT_REMOTE_PSWD" ]; then
          # Let joiner know that we know its secret
          echo "$SECRET_TAG $WSREP_SST_OPT_REMOTE_PSWD" >> "$MAGIC_FILE"
      fi

      When we ARE bypassing (that is, in IST mode), this statement is missing.
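      To illustrate why the missing line breaks IST, here is a minimal bash sketch of a joiner-side check. It assumes the joiner rejects any magic file that lacks a line of the form "<SECRET_TAG> <secret>" matching its own secret. The function name check_donor_secret and the tag value "secret" are hypothetical; this is not the actual code in wsrep_sst_mariabackup.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of how a joiner could verify the donor's secret.
# The tag value and function name are assumptions for illustration only.
SECRET_TAG="secret"

# check_donor_secret MAGIC_FILE MY_SECRET
# Returns 0 if MAGIC_FILE holds a "<tag> <value>" line whose value equals
# MY_SECRET; otherwise prints the error seen in syslog and returns 1.
check_donor_secret() {
    local magic_file="$1" my_secret="$2" line
    # First line starting with the tag and a space; empty if there is none.
    line=$(grep -m1 "^$SECRET_TAG " "$magic_file" || true)
    if [ -z "$line" ] || [ "${line#"$SECRET_TAG "}" != "$my_secret" ]; then
        echo "Donor does not know my secret!" >&2
        return 1
    fi
    return 0
}
```

      A magic file written only with the initial state line, as in bypass mode, has no secret line, so this check fails in exactly the way reported here even though the donor is legitimate.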

      I've modified wsrep_sst_mariabackup to add that statement in bypass mode, just after $MAGIC_FILE is first written, and now my nodes come up with a quick IST rather than a long SST.
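      The workaround above can be sketched as a standalone bash function. Only SECRET_TAG, the remote password and the appended echo statement come from the report; the function name donor_bypass_write_magic, its arguments, and the assumption that the bypass branch first writes a single state line are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the donor's bypass (IST) branch with the workaround
# applied. The function name and its arguments are illustrative only.
SECRET_TAG="secret"   # assumed value of the tag

# donor_bypass_write_magic MAGIC_FILE STATE_LINE REMOTE_PSWD
donor_bypass_write_magic() {
    local magic_file="$1" state_line="$2" remote_pswd="$3"
    # Assumed existing behaviour: the bypass branch writes the state line first.
    echo "$state_line" > "$magic_file"
    # Workaround: append the secret exactly as the full-SST branch does,
    # so the joiner's secret check also passes for IST.
    if [ -n "$remote_pswd" ]; then
        echo "$SECRET_TAG $remote_pswd" >> "$magic_file"
    fi
}
```

      Keeping the append conditional on a non-empty password mirrors the full-SST branch, so the magic file is unchanged when no secret was passed.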


          Activity

            Xan Charbonnet (xan@biblionix.com) added a comment:

            It looks like wsrep_sst_mariabackup in 10.11.7 centralizes the logic for handling the MAGIC_FILE. From what I can tell looking at the code, this should fix the issue, but I haven't yet tried it.

            Xan Charbonnet (xan@biblionix.com) added a comment:

            I've just upgraded a machine to 10.11.7 without modifying wsrep_sst_mariabackup, and it seems to work!

            Looks like this was the commit that fixed it:
            https://github.com/MariaDB/server/commit/66fafdb9227dc39ed0dadec12435880b6b060b8e

            The same bug was reported again in MDEV-32344, a mere 8 months after being reported here.
            Miika Kankare (quulah) added a comment:

            Thanks for the research, Xan!

            I've now also upgraded to 10.11.7 and IST does seem to be working as expected.

            Xan Charbonnet (xan@biblionix.com) added a comment:

            I'd like to mark this resolved, but I don't see any way to do it.

            Julius Goryavsky (sysprg) added a comment:

            Thanks, xan@biblionix.com, for the research. Closed as already fixed by MDEV-32344.

            People

              sysprg Julius Goryavsky
              xan@biblionix.com Xan Charbonnet
              Votes: 1
              Watchers: 3

