[MDEV-32344] IST "Donor does not know my secret" with ssl-mode=VERIFY_CA Created: 2023-10-03  Updated: 2024-01-01  Resolved: 2023-12-07

Status: Closed
Project: MariaDB Server
Component/s: wsrep
Affects Version/s: 10.3.39, 10.6.15, 10.11.5
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3

Type: Bug Priority: Critical
Reporter: Walter Doekes Assignee: Julius Goryavsky
Resolution: Fixed Votes: 2
Labels: None

Attachments: File issue-IST-donor-does-not-know-my-secret.patch    
Issue Links:
Blocks
Issue split
Relates
relates to MDEV-32342 WSREP_SST_OPT_REMOTE_AUTH bad value, ... Stalled

 Description   

Related to MDEV-32342: when ssl-mode=VERIFY_CA is enabled, the donor does not appear to pass the remote secret back to the joiner during IST.

This results in:

WSREP_SST: [ERROR] Donor does not know my secret! (20231003 15:29:10.448)
WSREP_SST: [INFO] Donor: '', my: 'd9ca9b998550fafb64c0ccc822dde463' (20231003 15:29:10.449)

After this failed IST, a restart triggers an SST, which then (luckily) succeeds.
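The joiner-side check behind these two log lines can be sketched roughly as follows. This is a hypothetical reconstruction from the log messages, not the code in wsrep_sst_mariabackup. The function name check_donor_secret, the demo GTID, and the value 'secret' for SECRET_TAG are all assumptions. The idea: the joiner looks for a line starting with SECRET_TAG in the magic file and compares the rest of that line with its own secret, so a magic file holding only the GTID line yields Donor: ''.

```shell
#!/bin/sh
# Hypothetical sketch of the joiner-side secret check; NOT the actual
# wsrep_sst_mariabackup code. SECRET_TAG='secret' is an assumption.
SECRET_TAG='secret'

# check_donor_secret MAGIC_FILE MY_SECRET
# Returns 0 if the donor echoed our secret back (or we have none),
# 1 otherwise, logging in the same shape as the WSREP_SST messages.
check_donor_secret() {
    magic_file=$1
    my_secret=$2
    # Without a secret of our own there is nothing to verify.
    if [ -z "$my_secret" ]; then
        return 0
    fi
    # Text after "secret " on the first tagged line; empty if absent.
    donor_secret=$(sed -n "s/^$SECRET_TAG //p" "$magic_file" 2>/dev/null | head -n 1)
    if [ "$donor_secret" != "$my_secret" ]; then
        echo "WSREP_SST: [ERROR] Donor does not know my secret!" >&2
        echo "WSREP_SST: [INFO] Donor: '$donor_secret', my: '$my_secret'" >&2
        return 1
    fi
    return 0
}

# Demo: an IST-style magic file that holds only the GTID line
# (hypothetical GTID and domain id).
magic=$(mktemp)
echo "00000000-0000-0000-0000-000000000001:42 0" > "$magic"
check_donor_secret "$magic" 'd9ca9b998550fafb64c0ccc822dde463' || echo "joiner would abort here"
rm -f "$magic"
```

Returning a status instead of exiting keeps the sketch usable from other code; the real script presumably aborts the transfer at this point.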

The cause appears to be this changeset:

commit 1ae7673aae7f82c4e659b1337177f2696c8e45ba (origin/bb-10.2-MDEV-24962-final)
Author: Julius Goryavsky <julius.goryavsky@mariadb.com>
Date:   Wed Apr 28 01:39:31 2021 +0200
 
    MDEV-24962: Galera SST innobackupex-move ignores Environment settings

That changeset also introduces a SECRET_TAG (not mentioned in the commit message). The donor writes the tagged secret to the magic file in the mariabackup SST path and in rsync, but not in the IST bypass path:

(mariabackup, SST branch)

         # Store donor's wsrep GTID (state ID) and wsrep_gtid_domain_id
         # (separated by a space).
-        echo "${WSREP_SST_OPT_GTID} ${WSREP_SST_OPT_GTID_DOMAIN_ID}" > "${MAGIC_FILE}"
+        echo "$WSREP_SST_OPT_GTID $WSREP_SST_OPT_GTID_DOMAIN_ID" > "$MAGIC_FILE"
+
+        if [ -n "$WSREP_SST_OPT_REMOTE_PSWD" ]; then
+            # Let joiner know that we know its secret
+            echo "$SECRET_TAG $WSREP_SST_OPT_REMOTE_PSWD" >> "$MAGIC_FILE"
+        fi

(rsync)

+    if [ -n "$WSREP_SST_OPT_REMOTE_PSWD" ]; then
+        # Let joiner know that we know its secret
+        echo "$SECRET_TAG $WSREP_SST_OPT_REMOTE_PSWD" >> "$MAGIC_FILE"
+    fi
+
     rsync ${STUNNEL:+--rsh="$STUNNEL"} \
           --archive --quiet --checksum "$MAGIC_FILE" rsync://$WSREP_SST_OPT_ADDR

But not in the IST bypass branch:

     else # BYPASS FOR IST
 
         wsrep_log_info "Bypassing the SST for IST"
         echo "continue" # now server can resume updating data
 
         # Store donor's wsrep GTID (state ID) and wsrep_gtid_domain_id
         # (separated by a space).
-        echo "${WSREP_SST_OPT_GTID} ${WSREP_SST_OPT_GTID_DOMAIN_ID}" > "${MAGIC_FILE}"
-        echo "1" > "${DATA}/${IST_FILE}"
+        echo "$WSREP_SST_OPT_GTID $WSREP_SST_OPT_GTID_DOMAIN_ID" > "$MAGIC_FILE"
+        echo "1" > "$DATA/$IST_FILE"

After I added the same lines to the IST branch, IST started working too:

--- wsrep_sst_mariabackup 2023-10-03 19:35:52.008645765 +0200
+++ wsrep_sst_mariabackup 2023-10-03 19:35:43.520780315 +0200
@@ -1189,6 +1189,12 @@ if [ "$WSREP_SST_OPT_ROLE" = 'donor' ];
         # Store donor's wsrep GTID (state ID) and wsrep_gtid_domain_id
         # (separated by a space).
         echo "$WSREP_SST_OPT_GTID $WSREP_SST_OPT_GTID_DOMAIN_ID" > "$MAGIC_FILE"
+
+        if [ -n "$WSREP_SST_OPT_REMOTE_PSWD" ]; then
+            # Let joiner know that we know its secret
+            echo "$SECRET_TAG $WSREP_SST_OPT_REMOTE_PSWD" >> "$MAGIC_FILE"
+        fi
+
         echo "1" > "$DATA/$IST_FILE"
 
         if [ -n "$scomp" ]; then
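To make the effect of the patch concrete, here is a small, self-contained reproduction of the donor's IST branch with and without the added lines, together with the read-back the joiner performs. It is a hypothetical sketch: write_ist_magic and donor_secret are made-up helpers, the GTID is an invented value, SECRET_TAG='secret' is assumed, and the password is simply the joiner secret from the log in the description.

```shell
#!/bin/sh
# Hypothetical reproduction of the IST bypass branch before and after
# the patch. Variable names mirror the script; values are made up.
SECRET_TAG='secret'   # assumption: value used by the SST scripts

# write_ist_magic MAGIC_FILE MODE   (MODE is 'original' or 'fixed')
# Writes the magic file the way the donor's IST branch does.
write_ist_magic() {
    MAGIC_FILE=$1
    # Store donor's wsrep GTID (state ID) and wsrep_gtid_domain_id
    # (separated by a space).
    echo "$WSREP_SST_OPT_GTID $WSREP_SST_OPT_GTID_DOMAIN_ID" > "$MAGIC_FILE"
    if [ "$2" = 'fixed' ] && [ -n "$WSREP_SST_OPT_REMOTE_PSWD" ]; then
        # The lines the patch adds: let joiner know that we know its secret.
        echo "$SECRET_TAG $WSREP_SST_OPT_REMOTE_PSWD" >> "$MAGIC_FILE"
    fi
}

# donor_secret MAGIC_FILE: print the secret the joiner would read back.
donor_secret() {
    sed -n "s/^$SECRET_TAG //p" "$1" | head -n 1
}

WSREP_SST_OPT_GTID='00000000-0000-0000-0000-000000000001:42'  # hypothetical
WSREP_SST_OPT_GTID_DOMAIN_ID=0
WSREP_SST_OPT_REMOTE_PSWD='d9ca9b998550fafb64c0ccc822dde463'

magic=$(mktemp)
write_ist_magic "$magic" original
echo "before fix: Donor: '$(donor_secret "$magic")'"
write_ist_magic "$magic" fixed
echo "after fix:  Donor: '$(donor_secret "$magic")'"
rm -f "$magic"
```

The first line printed shows an empty secret, corresponding to the Donor: '' in the log; the second shows the secret the joiner expects, which is what lets the check pass once the patch is applied.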

See MDEV-32342 for the configuration and versions used.

Cheers,
Walter Doekes
OSSO B.V.



 Comments   
Comment by Julius Goryavsky [ 2023-12-07 ]

wdoekes Thank you very much for reporting. The official fix, together with new tests, has been placed in the head revisions of 10.4 CS (link in MDEV) and in the head revisions of 10.4, 10.5 and 10.6 ES.

Generated at Thu Feb 08 10:30:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.