[MDEV-28999] Mariabackup SST fails with Bullseye Upgrade Created: 2022-07-01  Updated: 2022-08-27  Resolved: 2022-08-27

Status: Closed
Project: MariaDB Server
Component/s: Galera SST, mariabackup
Affects Version/s: 10.5.16
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Prachi Jain
Assignee: Jan Lindström (Inactive)
Resolution: Incomplete
Votes: 0
Labels: None
Environment: Custom Debian Bullseye Docker container on Debian Bullseye host OS.



 Description   

Server version: 10.5.16-MariaDB-1:10.5.16+maria~bullseye-log



Upgrading from 10.5.16-MariaDB-1:10.5.16+maria~buster-log to 10.5.16-MariaDB-1:10.5.16+maria~bullseye-log (a move from buster to bullseye) breaks mariabackup SST. Bootstrapping a new cluster and then adding another node to it produces the following error on the joiner:


Jun 27 16:56:51 2022-06-27 16:56:51 1 [ERROR] WSREP: Failed to prepare for 'mariabackup' SST. Unrecoverable.
Jun 27 16:56:51 2022-06-27 16:56:51 1 [ERROR] WSREP: SST request callback failed. This is unrecoverable, restart required.


The donor node's syslog shows the following error:


E Failed to set SNI host ""



Further investigation revealed that socat 1.7.4 (recently updated in Debian bullseye) uses Server Name Indication (SNI) by default. The same change broke epoptes' use of socat (https://github.com/epoptes/epoptes/issues/127), and it causes mariabackup SST to fail on Bullseye.

We resolved the issue by adding the no-sni option to the socat command that the wsrep_sst_mariabackup script builds for the donor:

if [ $encrypt -lt 2 ]; then
    if [ "$WSREP_SST_OPT_ROLE" = 'joiner' ]; then
        tcmd="socat -u TCP-LISTEN:$SST_PORT,reuseaddr$sockopt stdio"
    else
        # Donor side: no-sni=1 appended to the socat address options
        tcmd="socat -u stdio TCP:$REMOTEIP:$SST_PORT$sockopt,no-sni=1"
    fi
    return
fi
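
A possible refinement of this workaround, sketched here and not part of the shipped script, is to append no-sni=1 only when the installed socat is 1.7.4 or newer. This rests on the assumption that older socat releases do not recognize the option. The helper names version_ge, socat_version and donor_sockopt_suffix are hypothetical, and the parsing assumes that `socat -V` prints a line of the form "socat version X.Y.Z ...".

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from wsrep_sst_mariabackup.

# Return 0 when dotted version $1 is greater than or equal to $2.
# Missing trailing components are treated as 0 (1.7.4 == 1.7.4.0).
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        # Drop the leading component; clear the string once it is exhausted.
        if [ "$a" = "$x" ]; then a=''; else a=${a#*.}; fi
        if [ "$b" = "$y" ]; then b=''; else b=${b#*.}; fi
        x=${x:-0}
        y=${y:-0}
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
    done
    return 0
}

# Print the version of the installed socat, or nothing if it cannot be found.
# Assumption: "socat -V" emits a line starting with "socat version ".
socat_version() {
    socat -V 2>/dev/null | sed -n 's/^socat version \([0-9.]*\).*/\1/p' | head -n 1
}

# Print ",no-sni=1" when the given (or detected) socat version is >= 1.7.4,
# and print nothing otherwise.
donor_sockopt_suffix() {
    ver=${1:-$(socat_version)}
    if [ -n "$ver" ] && version_ge "$ver" 1.7.4; then
        printf '%s' ',no-sni=1'
    fi
}

# Possible use in the donor branch:
#   tcmd="socat -u stdio TCP:$REMOTEIP:$SST_PORT$sockopt$(donor_sockopt_suffix)"
```

Passing a version string as the first argument skips detection, which makes the helper easy to check without socat installed.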

In the hope that it may help others in the community, we're opening this bug report.



 Comments   
Comment by Marko Mäkelä [ 2022-07-01 ]

To diagnose this, we’ll need the output from the following commands that the script is executing:
mariadb-backup --backup
mariadb-backup --prepare
If any DDL operations (including TRUNCATE TABLE) could be executed on the cluster at the time of the snapshot transfer, then the root cause might be MDEV-28870, provided that the failure output matches its symptoms.
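
One way to collect the requested output is sketched below. The run_logged helper and the paths in the example invocation are hypothetical placeholders; --target-dir is a mariadb-backup option, while connection options such as the user and password are omitted and would need to match the node's configuration.

```shell
#!/bin/sh
# Sketch only: run a command, save its stdout and stderr to a log file,
# record the exit status in that file, and return the same status.
run_logged() {
    log=$1
    shift
    rc=0
    "$@" >"$log" 2>&1 || rc=$?
    echo "exit status: $rc" >>"$log"
    return "$rc"
}

# Example invocation (placeholder paths, adjust to the node):
#   run_logged /tmp/backup.log  mariadb-backup --backup  --target-dir=/tmp/sst-test
#   run_logged /tmp/prepare.log mariadb-backup --prepare --target-dir=/tmp/sst-test
```

The two log files can then be attached to the report as they are.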
