[MDEV-14255] Broken SST on Debian in 10.2.10 Created: 2017-11-02  Updated: 2019-12-09  Resolved: 2019-12-09

Status: Closed
Project: MariaDB Server
Component/s: Galera, wsrep
Affects Version/s: 10.2.10
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: DEZILLIUM LIMITED Assignee: Jan Lindström (Inactive)
Resolution: Incomplete Votes: 3
Labels: galera, need_feedback
Environment:

Debian 9.2



 Description   

Hello,

MariaDB is broken (again) on Debian.

10.2.6: broken libmariadb3
10.2.7: unreleased for Debian
10.2.8: broken libmariadb3
10.2.9: working
10.2.10: broken sst

Whatever testing is done for Debian on the 10.2 branch is simply not working. 10.2 is supposed to be a stable release, yet so far half of the releases for that branch on Debian aren't working.

Enough with the rant, on to the details:

  • version 10.2.10, packages from MariaDB for Debian
  • Debian 9.2
  • apt-get upgrade fails. syslog shows:

Nov  2 10:25:38 : 2017-11-02 10:25:38 140222833100544 [Warning] WSREP: Gap in state sequence. Need state transfer.
Nov  2 10:25:38 : 2017-11-02 10:25:38 140222452963072 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '[redacted]' --datadir '/var/lib/mysql/'   --parent '23336'  '' '
Nov  2 10:25:38 : /usr//bin/wsrep_sst_xtrabackup-v2: line 646: WSREP_SST_OPT_PORT: unbound variable
Nov  2 10:25:38 : 2017-11-02 10:25:38 140222452963072 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '[redacted]' --datadir '/var/lib/mysql/'   --parent '23336'  ''
Nov  2 10:25:38 : #011Read: '(null)'
Nov  2 10:25:38 : 2017-11-02 10:25:38 140222452963072 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '[redacted]' --datadir '/var/lib/mysql/'   --parent '23336'  '' : 1 (Operation not permitted)
Nov  2 10:25:38 : 2017-11-02 10:25:38 140222833100544 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
Nov  2 10:25:38 : 2017-11-02 10:25:38 140222833100544 [ERROR] Aborting
Nov  2 10:25:46 : Error in my_thread_global_end(): 1 threads didn't exit
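
The "unbound variable" message at line 646 is what bash prints when a script running under `set -u` (nounset) expands a variable that was never assigned. The sketch below reproduces that class of failure and shows a defensive default. The function names and the 4444 fallback port are illustrative assumptions, not the actual code of wsrep_sst_xtrabackup-v2.

```shell
#!/usr/bin/env bash
# Illustrative only: reproduces the class of error seen in the log above.

# Expanding an unset variable under `set -u` aborts the subshell.
strict_port() (
    set -u
    unset WSREP_SST_OPT_PORT
    echo "port=${WSREP_SST_OPT_PORT}"
)

# `${VAR:-default}` supplies a fallback, so nounset is not triggered.
safe_port() (
    set -u
    unset WSREP_SST_OPT_PORT
    echo "port=${WSREP_SST_OPT_PORT:-4444}"   # 4444 is an assumed fallback
)

strict_port 2>/dev/null || echo "strict_port failed"
safe_port
```

Whether a fallback is the right fix depends on how the script is supposed to receive the port; the sketch only shows why the joiner exits before it can report 'ready <addr>'.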

Thank you



 Comments   
Comment by DEZILLIUM LIMITED [ 2017-11-02 ]

Implementing the workaround in https://jira.mariadb.org/browse/MDEV-14256 results in even more issues.

SST cannot start because it cannot set up its encryption properly:

Nov  2 20:03:15 -wsrep-sst-joiner: 2017/11/02 20:03:15 socat[8884] E SSL_accept(): error:1408F10B:SSL routines:ssl3_get_record:wrong version number
Nov  2 20:03:15 -wsrep-sst-joiner: Error while getting data from donor node:  exit codes: 139 0

It also showed a deprecation warning:

Nov  2 19:23:45 -wsrep-sst-joiner: WSREP_SST: [WARNING] **** WARNING **** encrypt=3 is deprecated and will be removed in a future release (20171102 19:23:45.134)

That's weird. Taking a look at /usr/bin/wsrep_sst_xtrabackup-v2:

        elif [[ $encrypt -eq 4 ]]; then
            wsrep_log_info "Using openssl based encryption with socat: with key, crt, and ca"
 
            verify_file_exists "$ssl_ca" "CA, certificate, and key files are required." \
                                         "Please check the 'ssl-ca' option.           "
            verify_file_exists "$ssl_cert" "CA, certificate, and key files are required." \
                                           "Please check the 'ssl-cert' option.         "
            verify_file_exists "$ssl_key" "CA, certificate, and key files are required." \
                                          "Please check the 'ssl-key' option.          "
 
            # Check to see that the key matches the cert
            verify_cert_matches_key $ssl_cert $ssl_key
 
            stagemsg+="-OpenSSL-Encrypted-4"
            if [[ "$WSREP_SST_OPT_ROLE"  == "joiner" ]]; then
                wsrep_log_info "Decrypting with CERT: $ssl_cert, KEY: $ssl_key, CA: $ssl_ca"
                tcmd="socat -u openssl-listen:${TSST_PORT},reuseaddr,cert=${ssl_cert},key=${ssl_key},cafile=${ssl_ca},verify=1${joiner_extra}${sockopt} stdio"
            else
                wsrep_log_info "Encrypting with CERT: $ssl_cert, KEY: $ssl_key, CA: $ssl_ca"
                tcmd="socat -u stdio openssl-connect:${REMOTEIP}:${TSST_PORT},cert=${ssl_cert},key=${ssl_key},cafile=${ssl_ca},verify=1${donor_extra}${sockopt}"
            fi
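
The excerpt comes down to one decision: the joiner listens with `openssl-listen` and the donor connects with `openssl-connect`, both presenting the same cert/key/CA triple. A simplified sketch of that branch follows, assuming a hypothetical `build_tcmd` helper and leaving out the script's `joiner_extra`, `donor_extra`, and `sockopt` suffixes. The port and the 192.0.2.10 address in the usage lines are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the encrypt=4 branch quoted above.
# Usage: build_tcmd ROLE PORT CERT KEY CA [REMOTE_IP]
build_tcmd() {
    local role=$1 port=$2 cert=$3 key=$4 ca=$5 remote=${6:-}
    local tls="cert=${cert},key=${key},cafile=${ca},verify=1"
    if [[ $role == "joiner" ]]; then
        # Joiner: accept the TLS connection and write the stream to stdout.
        echo "socat -u openssl-listen:${port},reuseaddr,${tls} stdio"
    else
        # Donor: read the stream from stdin and push it to the joiner.
        echo "socat -u stdio openssl-connect:${remote}:${port},${tls}"
    fi
}

build_tcmd joiner 4444 /your/node.pem /your/node.key /your/ca-cert.pem
build_tcmd donor  4444 /your/node.pem /your/node.key /your/ca-cert.pem 192.0.2.10
```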

OK, new way to do things. The "encrypt = 4" option is already present in 10.2.9, but that version shows no deprecation warning for encrypt=3. Let's do it the new way, then. Checked the release notes: nada. Checked the changelog: nada. I had to go to Percona's documentation to get the info. And since the documentation is missing, here's what you need to change in your configuration (just for future reference; I'm almost positive nobody else on this planet uses encrypted SST):

[sst]
encrypt=4
ssl-ca="/your/ca-cert.pem"
ssl-key="/your/node.key"
ssl-cert="/your/node.pem"
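
Before restarting a node with these settings, it may help to confirm that the [sst] section really carries all four options. A rough sketch, assuming hypothetical `sst_option` and `check_sst_config` helpers and a simple key=value layout (it does not follow `!include` directives or other config-file subtleties):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from the [sst] section of FILE.
sst_option() {
    local file=$1 key=$2
    awk -F= -v key="$key" '
        # A section header switches the "inside [sst]" flag on or off.
        /^[ \t]*\[/ { in_sst = ($0 ~ /^[ \t]*\[sst\]/); next }
        in_sst {
            k = $1; gsub(/[ \t]/, "", k)
            if (k == key) {
                # Take everything after the first "=" and strip spaces/quotes.
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t"]+|[ \t"]+$/, "", v)
                print v; exit
            }
        }
    ' "$file"
}

# Report every required option that is missing; return non-zero if any is.
check_sst_config() {
    local file=$1 key missing=0
    for key in encrypt ssl-ca ssl-key ssl-cert; do
        if [[ -z $(sst_option "$file" "$key") ]]; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

It would be called as `check_sst_config /path/to/my.cnf`, where the path is whatever file holds the [sst] section on the node in question.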

Even with those changes, the "wrong version number" SSL error shown above still appears. To restate: the only change performed on this node was updating MariaDB. All other nodes in the cluster are in exactly the same state as this node, apart from the MariaDB update. Forcing a different node to perform an SST works as expected.

Comment by Andrii Nikitin (Inactive) [ 2017-11-28 ]

I cannot reproduce the 'routines:ssl3_get_record:wrong version number' error on jessie with 10.2.10 when generating certificates as in https://github.com/AndriiNikitin/mariadb-environs/blob/master/_plugin/configure/m-all/configure_ssl.sh. Maybe it is a problem with your SSL certificates? Both encrypt=3 and encrypt=4 work in the tests I tried with the fix from MDEV-14256.

Could you confirm the exact algorithm or commands used to create the certificates, or provide the non-sensitive output of commands like:
openssl version
openssl x509 -inform pem -in client-cert.pem -noout -text
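
One way to produce the non-sensitive output asked for here is to pipe the text dump through a small filter. The sketch below assumes a hypothetical `redact_cert_text` helper. It masks the Issuer and Subject values and any line consisting only of colon-separated hex bytes (modulus, signature, long serial numbers); anything else considered sensitive would still need to be removed by hand.

```shell
#!/usr/bin/env bash
# Hypothetical filter: mask identifying fields in `openssl x509 -text` output.
redact_cert_text() {
    sed -E \
        -e 's/^([[:space:]]*(Issuer|Subject):).*/\1 [redacted]/' \
        -e 's/^([[:space:]]*)([0-9a-f]{2}:)+([0-9a-f]{2})?[[:space:]]*$/\1[redacted]/'
}

# Example with the file name from the request above:
# openssl x509 -inform pem -in client-cert.pem -noout -text | redact_cert_text
```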

If you wish, I can share the exact commands used to generate the Docker image(s), so you can spot any differences.

Comment by Andrii Nikitin (Inactive) [ 2017-11-28 ]

Tried in debian stretch 9.2 as well - both encrypt=3 and encrypt=4 work as expected.

Comment by DEZILLIUM LIMITED [ 2017-12-21 ]

Revisiting this: after another server was brought to 10.2.10, that server dropped off the cluster with the same "wrong version number" SSL error.

openssl version
OpenSSL 1.1.0f 25 May 2017

openssl x509 -inform pem -in client-cert.pem -noout -text
Certificate:
Data:
Version: 1 (0x0)
Serial Number: 37 (0x25)
Signature Algorithm: sha256WithRSAEncryption
Issuer: C = [redacted], ST = [redacted], O = [redacted], OU = [redacted]
Validity
Not Before: Sep 28 22:06:27 2017 GMT
Not After : Sep 4 22:06:27 2117 GMT
Subject: C = [redacted], ST = [redacted], L = [redacted], O = [redacted], CN = [redacted]
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
[redacted]
Exponent: 65537 (0x10001)
Signature Algorithm: sha256WithRSAEncryption
[redacted]

Comment by Zdravelina Sokolovska (Inactive) [ 2018-02-12 ]

The environment used, Debian 9.2, has been superseded by Debian 9.3.

Comment by Jan Lindström (Inactive) [ 2019-12-04 ]

Does this problem still exist on a more recent version of 10.2, e.g. 10.2.29?
