[MDEV-10934] SST fails when SSL is enabled
Created: 2016-10-01 Updated: 2017-11-06 Resolved: 2017-10-13
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera SST |
| Affects Version/s: | 10.1.18 |
| Fix Version/s: | N/A |
| Type: | Bug |
| Priority: | Critical |
| Reporter: | DEZILLIUM LIMITED |
| Assignee: | Andrii Nikitin (Inactive) |
| Resolution: | Fixed |
| Votes: | 0 |
| Labels: | None |
| Environment: | Debian 8.6, packages from MariaDB repo, Galera replication enabled |
| Issue Links: |
| Sprint: | 10.1.20 |
| Description |
Enabling SSL for SST results in a split brain when a new node joins the cluster:

Oct 1 15:01:20 1 -innobackupex-backup: 161001 15:01:20 [01] Encrypting and streaming ./ibdata1

Logs have been sanitized (IPs and hostnames). The configuration files were taken from a production MariaDB cluster running Galera replication. The only changes made were:

Here are the .conf settings:

[some long key] has been edited; the real value is a string generated with openssl, without the square brackets.

SST was tried with both encrypt=1 and encrypt=3 and fails either way. wsrep_sst_method=xtrabackup-v2 is declared in a [mysqld] section. SST succeeds when streamfmt is set to tar, but that transfer is unencrypted.

This is not a firewall issue: it has been verified that rules are in place allowing all the nodes to talk to each other. Tried with 10.1.17 and with 10.1.18, released today.
| Comments |
| Comment by DEZILLIUM LIMITED [ 2016-10-04 ] |
The issue lies with socat.

Using socat from jessie (Debian) doesn't work: it always results in 100% CPU usage, with processes that have to be killed with signal 9 (SIGKILL) in order to terminate (100% reproducible).

Using socat from jessie-backports results in a hostname lookup failure (the joiner has no DNS A record pointing to it, but it IS declared in /etc/hosts).

Editing /usr/bin/wsrep_sst_xtrabackup-v2:

Change to:

This gets the SST further. Since the certificates were generated with a hostname as their CN, their verification fails and SST stops again.

Generating a new set of certificates with IPs as CNs (leaving the old ones in use by wsrep, just not for SST) and setting those under a [sst] segment (keeping the old tca, since the new set is signed by the same CA) allows the SST to complete, fully encrypted.

Edited to add:
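The certificate step described in this comment could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the reporter's actual commands: the address 192.0.2.10, all file names, and the throwaway CA created in a temporary directory are hypothetical. On a real cluster the existing CA key and certificate would be used for signing instead of a freshly generated one.

```shell
# Sketch: issue a node certificate whose CN is an IP address, signed by a CA.
# NODE_IP, the file names, and the demo CA are hypothetical placeholders.
set -eu

NODE_IP="192.0.2.10"            # assumed joiner address (documentation range)
WORKDIR="$(mktemp -d)"          # scratch directory so no real files are touched
cd "$WORKDIR"

# Stand-in for the cluster's existing CA; in practice reuse the real CA files.
openssl req -x509 -new -newkey rsa:2048 -nodes -days 30 \
    -subj "/CN=Demo SST CA" -keyout ca-key.pem -out ca.pem 2>/dev/null

# New private key and signing request for the node, with the IP as the CN.
openssl req -new -newkey rsa:2048 -nodes \
    -subj "/CN=${NODE_IP}" -keyout sst-key.pem -out sst-req.pem 2>/dev/null

# Sign the request with the CA, so the existing CA file still validates it.
openssl x509 -req -in sst-req.pem -days 30 \
    -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out sst-cert.pem 2>/dev/null

# Check the chain of trust and print the subject a TLS peer would see.
VERIFY_OUT="$(openssl verify -CAfile ca.pem sst-cert.pem)"
SUBJECT_OUT="$(openssl x509 -in sst-cert.pem -noout -subject)"
echo "$VERIFY_OUT"
echo "$SUBJECT_OUT"
```

The resulting certificate and key would then be the files set under the [sst] segment, with tca left pointing at the unchanged CA certificate, as the comment describes.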
| Comment by Andrii Nikitin (Inactive) [ 2017-10-05 ] |
Thank you for your analysis and confirmation. I could reproduce a similar issue with encrypt=1 (without SSL-certificated nodes) in 10.1.18, with this message on the joiner:
The very same command succeeds in 10.1.28, so while the problem really exists in the previous version, there is nothing to fix at the moment.
To force an xtrabackup-v2 SST with encryption I paste the command below into a shell (using both encrypt=1 and encrypt=3):
To make it work in 10.1.28 I had to address several bugs and patch the related scripts in the unpacked tar image:
So, since the original problem looks solved, I will close this call with resolution 'Fixed'.