[MDEV-22668] "Flush SSL" command doesn't reload wsrep cert Created: 2020-05-22  Updated: 2021-08-10  Resolved: 2021-08-02

Status: Closed
Project: MariaDB Server
Component/s: SSL, wsrep
Affects Version/s: 10.4
Fix Version/s: 10.4.19, 10.5.10

Type: Bug
Priority: Critical
Reporter: Ricardo Melo
Assignee: Jan Lindström (Inactive)
Resolution: Fixed
Votes: 3
Labels: None

Issue Links:
Duplicate
PartOf

 Description

"Flush SSL" is an awesome feature: it is nice to be able to renew a TLS certificate without restarting servers. However, it does not appear to trigger a reload of the certificate file configured in wsrep_provider_options. This will break replication when we need to restart a cluster node.
It would be nice to be able to trigger a wsrep SSL reload as well.

Note: I am running MariaDB 10.4.13



 Comments   
Comment by Zephaniah Loss-Cutler-Hull [ 2021-01-14 ]

We are seeing the same behavior, with the addition that this breaks the cluster after certificate expiration if any of the connections have to be reset.

This is a pretty big issue in our environment.

Comment by Daniel Almeida (Inactive) [ 2021-01-14 ]

We have a client also experiencing this same issue.

+-----------------------------------+
| @@version                         |
+-----------------------------------+
| 10.4.17-10-MariaDB-enterprise-log |
+-----------------------------------+
 
+---------------------+-------------+
| Variable_name       | Value       |
+---------------------+-------------+
| wsrep_patch_version | wsrep_26.22 |
+---------------------+-------------+

Comment by Zephaniah Loss-Cutler-Hull [ 2021-03-17 ]

Do you have the bug number that this duplicates?

Comment by Sergei Golubchik [ 2021-03-17 ]

not really a duplicate, so reopened

Comment by Seppo Jaakola [ 2021-04-13 ]

There is a pull request for this submitted: https://github.com/MariaDB/server/pull/1801
But the PR is not linked with this MDEV, probably because the title line of the commit message is badly formatted.

Comment by Seppo Jaakola [ 2021-04-13 ]

PR 1801 and the new Galera 4.8 library should contain a fix for this issue

Comment by Jan Lindström (Inactive) [ 2021-04-15 ]

Review and testing.

Comment by Zephaniah Loss-Cutler-Hull [ 2021-07-21 ]

This will be added on a bit by others, but to summarize some information:

FLUSH SSL is not properly triggering the galera reset in our environment, and last night this caused our production DB clusters to fail.

Investigation shows that even in 10.5.10.7, FLUSH SSL is only causing the MariaDB port 3306 to pick up the new certificate, while WSREP on port 4567 keeps the old certificate.

Running 'SET GLOBAL wsrep_provider_options = 'socket.ssl_reload=1';' as the DB root user does cause WSREP on port 4567 to pick up the new certificate.

Looking at the test case added in commit c3b016efde4b1e0c2b85ca26c814ad43f5611ab2, I see that it only ever tests whether reconnection is possible after running the SET GLOBAL. It does run a FLUSH SSL later, but then immediately goes into cleanup instead of testing whether that worked properly.

As such, I'm pretty sure that this needs to be reopened, and people trying to use this feature need to be aware that FLUSH SSL is still insufficient when using WSREP, and that a workaround is currently possible by adding the SET GLOBAL to the sequence.
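
For anyone scripting certificate renewal, the workaround sequence above can be sketched roughly as follows. This is only an illustration and not part of the fix: the helper names `reload_statements` and `reload_certificates` are hypothetical, and the cursor is assumed to be any DB-API style cursor from whichever MariaDB connector is in use, connected as a user allowed to run both statements.

```python
from typing import List, Protocol


class Cursor(Protocol):
    """Minimal DB-API style cursor (assumption: the connector's cursor fits this)."""

    def execute(self, statement: str) -> object: ...


def reload_statements(galera: bool = True) -> List[str]:
    """Return the statements to run once the new certificate files are on disk."""
    # FLUSH SSL reloads the certificate used for client connections (port 3306).
    statements = ["FLUSH SSL"]
    if galera:
        # Workaround from this ticket: ask the Galera provider to reload the
        # certificate it uses for replication traffic (port 4567).
        statements.append(
            "SET GLOBAL wsrep_provider_options = 'socket.ssl_reload=1'"
        )
    return statements


def reload_certificates(cursor: Cursor, galera: bool = True) -> None:
    """Run the reload statements in order on an already open connection."""
    for statement in reload_statements(galera):
        cursor.execute(statement)
```

The sketch would have to be run against every node in the cluster, since each node reloads only its own certificate.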

Comment by Mario Karuza (Inactive) [ 2021-07-22 ]

Hi Zephaniah,

Can you run with wsrep debugging enabled (wsrep_debug=1) and report back the trace?

Comment by Mario Karuza (Inactive) [ 2021-07-27 ]

Hi, there was a problem with the Galera library. It should be fixed in the new version.

Comment by Zephaniah Loss-Cutler-Hull [ 2021-07-28 ]

My apologies for the delay in getting the debug output. Is that still needed?

And do you have the fix commit on the galera library?
