[MDEV-26018] Update breaks cluster Created: 2021-06-24 Updated: 2021-12-22 Resolved: 2021-12-22 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | wsrep |
| Affects Version/s: | 10.4.20 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Tim van Dijen | Assignee: | Seppo Jaakola |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Red Hat Enterprise Linux Server release 7.9 (Maipo) Linux 3.10.0-1160.25.1.el7.x86_64 #1 SMP Tue Apr 13 18:55:45 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux 3-node wsrep-cluster |
|
| Attachments: |
|
| Description |
|
After updating from 10.4.19 to 10.4.20, I couldn't get any secondary nodes to join the cluster. I was able to work around this by downgrading back to 10.4.19. |
| Comments |
| Comment by Julius Goryavsky [ 2021-06-25 ] |
|
tvdijen Hi! I looked at the log, and it looks as if the encryption mode encrypt = 2 was changed to encrypt = 4, but without renaming the tca parameter to tkey, or ssl-ca to ssl-key (which needs to be done when switching from encrypt = 2 to encrypt = 4). Could you tell me whether such configuration changes were made, or is this the result of some kind of automatic change? |
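For reference, a minimal sketch of what the renaming between the two modes looks like in the [sst] section of the server config. This is an illustration based on the parameter mapping described above and the documented ssl-* option names, not config taken from this ticket; the certificate paths are placeholders:

```ini
# encrypt = 2 names the certificate files with the t* parameters:
[sst]
encrypt = 2
tca     = /etc/my.cnf.d/ssl/ca.pem
tcert   = /etc/my.cnf.d/ssl/cert.pem

# encrypt = 4 expects the same information under the ssl-* names:
[sst]
encrypt  = 4
ssl-ca   = /etc/my.cnf.d/ssl/ca.pem
ssl-cert = /etc/my.cnf.d/ssl/cert.pem
ssl-key  = /etc/my.cnf.d/ssl/key.pem
```

The ssl-ca/ssl-cert/ssl-key form is the one Tim reports trying in a later comment. |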
| Comment by Tim van Dijen [ 2021-06-25 ] |
|
Hi @Julius Goryavsky! The config wasn't changed between updates; it's been like this for years: encrypt = 4. So maybe I've been doing it all wrong for years, but it used to work.. |
| Comment by Tim van Dijen [ 2021-06-25 ] |
|
I've tried setting ssl-ca, ssl-cert and ssl-key instead, but the issue remains.. |
| Comment by Jan Lindström (Inactive) [ 2021-06-25 ] |
|
Please provide error logs from latest try. |
| Comment by Tim van Dijen [ 2021-06-25 ] |
|
OK, so as Julius pointed out, I've changed the [sst] section of my config from the old parameter names to the new ones. On 10.4.19 I can bootstrap the cluster and join the secondaries. When I update to 10.4.20, it breaks in a similar fashion as with the old config. See attached logs. I've also attached my server.cnf. |
| Comment by Rob Brown [ 2021-07-23 ] |
|
CONFIRMED! I'm seeing the same problem.

MariaDB 10.4.20 is able to successfully join a valid Galera cluster. So that's good. And MariaDB 10.4.20 works fine as a Galera cluster. So that's good. Any MariaDB 10.4.20 node that joins a Galera cluster via IST (Incremental State Transfer) can be slaved from using GTID. So that's good. But any MariaDB 10.4.20 node that has ever joined a Galera cluster via SST (State Snapshot Transfer) can never be slaved from using GTID. BAD!

I was able to duplicate this issue 100% of the time:

Galera Node #1:
[root@node1 ~]# galera_new_cluster

Galera Node #2:
[root@node2 ~]# systemctl stop mariadb

Node 2 uses SST to join Node 1 and the cluster links up fine. So that's good.

Slave Node #3: configure it to replicate from Node #1 using a hard-coded position:
MariaDB[3]> CHANGE MASTER TO MASTER_HOST='node1', MASTER_LOG_FILE='mysql-binlog.000003', MASTER_LOG_POS=4;
It works fine. So that's good.

Then switch to GTID mode:
MariaDB[3]> CHANGE MASTER TO MASTER_USE_GTID=slave_pos;
It still works fine. So that's good.

Try slaving from Node 2:
MariaDB[3]> CHANGE MASTER TO MASTER_HOST='node2';
It breaks, because Node 2 is a 10.4.20 node that used SST to join the cluster:
Seconds_Behind_Master: NULL

Switch back to the goodness:
MariaDB[3]> CHANGE MASTER TO MASTER_HOST='node1';
And it replicates perfectly, since Node 1 has never used SST:
Seconds_Behind_Master: 0

Downgrade Node 2 to 10.4.19:
[root@node2 ~]# systemctl stop mariadb

Then Node 3 will suddenly be able to replicate from anywhere again:
MariaDB[3]> CHANGE MASTER TO MASTER_HOST='node2';
Seconds_Behind_Master: 0
MariaDB[3]> CHANGE MASTER TO MASTER_HOST='node1';
Seconds_Behind_Master: 0

If you keep all Galera master servers on 10.4.19 and DO NOT upgrade to 10.4.20, then you'll be safe. Replication slaves are safe to upgrade to 10.4.20 (as long as you never promote them to master). |
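When debugging this kind of failure, the slave's GTID state can be compared against each candidate master's with a few standard MariaDB statements. This is a generic diagnostic sketch, not commands taken from the ticket:

```sql
-- On the slave: replication health and the GTID position it will resume from.
-- Check Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master, Last_IO_Error.
SHOW SLAVE STATUS\G
SELECT @@global.gtid_slave_pos;

-- On the candidate master (e.g. node2): the positions its binlog can serve.
SELECT @@global.gtid_binlog_pos, @@global.gtid_current_pos;
```

If the candidate master's gtid_binlog_pos no longer covers the slave's gtid_slave_pos (as can happen when an SST recreates the data directory and the old binlogs are gone), a slave using MASTER_USE_GTID=slave_pos cannot connect, which is consistent with the behavior described above. |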
| Comment by Rob Brown [ 2021-08-07 ] |
|
RESOLUTION CONFIRMED! 10.4.19 = GOOD

VERIFICATION:
[root@node2 ~]# systemctl stop mariadb
GTID slaving works.
[root@node2 ~]# systemctl stop mariadb
GTID slaving works.
[root@node2 ~]# systemctl stop mariadb
GTID slaving FAILURE!
[root@node2 ~]# systemctl stop mariadb
GTID slaving still FAILURE!
[root@node2 ~]# systemctl stop mariadb
GTID slaving works again.
[root@node2 ~]# systemctl restart mariadb
GTID slaving still works.

Changing the MariaDB version on the slave server [node3] to 10.4.19, 10.4.20, or 10.4.21 has no effect on the success or failure of GTID slaving in each scenario.

You can close this ticket now. THANKS! |
| Comment by Tim van Dijen [ 2021-08-26 ] |
|
I'm still experiencing the exact same issue with 10.4.21... The cluster bootstraps just fine, but the secondaries won't join. I think my setup is different from Rob's, because I'm running multi-master and never did anything like promoting slaves to master, as he does in his comments above. |
| Comment by Tim van Dijen [ 2021-11-15 ] |
|
We've just tried upgrading to 10.5.x and that worked. This issue may be closed now! |
| Comment by Ralf Gebhardt [ 2021-12-22 ] |
|
Reported to be fixed in newer versions. |