Details
Bug
Status: Closed
Major
Resolution: Incomplete
10.6.16
None
Redhat 8 on VMware
Description
Our environment is a 3-node Galera cluster: 2 DB nodes + 1 arbitrator.
We recently started running the arbitrator on Red Hat 8. Since then, rebooting one DB node brings the whole cluster down. This does not happen when the arbitrator runs on Red Hat 7.
DB node version: MariaDB 10.6.16 on Red Hat 7/8
Arbitrator version: Galera 26.4.14 or 26.4.16 on Red Hat 8
TCP connections shown by netstat (Galera port 18301); arrows point from the connecting side to the listening side:

  ┌-- arbitrator <-┐
  |                |
  V                |
  DB node 1 <--- DB node 2
DB node 1:
[root@t1vdbs-gcisdba-el8aa22-1-01 ~]# netstat -an | grep 18301 | grep "\.27:" | grep 18301 | grep ESTABLISHED
tcp 0 0 172.25.213.27:18301 172.25.223.27:41579 ESTABLISHED
tcp 0 0 172.25.213.27:18301 172.24.134.27:40817 ESTABLISHED

DB node 2:
[root@t2vdbs-gcisdba-el8aa22-2-01 errorlog]# netstat -an | grep 18301 | grep "\.27:" | grep 18301 | grep ESTABLISHED
tcp 0 0 172.25.223.27:60023 172.24.134.27:18301 ESTABLISHED
tcp 0 0 172.25.223.27:41579 172.25.213.27:18301 ESTABLISHED

Arbitrator:
[si00chw@t1vdbs-gcissc-witness03d witness]$ netstat -an | grep 18301 | grep "\.27:" | grep 18301 | grep ESTABLISHED
tcp 0 0 172.24.134.27:18301 172.25.223.27:60023 ESTABLISHED
tcp 0 0 172.24.134.27:40817 172.25.213.27:18301 ESTABLISHED
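The connection directions in the diagram can be read off the netstat output: on each ESTABLISHED socket, the side whose local port is 18301 accepted the connection and the other side initiated it. The Python sketch below automates that reading. It is an illustration only; the IP-to-name mapping in HOSTS is inferred from the shell prompts and addresses above, and the helper names are hypothetical.

```python
from __future__ import annotations

GALERA_PORT = 18301  # Galera port used in this setup

# Assumed mapping, inferred from the prompts and addresses in the report.
HOSTS = {
    "172.25.213.27": "DB node 1",
    "172.25.223.27": "DB node 2",
    "172.24.134.27": "arbitrator",
}


def split_addr(addr: str) -> tuple[str, int]:
    """Split 'a.b.c.d:port' into (ip, port)."""
    ip, _, port = addr.rpartition(":")
    return ip, int(port)


def connection_edges(netstat_lines: list[str]) -> set[tuple[str, str]]:
    """Return (initiator, acceptor) pairs for ESTABLISHED Galera sockets."""
    edges = set()
    for line in netstat_lines:
        fields = line.split()
        # Expected columns: proto recv-q send-q local-addr foreign-addr state
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        local_ip, local_port = split_addr(fields[3])
        peer_ip, peer_port = split_addr(fields[4])
        if local_port == GALERA_PORT:
            initiator, acceptor = peer_ip, local_ip  # peer connected in to us
        elif peer_port == GALERA_PORT:
            initiator, acceptor = local_ip, peer_ip  # we connected out to peer
        else:
            continue  # not a Galera socket
        edges.add((HOSTS.get(initiator, initiator), HOSTS.get(acceptor, acceptor)))
    return edges


# The six ESTABLISHED lines collected on the three hosts above.
observed = [
    "tcp 0 0 172.25.213.27:18301 172.25.223.27:41579 ESTABLISHED",
    "tcp 0 0 172.25.213.27:18301 172.24.134.27:40817 ESTABLISHED",
    "tcp 0 0 172.25.223.27:60023 172.24.134.27:18301 ESTABLISHED",
    "tcp 0 0 172.25.223.27:41579 172.25.213.27:18301 ESTABLISHED",
    "tcp 0 0 172.24.134.27:18301 172.25.223.27:60023 ESTABLISHED",
    "tcp 0 0 172.24.134.27:40817 172.25.213.27:18301 ESTABLISHED",
]

for src, dst in sorted(connection_edges(observed)):
    print(f"{src} -> {dst}")
```

Run as is, it prints "DB node 2 -> DB node 1", "DB node 2 -> arbitrator" and "arbitrator -> DB node 1", the same three links as the diagram. Each connection is seen from both ends, so the set removes the duplicates.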
When we reboot the guest OS of DB node 1:
- DB node 2 detects that node 1 is down (see attached file node2-errorlog.txt).
- The arbitrator logs nothing about node 1 going down.
- DB node 2 becomes isolated and the DB cluster goes down.
We used the OS "nc" command to investigate further:
- nc from the arbitrator to DB node 2 stays "connected" while DB node 1 reboots.
- nc from DB node 2 to the arbitrator changes from "connected" to "connection refused" when DB node 1 reboots. The refusal is immediate rather than a timeout, so we believe the firewall is open.
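The reading of the nc results depends on how a TCP connect attempt fails. A minimal Python sketch of an equivalent probe, using only the standard socket module, separates the outcomes; probe and watch are hypothetical helper names, and this is not a replacement for the nc test itself.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt.

    'connected': the TCP handshake completed.
    'refused':   an immediate rejection; the host is reachable but nothing
                 accepted on that port (a firewall rule that rejects rather
                 than drops packets can also cause this).
    'timeout':   no answer at all, typical of silently dropped packets.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc.strerror or exc}"


def watch(host: str, port: int, interval: float = 1.0, count: int = 60) -> None:
    """Print one timestamped probe result per interval, e.g. during a reboot."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), f"{host}:{port}", probe(host, port))
        time.sleep(interval)
```

For example, running watch("172.24.134.27", 18301) on DB node 2 while node 1 reboots should show, with second-level timestamps, when the arbitrator's port switches from "connected" to "refused", which can then be lined up against the error logs.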
Kindly advise how we can troubleshoot this case further.
The node 2 error log starts at 2024-01-31 18:31:34, and at that point the arbitrator already had connectivity issues. The first messages are:
After that, node 2 loses its connections to both node 1 and the arbitrator, and cannot re-establish them before the end of the error log sample.
How is node 1 restarted: is it a full container/server reboot or just a MariaDB server restart?
The Galera release log has a comment about installation on Red Hat 8:
Please check if this was carried out in your installation.
Does this problem always happen when node 1 or node 2 is restarted, or is it a temporary problem?
Please attach the related logs from node 1, node 2 and the arbitrator covering the complete period of the networking problems.
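Extracting the same time window from each host's log keeps the three excerpts aligned. The Python sketch below is one way to do that. It assumes every log entry begins with a YYYY-MM-DD HH:MM:SS timestamp, as MariaDB error-log lines do; the garbd log layout should be checked before relying on it. lines_in_window is a hypothetical name.

```python
from datetime import datetime
from typing import Iterable, Iterator

# Assumed layout: each entry starts with "YYYY-MM-DD HH:MM:SS" (19 chars).
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LEN = 19


def lines_in_window(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[str]:
    """Yield log lines whose entry timestamp lies in [start, end].

    Lines without a leading timestamp (e.g. wrapped continuation lines)
    follow the decision made for the last timestamped line. Anything
    after the first 19 characters, such as fractional seconds, is ignored.
    """
    keep = False
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            pass  # no timestamp here: keep the previous decision
        else:
            keep = start <= ts <= end
        if keep:
            yield line
```

Passing an open log file as lines and the same start and end to each host's log gives three excerpts covering the same period.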