[MDEV-33356] Galera cluster down when one DB node rebooted when arbitrator on RHEL8 - Jira

Our environment is a 3-node Galera cluster: 2 DB nodes + 1 arbitrator.

We started running the arbitrator on Red Hat 8 and found that rebooting one DB node brings the whole cluster down. This does not happen when the arbitrator runs on Red Hat 7.
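For context, Galera keeps a component primary only while it holds more than half of the cluster's votes. A minimal sketch of that arithmetic, assuming every member (including the arbitrator) has a weight of 1; the function name `has_quorum` is hypothetical and not part of any Galera API:

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Return True if the reachable members hold a strict majority of votes.

    Assumes every member, including the arbitrator, has weight 1.
    """
    return 2 * reachable_votes > total_votes


# 3 votes in total: DB node 1, DB node 2, arbitrator.
print(has_quorum(2, 3))  # DB node 2 + arbitrator still see each other
print(has_quorum(1, 3))  # DB node 2 is left on its own
```

Under this assumption, losing DB node 1 alone should leave the cluster up (2 of 3 votes). It would go down only if DB node 2 and the arbitrator also lose contact with each other, leaving each with 1 of 3.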

DB node version: MariaDB 10.6.16 on Red Hat 7/8
Arbitrator version: 26.4.14 or 26.4.16 on Red Hat 8

TCP connections between the nodes, as shown by netstat (Galera port 18301):

    ┌-- arbitrator <-┐
    V                |
DB node 1  <---  DB node 2

DB node 1:

[root@t1vdbs-gcisdba-el8aa22-1-01 ~]# netstat -an | grep 18301 | grep "\.27:" | grep 18301 | grep ESTABLISHED

tcp        0      0 172.25.213.27:18301     172.25.223.27:41579     ESTABLISHED

tcp        0      0 172.25.213.27:18301     172.24.134.27:40817     ESTABLISHED

DB node 2:

[root@t2vdbs-gcisdba-el8aa22-2-01 errorlog]# netstat -an | grep 18301 | grep "\.27:" | grep 18301 | grep ESTABLISHED

tcp        0      0 172.25.223.27:60023     172.24.134.27:18301     ESTABLISHED

tcp        0      0 172.25.223.27:41579     172.25.213.27:18301     ESTABLISHED

arbitrator:

[si00chw@t1vdbs-gcissc-witness03d witness]$ netstat -an | grep 18301 | grep "\.27:" | grep 18301 | grep ESTABLISHED

tcp        0      0 172.24.134.27:18301     172.25.223.27:60023     ESTABLISHED

tcp        0      0 172.24.134.27:40817     172.25.213.27:18301     ESTABLISHED
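The direction of each connection can be read off these listings: the side whose local port is 18301 accepted the connection, and the side with the ephemeral port initiated it. A small sketch of that reading, assuming the Linux `netstat -an` column layout shown above (the helper name `connection_direction` is hypothetical):

```python
GALERA_PORT = 18301  # Galera port used in this report


def connection_direction(netstat_line, service_port=GALERA_PORT):
    """Return (initiator_ip, listener_ip) for one ESTABLISHED line, else None."""
    fields = netstat_line.split()
    # Expected columns: proto, recv-q, send-q, local address, foreign address, state
    if len(fields) < 6 or fields[5] != "ESTABLISHED":
        return None
    local_ip, local_port = fields[3].rsplit(":", 1)
    remote_ip, remote_port = fields[4].rsplit(":", 1)
    if int(local_port) == service_port and int(remote_port) != service_port:
        return (remote_ip, local_ip)  # the remote side dialed in
    if int(remote_port) == service_port and int(local_port) != service_port:
        return (local_ip, remote_ip)  # this host dialed out
    return None  # both or neither side on the service port: direction unknown


line = "tcp        0      0 172.25.213.27:18301     172.25.223.27:41579     ESTABLISHED"
print(connection_direction(line))
```

Applied to the six lines above, this gives three connections: DB node 2 to DB node 1, DB node 2 to the arbitrator, and the arbitrator to DB node 1.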

When we reboot the DB node 1 guest OS, the cluster goes down.

We used the OS "nc" command to investigate further.

nc from the arbitrator to DB node 2: stays "connected" while DB node 1 reboots.
nc from DB node 2 to the arbitrator: changes from "connected" to "connection refused" while DB node 1 reboots. The refusal is returned immediately rather than after a timeout, so the firewall should be open.
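The distinction between an immediate refusal and a timeout can also be checked with a short script, which is easier to run in a loop during the reboot than nc. This is only a sketch: the function name `probe` and the 3-second timeout are assumptions, not part of Galera or of our setup.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify one TCP connection attempt.

    Returns "connected", "refused" (the peer answered at once and rejected the
    connection), "timeout" (no answer within `timeout` seconds), or "error: ...".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `probe("172.24.134.27", 18301)` run on DB node 2 checks the arbitrator's Galera port. A "refused" result means the host answered but rejected the connection, which usually indicates that nothing is listening on the port at that moment; a "timeout" result is more typical of packets being dropped on the way.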

Kindly advise what we can do to further troubleshoot this case.
