Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
10.6.16
-
None
-
RHEL8
Description
Hi Support Team,
We have a 3-node cluster: 2 DB nodes (service IP addresses 172.17.153.82 and 172.21.153.82) and 1 Galera witness (IP address 172.18.16.82). The application connects to the DB through an HA proxy located at the same site as the DB; the proxy hunts between the 2 DB nodes and connects to the healthy one.
All 3 nodes (2 DBs + 1 witness) are at different sites. There was network maintenance at the 172.17.x.x site. We expected the DB at that site to become inaccessible while the other 2 nodes continued to form a cluster, so that applications could still write to the DB node at 172.21.x.x. However, the application failed to connect to the remaining DB node (Lost connection to server at 'handshake: reading initial communication packet', system error: 11).
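To make our expectation concrete, here is a minimal sketch of the majority rule we assumed would apply. It is a simplification: it assumes every node, including the witness, carries the default weight of 1, and it ignores nodes that leave gracefully. The function names are illustrative only and are not part of Galera.

```python
def has_quorum(reachable_weights, previous_weights):
    """True if the reachable nodes hold a strict majority of the
    total weight of the previous cluster membership."""
    return 2 * sum(reachable_weights) > sum(previous_weights)


def surviving_partition_has_quorum(cluster, lost):
    """cluster maps node address -> weight; lost is the set of
    addresses that became unreachable."""
    reachable = [w for node, w in cluster.items() if node not in lost]
    return has_quorum(reachable, cluster.values())


# Assumption: a weight of 1 on each node (2 DB nodes + 1 witness).
cluster = {
    "172.17.153.82": 1,  # DB node at the site under maintenance
    "172.21.153.82": 1,  # remaining DB node
    "172.18.16.82": 1,   # Galera witness
}

# Only the 172.17.x.x site is cut off: 2 of 3 votes remain.
print(surviving_partition_has_quorum(cluster, {"172.17.153.82"}))

# If the witness were unreachable as well, 1 of 3 votes would remain.
print(surviving_partition_has_quorum(cluster, {"172.17.153.82", "172.18.16.82"}))
```

Under these assumptions the first case keeps a majority and the second does not, which is why we expected the node at 172.21.x.x to stay writable during the maintenance.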
We had to bootstrap the cluster after the network maintenance was over.
On node 1, the following error was observed:
exception from gcomm, backend must be restarted: evs::proto(fac95f38-8d6b, GATHER, view_id(REG,15b6ecac-8b9b,58)) failed to form singleton view after exceeding max_install_timeouts 3, giving up (FATAL) at /home/buildbot/buildbot/build/gcomm/src/evs_proto.cpp:handle_install_timer():728
The log files on the other nodes, on the other hand, looked as expected. We would like to know:
1. Why was bootstrapping needed to resume service?
2. Is the above error message normal, and what does it mean? A similar ticket, MDEV-32110, has been raised by someone else, but it has received no feedback so far.
Please advise.
Thanks and best regards,
Lawrence
The mariadb-error.log files for 2 nodes and the Galera log file are uploaded for your reference.