[MDEV-35823] Galera cluster down with error in one DB evs_proto.cpp:handle_install_timer():728

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.6.16
Fix Version/s: 10.11
Component/s: Galera, Galera Arbitrator garbd
Labels: None
Environment: RHEL8

Description

Hi Support Team,

We have a 3-node cluster: 2 DB nodes (service IP addresses 172.17.153.82 and 172.21.153.82) and 1 Galera witness (IP address 172.18.16.82). The application connects to the DB through an HA proxy located at the same site as the DB; the proxy switches between the 2 DB nodes and connects to the healthy one.

All 3 nodes (2 DBs + 1 witness) are at different sites. There was network maintenance at the 172.17.x.x site. We expected the DB at that site to become inaccessible while the other 2 nodes continued to form a cluster, so that applications could still write to the DB node at 172.21.x.x. However, the application failed to connect to the remaining DB node (Lost connection to server at 'handshake: reading initial communication packet', system error: 11).
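The expectation above rests on the usual majority-quorum rule. A minimal sketch of that reasoning follows, assuming every member (including the witness) carries equal weight; the function name `has_quorum` is illustrative only and this is not Galera's actual implementation.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Return True if the reachable members form a strict majority.

    Assumption: all members have equal weight, so quorum is simply
    "more than half of the previous membership".
    """
    return 2 * reachable > total


# 3-member cluster: the 172.17.x.x DB node is cut off by the maintenance,
# while the 172.21.x.x DB node and the witness can still see each other.
remaining_side = has_quorum(reachable=2, total=3)  # DB node 2 + witness
isolated_side = has_quorum(reachable=1, total=3)   # DB node 1 alone

print("remaining side keeps quorum:", remaining_side)
print("isolated side keeps quorum:", isolated_side)
```

Under this assumption the two surviving members should have stayed primary, which is why the failed connections to the 172.21.x.x node were unexpected.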

We had to bootstrap the cluster after the network maintenance was over.

At node 1, the following error was observed:
exception from gcomm, backend must be restarted: evs::proto(fac95f38-8d6b, GATHER, view_id(REG,15b6ecac-8b9b,58)) failed to form singleton view after exceeding max_install_timeouts 3, giving up (FATAL) at /home/buildbot/buildbot/build/gcomm/src/evs_proto.cpp:handle_install_timer():728
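To make it easier to search for the same failure in the other attached logs, the message above can be split into its parts. This is a rough sketch only: the regular expression and the field names (`node`, `state`, `view_seq`, and so on) are our own reading of the message layout, not an official Galera log format.

```python
import re

# The error line exactly as reported on node 1.
LINE = (
    "exception from gcomm, backend must be restarted: "
    "evs::proto(fac95f38-8d6b, GATHER, view_id(REG,15b6ecac-8b9b,58)) "
    "failed to form singleton view after exceeding max_install_timeouts 3, "
    "giving up (FATAL) at /home/buildbot/buildbot/build/gcomm/src/"
    "evs_proto.cpp:handle_install_timer():728"
)

# Hypothetical field names; the layout is inferred from this one message.
PATTERN = re.compile(
    r"evs::proto\((?P<node>[0-9a-f-]+), (?P<state>\w+), "
    r"view_id\((?P<view_type>\w+),(?P<view_uuid>[0-9a-f-]+),(?P<view_seq>\d+)\)\)"
    r".*max_install_timeouts (?P<max_timeouts>\d+)"
    r".*at (?P<source>\S+):(?P<func>\w+)\(\):(?P<line>\d+)"
)


def parse_evs_error(text):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None


fields = parse_evs_error(LINE)
print(fields)
```

Filtering each log on `func == "handle_install_timer"` would show whether any other node hit the same fatal path.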

The log files on the other nodes, on the other hand, looked as expected. We would like to know:

1. Why was bootstrapping needed to resume service?
2. Is the above error message normal, and what does it mean? A similar ticket, MDEV-32110, was raised by someone else, but it has received no feedback so far.

Please advise.

Thanks and best regards,

Lawrence

Attachments

galera-garbd.log.20250822.log, 150 kB, 2025-08-22 02:12
galera-garbd.log-20250111-habyapp-proddb2, 306 kB, 2025-01-15 05:10
mariadb-error.log.202508220502.node1.log, 166 kB, 2025-08-22 02:12
mariadb-error.log.202508220502.node2.log, 35 kB, 2025-08-22 02:12
mariadb-error.log-20250111-node1, 135 kB, 2025-01-15 05:10
mariadb-error.log-20250111-node2, 121 kB, 2025-01-15 05:10

Activity

People

Assignee: Jan Lindström
Reporter: Lawrence Man
Votes: 0
Watchers: 3

Dates

Created: 2025-01-12 09:34
Updated: 2025-08-22 02:12
