[MDEV-27051] large evs.suspect_timeout causing long declare stable on remaining good node Created: 2021-11-15 Updated: 2022-11-01 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.5.12 |
| Fix Version/s: | 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | William Wong | Assignee: | Teemu Ollakka |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Red Hat 7 on VMware |
||
| Description |
|
Hi,

Our DB configuration is 2 data nodes + 1 arbitrator. In one incident, the VM of one DB data node was rebooted, and the remaining good node took a long time to be declared stable. What causes this long declare-stable time on the remaining good node? Can we keep a large evs.suspect_timeout and still have a short declare-stable time?

Good case: 49a4a26a-b4f0 is the node that was rebooted. 3974f500-ba41 was declared stable just a few seconds later.

gmcast.peer_timeout=PT15S;
2021-11-15 23:05:58 0 [Note] WSREP: evs::proto(49a4a26a-b4f0, GATHER, view_id(REG,3974f500-ba41,175)) suspecting node: 5256b203-ba1b

Long declare-stable case: 8f049c9f-9b55 is the node that was rebooted. 8144286d-abff was declared stable 19 seconds later.

gmcast.peer_timeout=PT15S;
2021-11-15 23:14:25 0 [Note] WSREP: (8980a003-82ed, 'ssl://172.25.100.204:18301') connection to peer 8f049c9f-9b55 with addr ssl://172.25.100.203:18301 timed out, no messages seen in PT15S, socket stats: rtt: 7793 rttvar: 11996 rto: 13312000 lost: 1 last_data_recv: 15310 cwnd: 1 last_queued_since: 315444649 last_delivered_since: 15310482757 send_queue_length: 0 send_queue_bytes: 0 segment: 0 messages: 0 |
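For anyone comparing the two cases, a minimal Python sketch for pulling the timeout and socket statistics out of the "timed out" log line quoted in the description may help. The helper names `parse_pt_seconds` and `parse_socket_stats` are hypothetical and not part of Galera or MariaDB. The sketch assumes that `last_data_recv` is reported in milliseconds (as in the Linux `tcp_info` structure) and that the timeout uses the seconds-only `PT<n>S` form. Neither assumption is confirmed by this report.

```python
import re

# Excerpt of the log line from the long declare-stable case, copied from the report.
LOG_LINE = (
    "connection to peer 8f049c9f-9b55 with addr ssl://172.25.100.203:18301 "
    "timed out, no messages seen in PT15S, socket stats: rtt: 7793 "
    "rttvar: 11996 rto: 13312000 lost: 1 last_data_recv: 15310 cwnd: 1 "
    "last_queued_since: 315444649 last_delivered_since: 15310482757 "
    "send_queue_length: 0 send_queue_bytes: 0 segment: 0 messages: 0"
)


def parse_pt_seconds(text):
    """Return the first seconds-only ISO 8601 duration (e.g. PT15S) as a float.

    Only the PT<n>S form is handled; returns None if no such duration is found.
    """
    match = re.search(r"PT(\d+(?:\.\d+)?)S", text)
    return float(match.group(1)) if match else None


def parse_socket_stats(line):
    """Return the 'name: value' pairs after 'socket stats:' as a dict of ints."""
    _, _, stats_part = line.partition("socket stats:")
    return {name: int(value) for name, value in re.findall(r"(\w+):\s*(\d+)", stats_part)}


timeout_s = parse_pt_seconds(LOG_LINE)
stats = parse_socket_stats(LOG_LINE)

# Assumption: last_data_recv is in milliseconds, so convert to seconds.
idle_s = stats["last_data_recv"] / 1000.0
exceeded_timeout = idle_s > timeout_s
```

Under the millisecond assumption, `idle_s` can be compared directly with the configured gmcast.peer_timeout to check that the connection was dropped shortly after the timeout elapsed.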