Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Version: 10.5.9
Description
Our Galera Cluster consists of 3 nodes.
Recently, node-0 restarted repeatedly because of a server resource shortage,
and eventually node-1 and node-2 tried to connect to node-0 but failed:
2023-07-01 23:54:52 0 [Note] WSREP: (63d23c5c-b67b, 'tcp://0.0.0.0:4567') connection to peer 45ebb9d4-a748 with addr tcp://172.24.151.92:4567 timed out, no messages seen in PT3S, socket stats: rtt: 1473 rttvar: 2527 rto: 204000 lost: 0 last_data_recv: 3344 cwnd: 6 last_queued_since: 500032641 last_delivered_since: 3342827068 send_queue_length: 0 send_queue_bytes: 0 segment: 0 messages: 0
After this message, node-1 and node-2 both logged the following warnings:
2023-07-01 23:55:01 3213483 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
2023-07-01 23:55:01 3320009 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
2023-07-01 23:55:01 3320012 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
These warnings kept repeating, and node-1 and node-2 both went through the following status changes:
2023-07-01 23:55:01 6 [Note] WSREP: Server status change synced -> connected
2023-07-01 23:55:01 6 [Note] WSREP: Server status change connected -> connected
Both nodes then switched to a Non-primary view:
2023-07-01 23:55:01 6 [Note] WSREP: Non-primary view
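To quantify how often the gcs_caused() warnings recur, the error log can be tallied by return code. The following is a minimal sketch, assuming the log lines have the same shape as the excerpts above; the function name count_gcs_caused and the sample list are illustrative only. On Linux, the codes -107 and -1 match the negated errno values ENOTCONN and EPERM, which agrees with the message text in parentheses.

```python
import re
from collections import Counter

# Matches the gcs_caused() warning format seen in the excerpts above.
GCS_CAUSED_RE = re.compile(
    r"\[Warning\] WSREP: gcs_caused\(\) returned (-?\d+) \((.+)\)\s*$"
)


def count_gcs_caused(lines):
    """Return a Counter keyed by (code, message) for gcs_caused() warnings."""
    counts = Counter()
    for line in lines:
        match = GCS_CAUSED_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Sample input copied from the log excerpts in this report.
sample = [
    "2023-07-01 23:55:01 3213483 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)",
    "2023-07-01 23:55:01 3320009 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)",
    "2023-07-01 23:55:01 3320012 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)",
    "2023-07-01 23:55:01 6 [Note] WSREP: Non-primary view",
]

for (code, message), n in sorted(count_gcs_caused(sample).items()):
    print(code, message, n)
```

Running the same function over the full error log of node-1 and node-2 would show whether the warnings stopped at the moment the cluster was re-bootstrapped.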
After this, our Galera Cluster was inaccessible, because the local_state of each node had changed to Initialization.
After comparing wsrep_last_committed on node-1 and node-2, we selected node-1 to re-bootstrap the cluster (SET WSREP_PROVIDER_OPTIONS = "pc.bootstrap = 1;"). node-1 became Primary, the warning
[Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
stopped appearing, and node-2 rejoined the cluster successfully.
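The node selection described above can be sketched as follows. This is only an illustration: the wsrep_last_committed values are hypothetical (the report does not state the real ones), and pick_bootstrap_node is a made-up helper name. The printed statement uses the form given in the Galera documentation for promoting a Non-primary component, which differs slightly from the shorthand quoted above.

```python
def pick_bootstrap_node(last_committed):
    """Return the node name with the highest wsrep_last_committed value."""
    if not last_committed:
        raise ValueError("no candidate nodes given")
    # On a tie, max() returns the first node in the mapping's insertion order.
    return max(last_committed, key=last_committed.get)


# Hypothetical values; in practice each one would be read on its node with
# SHOW STATUS LIKE 'wsrep_last_committed';
candidates = {"node-1": 1050, "node-2": 1047}

chosen = pick_bootstrap_node(candidates)

# Statement to run on the chosen node only.
BOOTSTRAP_SQL = "SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';"

print(f"Run on {chosen}: {BOOTSTRAP_SQL}")
```

Bootstrapping from the node with the highest committed sequence number avoids discarding transactions that only that node has applied.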
Is there any reason or trigger that would cause this message:
[Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
to keep appearing?
Did we hit a bug?
Thank you.