[MDEV-26861] Galera Crashing - what(): remote_endpoint: Transport endpoint is not connected Created: 2021-10-20 Updated: 2023-06-12 Resolved: 2023-06-12 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.4.20, 10.5.11 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Mathew Toms | Assignee: | Teemu Ollakka |
| Resolution: | Incomplete | Votes: | 2 |
| Labels: | crash, galera | ||
| Environment: |
Ubuntu 20.04.2 LTS, Dedicated hosts per node |
||
| Description |
|
We have been seeing Galera nodes crash within a few minutes of each other, with days between incidents. We run 2 clusters of 3 nodes each, one on 10.5.11 and the other on 10.4.20. From the logs, both clusters appear to be crashing for the same reason, the error in the issue title: what(): remote_endpoint: Transport endpoint is not connected
When the crash hits one node, there is a high chance that a second node will crash with the same error a few minutes later, leaving the cluster in need of a bootstrap. Other times only one node crashes, then automatically restarts and rejoins the cluster 5-10 minutes later. I've attached logs from both clusters and a stack trace from the 10.5.11 node. |
| Comments |
| Comment by veast [ 2022-04-28 ] |
|
Hello, Mathew Toms. |
| Comment by Théo Cerutti [ 2023-01-04 ] |
|
Same problem here with a 3-node MariaDB cluster on 10.5.13. |
| Comment by Jan Lindström [ 2023-05-10 ] |
|
mattwt I could not find anything conclusive in the provided stack trace or error logs. The problem is that the error logs show only an assertion and nothing about what happened before the crash. Can you please provide the full unedited error log, SHOW PROCESSLIST output, and the node configuration? |