Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.6.14
Fix Version/s: None
Environment:
Kubernetes 1.26.9
docker.io/bitnami/mariadb-galera:10.6.14-debian-11-r0
bitnami helm galeracluster v7.0.1
3 node cluster
proxysql directing all write statements to one node
Description
I am running 27 Galera clusters on Kubernetes. Sporadically, I see an issue during rolling updates. Yesterday, for instance, I added two new labels to the StatefulSets and Galera pods of one cluster, which Kubernetes rolls out as a rolling update.
First, pod galeracluster-2 was restarted without problems; 40 seconds later it was in sync again.
Then pod galeracluster-1 was restarted. But at the point where the IST would normally happen, mysqld crashed with signal 11, and a full SST was started, taking 10 minutes.
Finally, galeracluster-0 was restarted and back in sync within 40 seconds.
The segfault on pod galeracluster-1 causes the pod to restart once more, but it then does not sync via IST; it uses an SST instead, which takes 10 minutes for this cluster. In some of my bigger clusters SSTs take up to an hour, which is quite annoying, so I would like to find out whether I can reduce the odds of an SST to a minimum. Imagine updating 27 Galera clusters and having to wait an hour every now and then. During my update session yesterday I had only one segmentation fault, but I have had sessions where 4-5 pods went into a full SST.
Unfortunately, I can't reproduce this behavior on demand. It just happens every now and then, on different clusters and in different pods.
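One generic way to lower the odds of a full SST after a restart (independent of fixing the segfault itself) is to enlarge the Galera write-set cache, so a rejoining node stays within the IST window longer, and to let the cache be recovered after a crash. A sketch, assuming the Helm chart allows passing extra my.cnf options; the size value is illustrative and should be tuned to your write volume:

```ini
# Illustrative my.cnf fragment for the Galera nodes.
# gcache.size bounds how far behind a rejoining node may be
# while still qualifying for IST instead of a full SST;
# gcache.recover=yes tries to reuse the cache after a crash.
[galera]
wsrep_provider_options="gcache.size=2G;gcache.recover=yes"
```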
The provided logfile was exported from Kibana, so you'll have to read it from the bottom up; however, rows from the same microsecond appear in the "correct" (chronological) order. This makes analyzing it a bit tricky.
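For what it's worth, the export can be restored to chronological order while preserving the intra-microsecond ordering by reversing timestamp groups rather than individual lines. A minimal sketch, assuming each line starts with a fixed-width timestamp prefix (the length is an assumption about the export format):

```python
import itertools

def reorder_kibana_export(lines, ts_len):
    """Kibana exports newest-first, but rows sharing the same
    microsecond are already oldest-first within their group.
    Reverse the groups, keeping the order inside each group.
    ts_len is the assumed fixed timestamp prefix length."""
    groups = [list(g) for _, g in
              itertools.groupby(lines, key=lambda l: l[:ts_len])]
    return [line for grp in reversed(groups) for line in grp]
```

A plain `tac` would not work here, because it would also flip the already-correct order of lines sharing the same microsecond.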
Please let me know if you require further information.