  1. MariaDB Server
  2. MDEV-33518

Segmentation fault during rolling update

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.6.14
    • Fix Version/s: 10.6
    • Component/s: Galera
    • Labels: None
    • Environment:
      Kubernetes 1.26.9
      docker.io/bitnami/mariadb-galera:10.6.14-debian-11-r0
      bitnami helm galeracluster v7.0.1
      3-node cluster
      ProxySQL directing all write statements to one node

    Description

      I am running 27 galeraclusters on Kubernetes. Sporadically, I see an issue during rolling updates. Yesterday, for instance, I only added two new labels to the StatefulSets and the Galera pods of my galeraclusters, a change that Kubernetes rolls out as a rolling update.

      First, the pod galeracluster-2 was restarted without any problem; 40 seconds later it was in sync again.
      Then the pod galeracluster-1 was restarted. But at the point where the IST would normally take place, mysqld crashed with signal 11. A full SST was started instead, which took 10 minutes.
      Finally, the restart of galeracluster-0 completed within 40 seconds.

      The segfault on pod galeracluster-1 causes the pod to restart once more, and after that restart it no longer syncs via IST but falls back to an SST, which takes 10 minutes for this galeracluster. In some of my bigger clusters an SST takes up to an hour, which is quite annoying. So I would like to find out whether I can reduce the odds of an SST to a minimum. Imagine updating 27 galeraclusters and having to wait for an hour every now and then. During yesterday's update session I had only one segmentation fault, but I have had sessions in which 4-5 pods went into a full SST.
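
For context on why the restart after the crash ends in an SST, here is a minimal sketch of the usual IST precondition. It assumes the joiner's position is read from its grastate.dat and the donor's oldest cached write-set from the `wsrep_local_cached_downto` status variable; the function names are hypothetical, and the real decision is made inside the Galera provider, not by code like this. It does not explain the segfault itself.

```python
def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def ist_possible(joiner_uuid, joiner_seqno, cluster_uuid, donor_cached_downto):
    """Rough IST precondition (hypothetical helper, simplified).

    IST needs the joiner to belong to the same cluster state, to know its
    own position, and the donor's gcache to still hold every write-set
    after that position.
    """
    if joiner_uuid != cluster_uuid:
        return False  # different history, only an SST can help
    if joiner_seqno < 0:
        return False  # seqno -1: position unknown, e.g. after a crash
    return donor_cached_downto <= joiner_seqno + 1


if __name__ == "__main__":
    sample = """# GALERA saved state
version: 2.1
uuid:    11111111-2222-3333-4444-555555555555
seqno:   -1
safe_to_bootstrap: 0
"""
    state = parse_grastate(sample)
    print(ist_possible(state["uuid"], int(state["seqno"]),
                       "11111111-2222-3333-4444-555555555555", 1))
```

A `seqno` of -1 is what an unclean shutdown typically leaves behind, so unless the position is recovered by other means the node is treated as having no usable position. For clean restarts, a larger `gcache.size` on the donor raises the chance that the required write-sets are still cached.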

      Unfortunately, I can't reproduce this behavior on demand. It just happens every now and then, on different clusters and in different pods.

      The provided logfile was exported from Kibana, so it is in reverse chronological order and has to be read from the bottom up. However, rows with the same microsecond timestamp appear in the "correct" (forward) order within their group, which makes analyzing it a bit tricky.
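
The ordering described above can be undone mechanically. The following is a minimal sketch, assuming each exported row has already been split into a `(timestamp, message)` pair; the row layout and the function name are assumptions, since the actual export format is not shown here. It reverses the order of the timestamp groups while keeping the rows inside each group as they are.

```python
from itertools import groupby


def chronological(rows, key=lambda row: row[0]):
    """Return newest-first rows in oldest-first order.

    Consecutive rows sharing the same timestamp are already in forward
    order, so only the order of the groups is reversed.
    """
    groups = [list(group) for _, group in groupby(rows, key=key)]
    return [row for group in reversed(groups) for row in group]


if __name__ == "__main__":
    # Hypothetical rows: newest first, same-timestamp rows in forward order.
    exported = [
        ("12:00:03.000002", "third event"),
        ("12:00:02.000001", "second event, line 1"),
        ("12:00:02.000001", "second event, line 2"),
        ("12:00:01.000000", "first event"),
    ]
    for timestamp, message in chronological(exported):
        print(timestamp, message)
```

Grouping only consecutive rows is deliberate: it relies on the export being sorted by timestamp already and never reorders rows that share a timestamp.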

      Please let me know if you require further information.

          People

            Assignee: Unassigned
            Reporter: Henrik Steffen (cgrdhenrik)