  1. MariaDB Server
  2. MDEV-39855

WITHOUT OVERLAPS deadlock that Galera cannot resolve

Details

    Description

      Our MariaDB 11.4.8 + Galera 26.4.24 cluster hangs forever when parallel appliers deadlock on a WITHOUT OVERLAPS unique key.
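
      For readers unfamiliar with the feature, here is a minimal sketch of the kind of key involved. The table, column, and period names are hypothetical and are not taken from the attached tests; this only illustrates a WITHOUT OVERLAPS unique key and does not by itself reproduce the deadlock.

      ```sql
      -- Hypothetical schema, for illustration only.
      CREATE TABLE bookings (
        room_id    INT  NOT NULL,
        date_start DATE NOT NULL,
        date_end   DATE NOT NULL,
        -- Application-time period over the two date columns.
        PERIOD FOR booking_period (date_start, date_end),
        -- No two rows for the same room may have overlapping periods.
        UNIQUE (room_id, booking_period WITHOUT OVERLAPS)
      ) ENGINE=InnoDB;

      INSERT INTO bookings VALUES (1, '2025-01-01', '2025-01-10');
      -- Overlaps the row above for room 1, so it is expected to be
      -- rejected as a duplicate key.
      INSERT INTO bookings VALUES (1, '2025-01-05', '2025-01-15');
      ```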

      The deadlock locks up the entire cluster because applier (brute-force, BF) transactions are not permitted to be rolled back, so neither side can be chosen as the deadlock victim.

      This took a long time to narrow down. Our test cluster has higher latency between nodes than our production cluster, which made the bug impossible to reproduce in that environment. Fortunately, the MariaDB test suite runs multiple instances on the same machine, which is the best case for latency, and we were able to use it to reproduce the hang consistently.

      I've attached two tests I wrote while dissecting this bug. The single server case isn't too bad, but the Galera situation is very problematic when we trigger it.

      Based on the nature of the bug, I believe I've mitigated it in our own usage by setting: wsrep_slave_threads = 1
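
      As a rough sketch, the mitigation above could be applied at runtime like this, assuming the variable can be changed dynamically on the running version and that the statements are run on every node. To survive a restart, the same setting would also need to go into the server configuration file.

      ```sql
      -- Check how many applier threads this node currently uses.
      SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';

      -- Fall back to a single applier thread so that write sets are
      -- applied serially (assumed to avoid applier-vs-applier deadlocks).
      SET GLOBAL wsrep_slave_threads = 1;
      ```

      This trades away parallel applying, so replication throughput on busy nodes may drop.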

      It still seems like a notable issue, though, that valid queries can deadlock the entire cluster.

      In addition to 11.4.8, I've also verified this still affects git HEAD.

          People

            seppo Seppo Jaakola
            anthonyryan1 Anthony Ryan