  1. MariaDB Server
  2. MDEV-38976

Galera - Write loss with process crashes and network partitions

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 12.2.2
    • None
    • Galera
    • Debian Trixie, Galera 26.4.25-deb13

    Description

      With innodb_flush_log_at_trx_commit=1, write loss due to coordinated process crashes in Galera Cluster (see MDEV-38974) is significantly reduced. However, it is not eliminated! With process crashes and network partitions, Galera Cluster occasionally loses the effects of committed transactions. For example, at roughly 141 seconds into this Jepsen test run (https://s3.amazonaws.com/jepsen.io/analyses/mariadb-galera-12.1.2/20260224T175533-lost-writes-2.zip), the cluster lost approximately nineteen seconds of writes across four separate rows: 0, 285, 410, and 446. Some, like key 0, lost only a short postfix of elements. Key 410, on the other hand, lost all twenty-five elements and began afresh:

      Time (s)  Elements
      --------  -----------------------------
      141.36    17, 19, 26, ..., 91, 92, 97
      152.79    175
      153.21    175, 176, 177, 179
      154.46    175, 176, 177, 179, 180

      Note that the transactions which wrote 17, 19, and so on were successfully committed; their effects definitely should not have been lost.
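To make the anomaly concrete, here is a minimal sketch of the kind of check that flags it: in an append-only workload, each later read of a key should extend what was read before, so a read that is neither an extension nor a stale prefix of the longest list seen so far points to lost appends. The function names `is_prefix` and `find_lost_appends` are hypothetical, this is not how Jepsen's checker is implemented, and the first read below contains only the six elements visible in the table above (the real read had twenty-five).

```python
def is_prefix(earlier, later):
    """Return True if the list `earlier` is a prefix of the list `later`."""
    return len(earlier) <= len(later) and later[:len(earlier)] == earlier


def find_lost_appends(reads):
    """Scan time-ordered (time, elements) reads of one key.

    Returns (time_before, time_after, missing_elements) for every read that
    neither extends nor is a stale prefix of the longest list seen so far.
    """
    violations = []
    longest_time, longest = None, []
    for time, elements in reads:
        if is_prefix(longest, elements):
            # Normal case: the list only grew.
            longest_time, longest = time, elements
        elif is_prefix(elements, longest):
            # A shorter but compatible read is treated here as merely stale.
            continue
        else:
            # Incompatible read: previously observed elements have vanished.
            missing = [e for e in longest if e not in elements]
            violations.append((longest_time, time, missing))
            longest_time, longest = time, elements
    return violations


# Reads of key 410 from the table above. The first list is abbreviated to
# the elements shown in the report; the elided ones are NOT filled in.
reads = [
    (141.36, [17, 19, 26, 91, 92, 97]),
    (152.79, [175]),
    (153.21, [175, 176, 177, 179]),
    (154.46, [175, 176, 177, 179, 180]),
]
violations = find_lost_appends(reads)
for before, after, missing in violations:
    print(f"between {before}s and {after}s the key lost {missing}")
```

This simplified check only looks at reads of a single key in wall-clock order. A real analysis also has to account for which appends were acknowledged as committed and for reads that snapshot isolation allows to be stale.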

      You can reproduce this with the Jepsen MariaDB test suite, at https://github.com/jepsen-io/mysql. Check out commit df8c29675809444b730a6ea5da0d80e243e7fc70, then run something like:

      lein run test-all --db maria --nodes n1,n2,n3 -w append --concurrency 6n --nemesis kill,partition --time-limit 300 --test-count 500 --innodb-flush-log-at-trx-commit 1 --expected-consistency-model snapshot-isolation --isolation repeatable-read

      This takes a few hours--I haven't had as much time as I'd like to put into getting a concise reproduction case. Nevertheless, this generally spits out a handful of cases of data loss each day.
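Since a full run takes hours, it can be handy to assemble the command programmatically so the time limit and test count can be varied between runs. The following is only a sketch: `build_repro_command` is a hypothetical helper, and every flag and value in it is copied from the command above rather than from the test suite's documentation.

```python
import shlex


def build_repro_command(test_count=500, time_limit=300):
    """Return the argv list for the reproduction run described above."""
    # Flags and values are taken verbatim from the command in this report.
    options = {
        "--db": "maria",
        "--nodes": "n1,n2,n3",
        "-w": "append",
        "--concurrency": "6n",
        "--nemesis": "kill,partition",
        "--time-limit": str(time_limit),
        "--test-count": str(test_count),
        "--innodb-flush-log-at-trx-commit": "1",
        "--expected-consistency-model": "snapshot-isolation",
        "--isolation": "repeatable-read",
    }
    argv = ["lein", "run", "test-all"]
    for flag, value in options.items():
        argv += [flag, value]
    return argv


# Print the command as a single shell-safe line for inspection.
command = build_repro_command()
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, cwd=..., check=True)`, with `cwd` pointing at a local checkout of the test suite (the path is up to you). Whether a smaller test count still surfaces the data loss is not something this report establishes.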

          People

            Assignee: Seppo Jaakola (seppo)
            Reporter: Kyle Kingsbury (aphyr)