[MDEV-38976] Galera - Write loss with process crashes and network partitions

Details

Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 12.2.2
Fix Version/s: 12.3
Component/s: Galera
Labels:
- data-loss
- galera
Environment:
Debian Trixie, Galera 26.4.25-deb13

Description

With innodb_flush_log_at_trx_commit=1, write loss due to coordinated process crashes in Galera Cluster (see MDEV-38974) is significantly reduced. However, it is not eliminated! With process crashes and network partitions, Galera Cluster occasionally loses the effects of committed transactions. For example, at roughly 141 seconds into this Jepsen test run (https://s3.amazonaws.com/jepsen.io/analyses/mariadb-galera-12.1.2/20260224T175533-lost-writes-2.zip), the cluster lost approximately nineteen seconds of writes across four separate rows: 0, 285, 410, and 446. Some, like key 0, lost only a short suffix of elements. Key 410, on the other hand, lost all twenty-five elements and began afresh:

Time (s)  Elements
--------  -----------------------------
141.36    17, 19, 26, ..., 91, 92, 97
152.79    175
153.21    175, 176, 177, 179
154.46    175, 176, 177, 179, 180

Note that the transactions which wrote 17, 19, and so on were successfully committed; their effects definitely should not have been lost.
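
To make the anomaly concrete: in an append-only list workload, every read of a key should begin with everything an earlier read of that key returned. The following is a minimal Python sketch of that prefix check. It is not the checker Jepsen actually uses, and the name find_lost_writes and the reads list are hypothetical, made up for illustration. The first read contains only the elements visible in the table above, since the middle of that list is elided there.

```python
from typing import List, Tuple

Read = Tuple[float, List[int]]  # (time in seconds, elements observed)


def find_lost_writes(reads: List[Read]) -> List[Tuple[float, List[int]]]:
    """Flag each read that does not extend the read before it.

    Assumes `reads` is ordered by time and that the list is append-only,
    so the previous read must be a prefix of the current one.
    """
    losses = []
    prev: List[int] = []
    for time_s, elements in reads:
        if elements[:len(prev)] != prev:
            # Elements seen previously that this read no longer contains.
            missing = [e for e in prev if e not in elements]
            losses.append((time_s, missing))
        prev = elements
    return losses


# Hypothetical input built from the table above; the elided middle of the
# first read ("...") is deliberately left out rather than guessed.
reads = [
    (141.36, [17, 19, 26, 91, 92, 97]),
    (152.79, [175]),
    (153.21, [175, 176, 177, 179]),
    (154.46, [175, 176, 177, 179, 180]),
]

losses = find_lost_writes(reads)
print(losses)  # [(152.79, [17, 19, 26, 91, 92, 97])]
```

Only the read at 152.79 is flagged: it fails to include anything from the read at 141.36, while the two later reads each extend the one before.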

You can reproduce this with the Jepsen MariaDB test suite, at https://github.com/jepsen-io/mysql. Check out commit df8c29675809444b730a6ea5da0d80e243e7fc70, then run something like:

lein run test-all --db maria --nodes n1,n2,n3 -w append --concurrency 6n --nemesis kill,partition --time-limit 300 --test-count 500 --innodb-flush-log-at-trx-commit 1 --expected-consistency-model snapshot-isolation --isolation repeatable-read

This takes a few hours; I haven't had as much time as I'd like to put into getting a concise reproduction case. Nevertheless, it generally turns up a handful of cases of data loss each day.

People

Assignee: Seppo Jaakola

Reporter: Kyle Kingsbury

Votes: 1

Watchers: 8

Dates

Created: 2026-03-05 02:53

Updated: 2026-04-27 16:04