Details
Bug
Status: Open
Major
Resolution: Unresolved
12.2.2
None
None
Debian Trixie, Galera 26.4.25-deb13
Description
The MariaDB Galera Cluster Guide (https://mariadb.com/docs/galera-cluster/galera-cluster-quickstart-guides/mariadb-galera-cluster-guide) states that MariaDB Galera Cluster ensures transactions are "instantly replicated to all other nodes, ensuring no replica lag". This does not appear to be the case: under normal operation, writes by transaction T1 can be invisible to a later transaction T2, even though T2 began only after T1's commit was acknowledged to the client. This behavior occurs every few minutes in healthy clusters, without any faults.
For example, take this test run (https://s3.amazonaws.com/jepsen.io/analyses/mariadb-galera-12.1.2/20260306T144936-g-single-realtime.zip), which contains the following pair of transactions (attached as g-single-realtime.svg).
The top transaction appended 9 to key 17693, then committed and was acknowledged to the client. The bottom transaction began after that acknowledgement, hence the real-time (rt) dependency edge from top to bottom. However, the bottom transaction read the same key and failed to observe the top transaction's append of 9. This is a stale read, which is inconsistent with Galera Cluster's claims of instant, lag-free replication.
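To make the stale-read condition concrete, here is a minimal Python sketch of the check described above. It is not the checker Jepsen uses, only a simplified illustration. The `Txn` record, the `stale_reads` function, the timestamps, and the list contents `[1, 2, 3]` are all hypothetical. It assumes every transaction in the history committed and that each value is appended to a key at most once.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    name: str
    invoke: float    # when the client began the transaction
    complete: float  # when the client received the commit acknowledgement
    appends: dict = field(default_factory=dict)  # key -> value appended
    reads: dict = field(default_factory=dict)    # key -> list of values observed


def stale_reads(history):
    """Return (writer, reader, key, value) for every read that missed an
    append whose commit was acknowledged before the reader began."""
    found = []
    for w in history:
        for r in history:
            # Only consider readers that started strictly after the
            # writer's acknowledgement (the real-time edge).
            if w is r or r.invoke <= w.complete:
                continue
            for key, value in w.appends.items():
                if key in r.reads and value not in r.reads[key]:
                    found.append((w.name, r.name, key, value))
    return found


# Hypothetical history shaped like the pair of transactions above.
top = Txn("top", invoke=0.0, complete=1.0, appends={17693: 9})
bottom = Txn("bottom", invoke=2.0, complete=3.0, reads={17693: [1, 2, 3]})
violations = stale_reads([top, bottom])
```

Under these assumptions, `violations` holds one entry naming the top transaction as the writer whose append the bottom transaction missed. Transactions that overlap in real time are deliberately skipped, since no real-time edge exists between them.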
You can reproduce this with the Jepsen MySQL/MariaDB test harness at version df8c29675809444b730a6ea5da0d80e243e7fc70, by running something like:
lein run test-all --db maria --nodes n1,n2,n3 -w append --concurrency 6n --nemesis none --time-limit 300 --test-count 300 --innodb-flush-log-at-trx-commit 1 --expected-consistency-model strong-snapshot-isolation --isolation repeatable-read --max-writes-per-key 16
In my tests, Galera violates Strong Snapshot Isolation (Strong SI) every 5-10 minutes. It's not super frequent, but it does seem to happen regularly.