  1. MariaDB Server
  2. MDEV-38974

Galera - Write loss with coordinated process crashes due to innodb_flush_log_at_trx_commit=0

Details

    Description

      MariaDB's documentation recommends that users set up Galera clusters with innodb_flush_log_at_trx_commit=0, characterizing it as "a safer, recommended option with Galera Cluster". This setting is not safe: it frequently causes the loss of committed transactions when process crashes occur in rapid succession.

      https://mariadb.com/docs/galera-cluster/galera-management/configuration/configuring-mariadb-galera-cluster
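
      For reference, a minimal sketch of the setting in question as it might appear in a server option file. The [mariadb] group and the choice of value 1 are illustrative assumptions, not taken from the failing test's config files. Value 1 is the InnoDB default and flushes the redo log to disk at every commit. Whether it alone eliminates the losses this test suite observes is not established in this report.

      ```ini
      [mariadb]
      # Assumption: durability of committed transactions matters more than commit latency.
      # 1 (the InnoDB default) writes and flushes the redo log to disk at every commit.
      innodb_flush_log_at_trx_commit=1
      # 0, the value discussed in this report, writes and flushes the redo log only about
      # once per second, so recently committed transactions can be lost if the server
      # process crashes.
      ```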

      A test suite to reproduce this is at https://github.com/jepsen-io/mysql; use commit 3500f8c80bd0f419d7f21a7b89eaf65f8651a7af, and try something like:

      lein run test-all --nodes n1,n2,n3 -w append --concurrency 6n --nemesis kill --time-limit 300 --test-count 5 --isolation repeatable-read --expected-consistency-model snapshot-isolation

      For an example failing case, including config files and the error/general logs on each node, see:

      https://s3.amazonaws.com/jepsen.io/analyses/mariadb-galera-12.1.2/20260105T175818-lost-writes.zip

      I suggest revising MariaDB's documentation to make it clear that this option allows the loss of committed transactions.

            People

              maxmether Max Mether
              aphyr Kyle Kingsbury
              Max Mether Max Mether