Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29044

Lost update anomaly found in Galera-Maria cluster

    XMLWordPrintable

Details

    Description

      If I understand it correctly, transactions issued on different nodes in a Galera-Maria cluster should be free of lost update anomalies, as claimed in https://galeracluster.com/library/training/tutorials/supporting-transaction-isolation-levels.html. However, we have observed such anomaly in our tests.

      Experimental Setup

      We have set up our tests with the following configuration:

      We run a cluster of two nodes with docker-compose, using mariadb 10.7.3. We use a simple database schema with a single two-column table and each row representing a key-value pair:

      CREATE TABLE IF NOT EXISTS dbcop.variables (var BIGINT(64) UNSIGNED NOT NULL PRIMARY KEY, val BIGINT(64) UNSIGNED NOT NULL)
      

      We have one client session on each node; both sessions are executed concurrently. First, we initialize the table (using a different session) with values set to 0. After the initialization, both clients run a stream of transactions produced by our workload generator. The values written by the clients are guaranteed to be unique.

      Query results

      The results observed by the client are in client-result.log:

      Transaction (0, 0)
      WRITE(0, 0)
       
      ...
      

      where each transaction is identified by (session id, txn id). In a session, transactions with larger txn ids are executed after those with smaller ids finish. Queries are shown as READ(var, val) or WRITE(var, val).

      When running the experiment, a failed transaction is retried until all its operations succeed and the transaction is successfully committed. Only the successful shot is logged on the client side, and all earier executions are shown as ROLLBACK in the query logs.

      The lost-update violation

      Both transactions txn (1, 5) and txn (2, 13) read var=0, val=4 written by txn (1, 4) and both transaction successfully commit their writes on var=0. A lost update found! See lost-update.pdf for the complete scenario.

      Database Logs

      The server-side logs of the nodes are attached in server-logs.zip. The query logs and error logs are stored in mariadb_general.log and mariadb_error.log. The binary logs are in mariadb_bin*.

      Reproducing the violation

      The tools to reproduce the violation are attached in tools.zip.

      1. Start the galera cluster in docker

      The docker-compose file is generator/docker/docker-compose.yml. Database
      logs are stored in /tmp/ in the containers.

      2. Record and verify a history

      First build the tools:

          # tested on ubuntu 22.04
          apt install g++ openjdk-11-jdk cmake libgmp-dev zlib1g-dev pkg-config libssl-dev cargo
          cd generator
          cargo b --release
          cd ../PolySI
          ./gradlew jar
          cd ..
          

      Then generate txns and record history:

          # Run txns and record history
          rm -rf /tmp/a/ /tmp/b
          ./generator/target/release/dbcop generate -d /tmp/a/ -e 2 -n 2 --nhist 1 -t 10 -v 1
          ./generator/target/release/dbcop run --db galera -d /tmp/a/ -o /tmp/b/ 127.0.0.1:3306 127.0.0.1:3307
          
          # requires Java 11; will print [[[ REJECT ]]] if a violation is found
          java -jar ./PolySI/build/libs/PolySI-1.0.0-SNAPSHOT.jar audit -t dbcop --no-pruning /tmp/b/hist-00000/history.bincode
          
          # print history
          java -jar ./PolySI/build/libs/PolySI-1.0.0-SNAPSHOT.jar dump -t dbcop /tmp/b/hist-00000/history.bincode
          

      Note that, since we are running black-box testing with randomized workloads, we cannot reproduce exactly the same violating histories. As we observed, violations manifest very frequently, e.g., we observed 8 violating histories out of 10 collected histories.

      Attachments

        1. client-result.log
          0.9 kB
        2. client-result-debug.log
          0.9 kB
        3. client-result-sync.log
          0.9 kB
        4. lost-update.pdf
          52 kB
        5. my.cnf
          1 kB
        6. my.conf-sync
          1 kB
        7. server-logs.zip
          55 kB
        8. server-logs-debug.zip
          210 kB
        9. server-log-sync.zip
          58 kB
        10. tools.zip
          3.20 MB

        Activity

          People

            sciascid Daniele Sciascia
            nobiplusplus Si Liu
            Votes:
            3 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.