MariaDB Server
MDEV-39263

innodb_snapshot_isolation fails to prevent lost updates under contention


Details

    • Can result in unexpected behaviour

    Description

      Summary

      innodb_snapshot_isolation = ON with REPEATABLE READ and START TRANSACTION WITH CONSISTENT SNAPSHOT fails to reject concurrent updates to the same row, allowing lost updates. The rate is approximately 1 in 150K commits under moderate contention (32 threads, 16 rows).

      Environment

      • MariaDB 12.2.2 (mariadb:12.2.2 Docker image)
      • innodb_snapshot_isolation = ON
      • InnoDB engine
      • Transaction isolation: REPEATABLE READ
      • Transactions opened with START TRANSACTION WITH CONSISTENT SNAPSHOT
      • Host: Linux 6.8.0-60-generic, x86_64

      Expected behavior

      From the MariaDB documentation: when innodb_snapshot_isolation is enabled, an UPDATE inside a REPEATABLE READ transaction started with START TRANSACTION WITH CONSISTENT SNAPSHOT must fail with ER_CHECKREAD (error 1020, "Record has changed since last read in table") if the target row was modified by another transaction after the snapshot was taken.

      This guarantees that no two concurrent snapshot transactions can both successfully commit an update to the same row based on the same pre-transaction state – i.e., lost updates are prevented.

      What is a lost update

      A lost update occurs when two transactions T1 and T2 both:

      1. Read the same row (observing the same value V).
      2. Compute a new value based on V.
      3. Write the new value to that row.
      4. Commit successfully.

      One of the two writes is silently overwritten by the other. The effect of one committed transaction is lost as if it never happened.
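      The four steps above can be replayed on a plain in-memory value (a hypothetical sketch, no database involved; the function name is made up for illustration) to show exactly where the write disappears:

```rust
/// Replays the two interleaved transactions from the steps above on an
/// in-memory value and returns (final_value, committed_count).
fn replay_lost_update(initial: i64) -> (i64, u64) {
    let mut row = initial;
    let t1_read = row; // T1 reads V
    let t2_read = row; // T2 reads V (before T1 writes)
    row = t1_read + 1; // T1 writes V+1 and commits
    row = t2_read + 1; // T2 writes V+1 and commits -- T1's write is lost
    (row, 2)           // two successful commits
}

fn main() {
    let (value, commits) = replay_lost_update(0);
    // Two commits, but the row advanced by only 1: one increment vanished.
    assert!(value < commits as i64);
    println!("value = {value} after {commits} commits");
}
```

      Under snapshot isolation the second write should instead be rejected, which is what the test below checks.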

      Formally (Berenson et al., 1995 – "A Critique of ANSI SQL Isolation Levels"), the lost update anomaly P4 is:

      T1 reads row r.
      T2 reads row r.
      T1 writes row r (based on its read). T1 commits.
      T2 writes row r (based on its read). T2 commits.
      T1's write is lost.
      

      With innodb_snapshot_isolation enabled, MariaDB should detect that T2's UPDATE targets a row modified after T2's snapshot and reject that UPDATE with ER_CHECKREAD, preventing the anomaly.

      Test design

      Schema

      CREATE TABLE counter (id INT PRIMARY KEY, val INT NOT NULL) ENGINE=InnoDB;
      CREATE TABLE log (
          seq BIGINT AUTO_INCREMENT PRIMARY KEY,
          counter_id INT NOT NULL,
          old_val INT NOT NULL,
          new_val INT NOT NULL
      ) ENGINE=InnoDB;
       
      -- 16 independent counter rows, all starting at 0
      INSERT INTO counter (id, val) VALUES (1, 0), (2, 0), ..., (16, 0);
      

      Thread logic (32 threads, each running continuously for 20 minutes)

      Each thread repeats:

      1. Pick a random counter_id in [1, 16].
      2. START TRANSACTION WITH CONSISTENT SNAPSHOT
      3. SELECT val FROM counter WHERE id = :counter_id       -- reads V
      4. UPDATE counter SET val = V+1 WHERE id = :counter_id  -- writes V+1
      5. INSERT INTO log (counter_id, old_val, new_val) VALUES (:counter_id, V, V+1)
      6. sleep(100 microseconds)   -- widen the race window
      7. COMMIT
      

      If step 4 fails with ER_CHECKREAD (expected under contention), the thread rolls back and retries from step 1. If COMMIT fails, the thread also retries. Only successful commits are counted.

      Why this proves lost updates

      Invariant: Each successful commit increments exactly one counter by exactly 1. Therefore:

      SUM(counter.val) == number of successful commits
      

      If any lost update occurs, two transactions both read the same value V from the same counter and both commit V+1. The counter advances by 1 instead of 2, but two commits were counted. The invariant breaks:

      SUM(counter.val) < number of successful commits
      

      Three independent measurements are cross-checked:

      • Committed count – an in-process atomic counter, incremented after each successful COMMIT: how many times the application observed COMMIT succeed.
      • Audit log rows – SELECT COUNT(*) FROM log, where each row is inserted inside the same transaction as the UPDATE: how many transactions the database actually committed.
      • Counter sum – SELECT SUM(val) FROM counter: the net effect of all increments.

      If no lost updates occur, all three are equal.

      If a lost update occurs:

      • Committed count == audit log rows (both reflect what the DB committed).
      • Counter sum < committed count (the lost update caused one increment to be a no-op on the counter value).
      • The audit log contains a duplicate (counter_id, new_val) pair: two committed transactions wrote the same new_val to the same counter.

      Why this cannot be a false positive:

      • The UPDATE sets val = V+1 where V is a literal read from the same transaction's snapshot. It is not val = val + 1 (which would use the current row version). Two transactions can only write the same new_val if they both read the same old_val – which means the second one was not rejected despite the row being modified after its snapshot.
      • The audit log INSERT is inside the same transaction as the UPDATE. If the transaction rolls back, neither the counter update nor the log entry persists. If the transaction commits, both persist atomically. Therefore audit log rows == committed count (confirmed in all runs).
      • Each counter is independent. A lost update on counter 5 does not affect counter 7. The per-counter audit trail is a self-contained proof.
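      The duplicate check in the last two points can be sketched as follows (illustrative Rust; the audit log is modeled as an in-memory slice, and the example rows mirror the duplicate pattern described above):

```rust
use std::collections::HashMap;

/// Audit-log row: (counter_id, old_val, new_val).
type LogRow = (u32, i64, i64);

/// Returns every (counter_id, new_val) pair written by more than one
/// committed transaction -- each such pair is one lost update.
fn duplicate_writes(log: &[LogRow]) -> Vec<(u32, i64)> {
    let mut seen: HashMap<(u32, i64), u32> = HashMap::new();
    for &(counter_id, _old, new) in log {
        *seen.entry((counter_id, new)).or_insert(0) += 1;
    }
    let mut dups: Vec<(u32, i64)> = seen
        .into_iter()
        .filter(|&(_, count)| count > 1)
        .map(|(key, _)| key)
        .collect();
    dups.sort();
    dups
}

fn main() {
    // Two committed transactions wrote the same new_val to counter 2.
    let log = [
        (2, 24057, 24058),
        (2, 24058, 24059),
        (2, 24058, 24059), // second committed write of the same new_val
        (2, 24059, 24060),
    ];
    assert_eq!(duplicate_writes(&log), vec![(2, 24059)]);
    println!("found {} duplicate pair(s)", duplicate_writes(&log).len());
}
```

      The SQL GROUP BY ... HAVING COUNT(*) > 1 query in the reproducer steps performs the same check directly in the database.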

      Results

      Run summary

      Run  Threads  Counters  Delay  Duration  Commits    Rejected   Reject %  Lost updates
      5    32       16        100us  20 min    5,881,030  9,871,950  62.7%     40

      Run 5's parameters (16 counters plus a 100us delay) dramatically increased the reproduction rate by raising throughput (4,900 commits/sec versus roughly 500 in earlier runs) and widening the race window.

      Run 5 detailed results

      Total committed:  5,881,030
      Total rejected:   9,871,950
      Rejection rate:   62.7%
      Counter sum:      5,880,990   (expected: 5,881,030)
      Throughput:       4,900 commits/sec
      Audit log rows:   5,881,030   (matches committed count exactly)
      Duplicate values: 40           (each = one lost update)
      Lost updates:     40           (5,881,030 - 5,880,990)
       
      Per-counter values:
        [1]=368597  [2]=368855  [3]=367124  [4]=367492
        [5]=366690  [6]=366643  [7]=366796  [8]=367650
        [9]=367974  [10]=367825 [11]=368279 [12]=367794
        [13]=367717 [14]=367144 [15]=366743 [16]=367667
      

      40 lost updates were spread across 14 of 16 counters:

      Counter  Lost updates
      2        2
      3        1
      4        2
      5        4
      6        2
      7        5
      8        1
      9        6
      10       2
      11       3
      12       3
      14       4
      15       2
      16       3

      Audit log evidence

      Every lost update follows the same pattern. Two transactions take a snapshot, both read the same old_val, both UPDATE to the same new_val, and both COMMIT successfully. The second should have been rejected with ER_CHECKREAD.

      Representative examples (3 of 40):

      Counter 2 – seq 382588 and 382594 (gap: 6 log entries apart):

      seq      counter_id  old_val  new_val
      382578   2           24057    24058    -- preceding normal commit
      382588   2           24058    24059    -- T1: read 24058, wrote 24059, COMMIT OK
      382594   2           24058    24059    -- T2: read 24058, wrote 24059, COMMIT OK  << BUG
      382606   2           24059    24060    -- following normal commit
      

      Counter 7 – seq 5060312 and 5060315 (gap: 3 log entries apart):

      seq      counter_id  old_val  new_val
      5060301  7           315440   315441   -- preceding normal commit
      5060312  7           315441   315442   -- T1: read 315441, wrote 315442, COMMIT OK
      5060315  7           315441   315442   -- T2: read 315441, wrote 315442, COMMIT OK  << BUG
      5060334  7           315442   315443   -- following normal commit
      

      Counter 10 – seq 3728491 and 3728493 (gap: 2 log entries apart):

      seq      counter_id  old_val  new_val
      3728476  10          232965   232966   -- preceding normal commit
      3728491  10          232966   232967   -- T1: read 232966, wrote 232967, COMMIT OK
      3728493  10          232966   232967   -- T2: read 232966, wrote 232967, COMMIT OK  << BUG
      3728499  10          232967   232968   -- following normal commit
      

      The seq column is a global auto-increment across all counters. The small gaps (2 to 13 log entries) between each duplicate pair confirm the two transactions were near-simultaneous – exactly the tight timing window expected from a race condition.

      Full evidence for all 40 lost updates across all 14 affected counters:

      seq      ctr  old_val -> new_val  note
      -------- ---  -----------------  -------------------------
      382588   2    24058   -> 24059    T1 OK
      382594   2    24058   -> 24059    T2 OK <- LOST UPDATE (gap 6)
       
      5825975  2    365428  -> 365429   T1 OK
      5825980  2    365428  -> 365429   T2 OK <- LOST UPDATE (gap 5)
       
      4988851  3    311285  -> 311286   T1 OK
      4988856  3    311285  -> 311286   T2 OK <- LOST UPDATE (gap 5)
       
      2570781  4    160966  -> 160967   T1 OK
      2570786  4    160966  -> 160967   T2 OK <- LOST UPDATE (gap 5)
       
      5271518  4    329585  -> 329586   T1 OK
      5271522  4    329585  -> 329586   T2 OK <- LOST UPDATE (gap 4)
       
      220717   5    13710   -> 13711    T1 OK
      220724   5    13710   -> 13711    T2 OK <- LOST UPDATE (gap 7)
       
      1176470  5    73282   -> 73283    T1 OK
      1176483  5    73282   -> 73283    T2 OK <- LOST UPDATE (gap 13)
       
      4354827  5    271715  -> 271716   T1 OK
      4354835  5    271715  -> 271716   T2 OK <- LOST UPDATE (gap 8)
       
      4804190  5    299600  -> 299601   T1 OK
      4804197  5    299600  -> 299601   T2 OK <- LOST UPDATE (gap 7)
       
      3192626  6    199730  -> 199731   T1 OK
      3192630  6    199730  -> 199731   T2 OK <- LOST UPDATE (gap 4)
       
      4611160  6    287876  -> 287877   T1 OK
      4611168  6    287876  -> 287877   T2 OK <- LOST UPDATE (gap 8)
       
      473637   7    29417   -> 29418    T1 OK
      473643   7    29417   -> 29418    T2 OK <- LOST UPDATE (gap 6)
       
      1393565  7    87021   -> 87022    T1 OK
      1393572  7    87021   -> 87022    T2 OK <- LOST UPDATE (gap 7)
       
      5060312  7    315441  -> 315442   T1 OK
      5060315  7    315441  -> 315442   T2 OK <- LOST UPDATE (gap 3)
       
      5192584  7    323660  -> 323661   T1 OK
      5192590  7    323660  -> 323661   T2 OK <- LOST UPDATE (gap 6)
       
      5621036  7    350360  -> 350361   T1 OK
      5621046  7    350360  -> 350361   T2 OK <- LOST UPDATE (gap 10)
       
      1919034  8    119698  -> 119699   T1 OK
      1919041  8    119698  -> 119699   T2 OK <- LOST UPDATE (gap 7)
       
      327932   9    20478   -> 20479    T1 OK
      327935   9    20478   -> 20479    T2 OK <- LOST UPDATE (gap 3)
       
      346221   9    21590   -> 21591    T1 OK
      346227   9    21590   -> 21591    T2 OK <- LOST UPDATE (gap 6)
       
      1988638  9    123885  -> 123886   T1 OK
      1988645  9    123885  -> 123886   T2 OK <- LOST UPDATE (gap 7)
       
      2545615  9    158661  -> 158662   T1 OK
      2545619  9    158661  -> 158662   T2 OK <- LOST UPDATE (gap 4)
       
      3209522  9    200262  -> 200263   T1 OK
      3209529  9    200262  -> 200263   T2 OK <- LOST UPDATE (gap 7)
       
      4091265  9    255844  -> 255845   T1 OK
      4091270  9    255844  -> 255845   T2 OK <- LOST UPDATE (gap 5)
       
      3728491  10   232966  -> 232967   T1 OK
      3728493  10   232966  -> 232967   T2 OK <- LOST UPDATE (gap 2)
       
      4864245  10   304269  -> 304270   T1 OK
      4864253  10   304269  -> 304270   T2 OK <- LOST UPDATE (gap 8)
       
      1334823  11   83439   -> 83440    T1 OK
      1334830  11   83439   -> 83440    T2 OK <- LOST UPDATE (gap 7)
       
      1512048  11   94630   -> 94631    T1 OK
      1512052  11   94630   -> 94631    T2 OK <- LOST UPDATE (gap 4)
       
      5121677  11   320628  -> 320629   T1 OK
      5121681  11   320628  -> 320629   T2 OK <- LOST UPDATE (gap 4)
       
      1033955  12   64808   -> 64809    T1 OK
      1033964  12   64808   -> 64809    T2 OK <- LOST UPDATE (gap 9)
       
      1497871  12   93518   -> 93519    T1 OK
      1497874  12   93518   -> 93519    T2 OK <- LOST UPDATE (gap 3)
       
      5340853  12   333926  -> 333927   T1 OK
      5340857  12   333926  -> 333927   T2 OK <- LOST UPDATE (gap 4)
       
      276258   14   17362   -> 17363    T1 OK
      276269   14   17362   -> 17363    T2 OK <- LOST UPDATE (gap 11)
       
      2232313  14   139398  -> 139399   T1 OK
      2232317  14   139398  -> 139399   T2 OK <- LOST UPDATE (gap 4)
       
      2391597  14   149314  -> 149315   T1 OK
      2391603  14   149314  -> 149315   T2 OK <- LOST UPDATE (gap 6)
       
      5434067  14   339338  -> 339339   T1 OK
      5434070  14   339338  -> 339339   T2 OK <- LOST UPDATE (gap 3)
       
      2842710  15   177055  -> 177056   T1 OK
      2842715  15   177055  -> 177056   T2 OK <- LOST UPDATE (gap 5)
       
      5080193  15   316601  -> 316602   T1 OK
      5080197  15   316601  -> 316602   T2 OK <- LOST UPDATE (gap 4)
       
      1189409  16   74503   -> 74504    T1 OK
      1189413  16   74503   -> 74504    T2 OK <- LOST UPDATE (gap 4)
       
      4030103  16   252111  -> 252112   T1 OK
      4030109  16   252111  -> 252112   T2 OK <- LOST UPDATE (gap 6)
       
      5645450  16   352818  -> 352819   T1 OK
      5645458  16   352818  -> 352819   T2 OK <- LOST UPDATE (gap 8)
      

      Reproducer

      Prerequisites

      • Docker
      • Rust toolchain: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
      • System packages: apt install build-essential pkg-config libssl-dev zlib1g-dev

      Project structure

      The reproducer is a Rust test project with two files:

      Cargo.toml:

      [package]
      name = "isolation-tests"
      version = "0.1.0"
      edition = "2021"
       
      [dependencies]
      mysql = "25"
      rand = "0.10.0"
       
      [[test]]
      name = "isolation"
      path = "tests/isolation.rs"
      

      tests/isolation.rs – contains the stress test (stress_lost_update_prevention) and supporting anomaly tests (G-single, G2-item). The full source is included at the end of this report.

      Steps

      # 1. Create the project directory
      mkdir isolation-tests && cd isolation-tests
       
      # 2. Create Cargo.toml and tests/isolation.rs as shown above
      #    (or copy from the attached source files)
       
      # 3. Start MariaDB 12.2.2
      docker run -d --name mariadb-test \
        -e MARIADB_ROOT_PASSWORD=testpass \
        -e MARIADB_DATABASE=testdb \
        -p 3307:3306 mariadb:12.2.2
       
      # 4. Wait for MariaDB to be ready
      until docker exec mariadb-test mariadb -uroot -ptestpass -e "SELECT 1" 2>/dev/null; do
        sleep 1
      done
       
      # 5. Enable innodb_snapshot_isolation and increase max_connections
      docker exec mariadb-test mariadb -uroot -ptestpass -e "
        SET GLOBAL innodb_snapshot_isolation = ON;
        SET GLOBAL max_connections = 200;
      "
       
      # 6. Run the stress test (20 minutes by default)
      DATABASE_URL="mysql://root:testpass@127.0.0.1:3307/testdb" \
        cargo test stress_lost_update -- --nocapture
       
      # 7. Optionally adjust duration (e.g. 10 minutes)
      DATABASE_URL="mysql://root:testpass@127.0.0.1:3307/testdb" \
        STRESS_DURATION_SECS=600 \
        cargo test stress_lost_update -- --nocapture
       
      # 8. Run all tests (G-single, G2-item, lost update basic, stress)
      DATABASE_URL="mysql://root:testpass@127.0.0.1:3307/testdb" \
        cargo test -- --nocapture
       
      # 9. After failure, inspect the audit log for evidence
      docker exec mariadb-test mariadb -uroot -ptestpass testdb -e "
        SELECT counter_id, new_val, COUNT(*) as cnt
        FROM test_stress_log
        GROUP BY counter_id, new_val
        HAVING cnt > 1;
      "
       
      # 10. Show surrounding context for each duplicate
      docker exec mariadb-test mariadb -uroot -ptestpass testdb -e "
        SELECT l.seq, l.counter_id, l.old_val, l.new_val
        FROM test_stress_log l
        JOIN (
            SELECT counter_id, new_val
            FROM test_stress_log
            GROUP BY counter_id, new_val
            HAVING COUNT(*) > 1
        ) d ON l.counter_id = d.counter_id
           AND l.new_val BETWEEN d.new_val - 1 AND d.new_val + 1
        ORDER BY l.counter_id, d.new_val, l.seq;
      "
       
      # 11. Cleanup
      docker rm -f mariadb-test
      

      The test asserts SUM(counter.val) == committed_count. If any lost update occurs, the assertion fails and the output includes the duplicate audit log entries as evidence.

      What the test outputs

      On success (no lost updates):

      PASSED: No lost updates under stress. innodb_snapshot_isolation held.
      test stress_lost_update_prevention ... ok
      

      On failure (lost updates detected):

      Stress test results (32 threads, 16 counters, 1200.1s):
        Total committed:  5881030
        Counter sum:      5880990
        Duplicate values:  40
          counter 2: new_val=24059 appeared 2 times
          ...
      LOST UPDATE DETECTED: counter sum (5880990) != committed count (5881030).
      40 updates were lost! Duplicates in log: 40
      test stress_lost_update_prevention ... FAILED
      

      People

        Thirunarayanan Balathandayuthapani
        Vadim Tkachenko
        Marko Mäkelä

        Votes: 1
        Watchers: 10