MariaDB Server / MDEV-40096

Nondeterministic lock-wait wake-up order under RU/RC with innodb_snapshot_isolation=ON causes different final states


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 12.2.2
    • Fix Version/s: N/A
    • None
    • Environment: Storage engine: InnoDB
      innodb_snapshot_isolation=ON
      Isolation levels tested: READ UNCOMMITTED, READ COMMITTED
    • Not for Release Notes

    Description

      I observed confusing behavior in MariaDB 12.2.2 with `innodb_snapshot_isolation=ON` under both `READ UNCOMMITTED` and `READ COMMITTED`.

      In the following test case, both `[1-1]` and `[2-1]` are blocked by Transaction 3. After Transaction 3 rolls back, I repeatedly observed two different wake-up orders across multiple runs.

      In most runs, `[1-1]` is unblocked first. Then `[2-1]` remains blocked until Transaction 1 commits. This leads to the following final table state:

      MariaDB [test]> SELECT t0.c0, t0.c1 FROM t0 WHERE TRUE FOR UPDATE;
      +---------+----+
      | c0      | c1 |
      +---------+----+
      |    3.55 |  0 |
      | 0.35103 |  1 |
      +---------+----+
      

      However, in a smaller number of runs, `[2-1]` is unblocked first. Then `[1-1]` remains blocked until Transaction 2 commits. This leads to a different final table state:

      MariaDB [test]> SELECT * FROM t0 WHERE TRUE FOR UPDATE;
      +---------+----+
      | c0      | c1 |
      +---------+----+
      | 0.35103 |  0 |
      |    3.55 |  2 |
      +---------+----+
      

      Therefore, with the same test case and the same intended commit order, repeated executions can produce different final database states depending on whether `[1-1]` or `[2-1]` is resumed first after Transaction 3 rolls back.

      I am not sure whether this nondeterministic lock-wait wake-up order is expected behavior. If it is, could you please explain why? If it is not, this may indicate an issue in lock-wait scheduling, or in the order in which waiting transactions are resumed, under `READ UNCOMMITTED` / `READ COMMITTED` with `innodb_snapshot_isolation=ON`.

      In particular, under `READ UNCOMMITTED`, after many repeated attempts, I did observe several runs where `[2-1]` was resumed before `[1-1]`.

      Minimal test case

      CREATE OR REPLACE TABLE t0(
          c0 REAL SIGNED NOT NULL,
          c1 SMALLINT SIGNED UNIQUE NOT NULL,
          PRIMARY KEY(c1, c0)
      ) ENGINE=InnoDB;
      CREATE INDEX ic2 ON t0(c0 DESC, c1);
      INSERT INTO t0 VALUES (0.35103, 1);
      CREATE UNIQUE INDEX ic0 USING BTREE ON t0(c0, c1);
      

      Please run the test with:

      SET SESSION innodb_snapshot_isolation=ON;
      SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
      

      The same behavior was also observed under:

      SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
      

      – Transaction 1, with statements:

      [1-0] BEGIN;
      [1-1] SELECT t0.c1 FROM t0 WHERE TRUE FOR UPDATE;
      [1-2] INSERT INTO t0 VALUES (3.55, 2);
      [1-3] COMMIT;
      

      – Transaction 2, with statements:

      [2-0] BEGIN;
      [2-1] SELECT t0.c0, t0.c1 FROM t0 WHERE TRUE FOR UPDATE;
      [2-2] UPDATE t0 SET c1 =0 WHERE c1=2;
      [2-3] COMMIT;
      

      – Transaction 3, with statements:

      [3-0] BEGIN;
      [3-1] UPDATE t0 SET c0=0.16072 WHERE TRUE;
      [3-2] ROLLBACK;
      

      Input schedule

      [1-0, 3-0, 3-1, 1-1, 2-0, 2-1, 2-2, 3-2, 2-2, 1-2, 1-3, 2-3]
      

      In this schedule:

      [1-0] BEGIN;
      [3-0] BEGIN;
      [3-1] UPDATE t0 SET c0=0.16072 WHERE TRUE;
      [1-1] SELECT t0.c1 FROM t0 WHERE TRUE FOR UPDATE; -- blocked by Transaction 3
      [2-0] BEGIN;
      [2-1] SELECT t0.c0, t0.c1 FROM t0 WHERE TRUE FOR UPDATE; -- blocked by Transaction 3
      [3-2] ROLLBACK; -- releases the blocking lock
      

      After `[3-2] ROLLBACK`, the wake-up order of `[1-1]` and `[2-1]` appears to be nondeterministic.
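      To make the dependency on wake-up order concrete, the following toy Python model replays the two possible orders and prints the resulting table. It is a sketch under stated assumptions, not a model of InnoDB's lock scheduler: the helper names `txn1`, `txn2` and `replay` are hypothetical; the rollback of Transaction 3 is treated as a no-op; and once a waiter wakes, its whole transaction is assumed to run to `COMMIT` before the other waiter proceeds.

```python
# Toy replay of the schedule above (hypothetical model, not InnoDB internals).
# Assumptions: Transaction 3 rolls back and leaves no trace; whichever waiter
# wakes first runs its remaining statements and commits before the other one.

Row = tuple[float, int]  # (c0, c1)


def txn1(table: set[Row]) -> set[Row]:
    # [1-1] SELECT ... FOR UPDATE only locks rows; [1-2] inserts (3.55, 2).
    return table | {(3.55, 2)}


def txn2(table: set[Row]) -> set[Row]:
    # [2-2] UPDATE t0 SET c1 = 0 WHERE c1 = 2
    return {(c0, 0) if c1 == 2 else (c0, c1) for (c0, c1) in table}


def replay(wake_order: list[str]) -> list[Row]:
    table: set[Row] = {(0.35103, 1)}  # INSERT INTO t0 VALUES (0.35103, 1)
    txns = {"T1": txn1, "T2": txn2}
    for name in wake_order:
        table = txns[name](table)
    # Sort by the primary key (c1, c0) for a stable display order.
    return sorted(table, key=lambda row: (row[1], row[0]))


if __name__ == "__main__":
    for order in (["T1", "T2"], ["T2", "T1"]):
        print(order, replay(order))
```

      Under this model, the `[1-1]`-first order yields (3.55, 0), (0.35103, 1), matching the majority result. The `[2-1]`-first order yields (0.35103, 1), (3.55, 2), because `[2-2]` finds no row with `c1 = 2` and changes nothing. The model therefore does not reproduce the first row of the minority state reported in this issue (0.35103 with `c1 = 0`).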

      Expected result

      I expected the wake-up order to be deterministic for the same input schedule, or at least not to cause different final database states under repeated executions of the same test case.
      Since `[1-1]` is issued before `[2-1]` and both statements are blocked by Transaction 3 on the same initial row, I expected `[1-1]` to be resumed before `[2-1]` after Transaction 3 rolls back.

      With this wake-up order, the final table state is:

      +---------+----+
      | c0      | c1 |
      +---------+----+
      |    3.55 |  0 |
      | 0.35103 |  1 |
      +---------+----+
      

      Actual result

      In most runs, `[1-1]` is resumed first and the final state is:

      +---------+----+
      | c0      | c1 |
      +---------+----+
      |    3.55 |  0 |
      | 0.35103 |  1 |
      +---------+----+
      

      However, in some runs, `[2-1]` is resumed first. Then `[1-1]` is resumed only after Transaction 2 commits, and the final state becomes:

      +---------+----+
      | c0      | c1 |
      +---------+----+
      | 0.35103 |  0 |
      |    3.55 |  2 |
      +---------+----+
      

      Thus, repeated executions of the same test case can produce different final database states.

      Additional note

      This behavior may require multiple runs to reproduce. In my tests, the `[2-1]`-first wake-up order happened less frequently, but I did observe it several times, especially under `READ UNCOMMITTED`.

      If this is expected due to nondeterministic lock scheduling, could you please clarify whether InnoDB/MariaDB provides any guarantee about the wake-up order of multiple transactions waiting for the same lock after the blocking transaction rolls back or commits?

    People

      Assignee: Unassigned
      Reporter: yousaha