Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Not a Bug
-
12.2.2
-
None
-
Storage engine: InnoDB
innodb_snapshot_isolation=ON
Isolation levels tested: READ UNCOMMITTED, READ COMMITTED
-
Not for Release Notes
Description
I found a behavior that confused me in MariaDB 12.2.2 with `innodb_snapshot_isolation=ON` under both `READ UNCOMMITTED` and `READ COMMITTED`.
In the following test case, both `[1-1]` and `[2-1]` are blocked by Transaction 3. After Transaction 3 rolls back, I repeatedly observed two different wake-up orders across multiple runs.
In most runs, `[1-1]` is unblocked first. Then `[2-1]` remains blocked until Transaction 1 commits. This leads to the following final table state:
MariaDB [test]> SELECT t0.c0, t0.c1 FROM t0 WHERE TRUE FOR UPDATE;
+---------+----+
| c0      | c1 |
+---------+----+
| 3.55    |  0 |
| 0.35103 |  1 |
+---------+----+
However, in a smaller number of runs, `[2-1]` is unblocked first. Then `[1-1]` remains blocked until Transaction 2 commits. This leads to a different final table state:
MariaDB [test]> SELECT * FROM t0 WHERE TRUE FOR UPDATE;
+---------+----+
| c0      | c1 |
+---------+----+
| 0.35103 |  0 |
| 3.55    |  2 |
+---------+----+
Therefore, with the same test case and the same intended commit order, repeated executions can produce different final database states depending on whether `[1-1]` or `[2-1]` is resumed first after Transaction 3 rolls back.
I am not sure whether this nondeterministic lock-wait wake-up order is expected behavior. If it is expected, could you please help explain it? If it is not expected, this may indicate an issue in lock-wait scheduling or recovery order under `READ UNCOMMITTED` / `READ COMMITTED` with `innodb_snapshot_isolation=ON`.
In particular, under `READ UNCOMMITTED`, after many repeated attempts, I did observe several runs where `[2-1]` was resumed before `[1-1]`.
Minimal test case
CREATE OR REPLACE TABLE t0(
    c0 REAL SIGNED NOT NULL,
    c1 SMALLINT SIGNED UNIQUE NOT NULL,
    PRIMARY KEY(c1, c0)
) ENGINE=InnoDB;

CREATE INDEX ic2 ON t0(c0 DESC, c1);
INSERT INTO t0 VALUES (0.35103, 1);
CREATE UNIQUE INDEX ic0 USING BTREE ON t0(c0, c1);
Please run the test with:
SET SESSION innodb_snapshot_isolation=ON;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
The same behavior was also observed under:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- Transaction 1, with statements:
[1-0] BEGIN;
[1-1] SELECT t0.c1 FROM t0 WHERE TRUE FOR UPDATE;
[1-2] INSERT INTO t0 VALUES (3.55, 2);
[1-3] COMMIT;
-- Transaction 2, with statements:
[2-0] BEGIN;
[2-1] SELECT t0.c0, t0.c1 FROM t0 WHERE TRUE FOR UPDATE;
[2-2] UPDATE t0 SET c1=0 WHERE c1=2;
[2-3] COMMIT;
-- Transaction 3, with statements:
[3-0] BEGIN;
[3-1] UPDATE t0 SET c0=0.16072 WHERE TRUE;
[3-2] ROLLBACK;
Input schedule
[1-0, 3-0, 3-1, 1-1, 2-0, 2-1, 2-2, 3-2, 2-2, 1-2, 1-3, 2-3]
In this schedule:
[1-0] BEGIN;
[3-0] BEGIN;
[3-1] UPDATE t0 SET c0=0.16072 WHERE TRUE;
[1-1] SELECT t0.c1 FROM t0 WHERE TRUE FOR UPDATE;        -- blocked by Transaction 3
[2-0] BEGIN;
[2-1] SELECT t0.c0, t0.c1 FROM t0 WHERE TRUE FOR UPDATE; -- blocked by Transaction 3
[3-2] ROLLBACK;                                          -- releases the blocking lock
After `[3-2] ROLLBACK`, the wake-up order of `[1-1]` and `[2-1]` appears to be nondeterministic.
Expected result
I expected the wake-up order to be deterministic for the same input schedule, or at least not to cause different final database states under repeated executions of the same test case.
Since `[1-1]` is issued before `[2-1]` and both statements are blocked by Transaction 3 on the same initial row, I expected `[1-1]` to be resumed before `[2-1]` after Transaction 3 rolls back.
With this wake-up order, the final table state is:
+---------+----+
| c0      | c1 |
+---------+----+
| 3.55    |  0 |
| 0.35103 |  1 |
+---------+----+
Actual result
In most runs, `[1-1]` is resumed first and the final state is:
+---------+----+
| c0      | c1 |
+---------+----+
| 3.55    |  0 |
| 0.35103 |  1 |
+---------+----+
However, in some runs, `[2-1]` is resumed first. Then `[1-1]` is resumed only after Transaction 2 commits, and the final state becomes:
+---------+----+
| c0      | c1 |
+---------+----+
| 0.35103 |  0 |
| 3.55    |  2 |
+---------+----+
Thus, repeated executions of the same test case can produce different final database states.
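To quantify how often each outcome occurs, the final table state of every run could be classified and tallied. The following is only a minimal sketch: `run_once` is a hypothetical callable (not part of this report) that is assumed to replay the input schedule against a fresh `t0` and return the final rows as `(c0, c1)` pairs. The two reference states are copied from the results shown above, and `c0` is rounded to five decimals on the assumption that this is enough to compare the `REAL` values reliably.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Row = Tuple[float, int]

# Final states copied from the two results reported above.
T1_FIRST = frozenset({(3.55, 0), (0.35103, 1)})   # [1-1] resumed first
T2_FIRST = frozenset({(0.35103, 0), (3.55, 2)})   # [2-1] resumed first


def classify(rows: Iterable[Row]) -> str:
    """Map a final table state to the wake-up order it corresponds to."""
    # Row order is ignored; c0 is rounded (assumption) to avoid float noise.
    state = frozenset((round(float(c0), 5), int(c1)) for c0, c1 in rows)
    if state == T1_FIRST:
        return "[1-1] first"
    if state == T2_FIRST:
        return "[2-1] first"
    return "other"


def tally(run_once: Callable[[], Iterable[Row]], runs: int) -> Counter:
    """Run the (hypothetical) repro `runs` times and count each outcome."""
    return Counter(classify(run_once()) for _ in range(runs))
```

Calling `tally(run_once, 1000)` would return a `Counter` keyed by outcome label, which makes it easy to report the relative frequency of the two wake-up orders per isolation level.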
Additional note
This behavior may require multiple runs to reproduce. In my tests, the `[2-1]`-first wake-up order happened less frequently, but I did observe it several times, especially under `READ UNCOMMITTED`.
If this is expected due to nondeterministic lock scheduling, could you please clarify whether InnoDB/MariaDB provides any guarantee about the wake-up order of multiple transactions waiting for the same lock after the blocking transaction rolls back or commits?