Details
-
Bug
-
Status: Confirmed (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.6.19
Description
Hello all,
We're seeing a deadlock where the victim transaction appears "reversed" between our application exception and the `SHOW ENGINE INNODB STATUS` output.
*Environments*
- *AWS RDS :* MariaDB 10.6.19-MariaDB-log
- *Local (repro):* 10.2.40-MariaDB-1:10.2.40+maria~bionic-log
- Engine: InnoDB, table `users`
*Scenario*
Two sessions update the same table, touching two rows in opposite order (classic cross-update deadlock):
- Session A: `UPDATE ... WHERE id=1;` (hold) → later tries `id=2`
- Session B: `UPDATE ... WHERE id=2;` (hold) → later tries `id=1`
*What happens on AWS RDS (10.6.19)*
- `SHOW ENGINE INNODB STATUS` shows:
- `(1) TRANSACTION = trx id X` (e.g., 4677344224)
- `(2) TRANSACTION = trx id Y` (e.g., 4677344213)
- Tail line: `*** WE ROLL BACK TRANSACTION (2)`
- *But* around the same timestamp, the application's deadlock exception indicates the other trx (e.g., 4677344224) was actually rolled back.
*What happens on Local (10.2.40)*
- I can reproduce the deadlock and the victim aligns with the tail line — `(2)` is indeed the one that gets rolled back; no mismatch with the app.
*Questions*
1. In 10.6.x, is there any known issue or change where the deadlock monitor might list `(1)/(2)` or print the final line in a way that can make `WE ROLL BACK TRANSACTION (2)` appear to refer to a different victim than the one the application sees?
2. Any known MDEV about *mapping of `(1)/(2)` entries to the actual victim trx id* or about *printing order* within the deadlock block in 10.6.x?
3. Is the most likely explanation that `SHOW ENGINE INNODB STATUS` only shows the *latest* deadlock (and we correlated the app exception to the wrong block)? If yes, is enabling `innodb_print_all_deadlocks=ON` the recommended approach to guarantee 1:1 correlation?
4. What is your recommended best practice to correlate the deadlock victim to the app exception — e.g., matching `MariaDB thread id` with `SELECT CONNECTION_ID()` logged by the app, or any other fields?
5. Are there any victim-selection heuristic differences between 10.2 and 10.6 (outside parallel replication) that could explain an environment-dependent victim?
6. Please confirm the intended semantics: the tail line `*** WE ROLL BACK TRANSACTION (2)` should always mean the transaction listed under the `(2)` section in that same block is the victim (i.e., `(2)` is an ordinal label within the block, not a trx id).
*Repro steps (simplified)*
```sql
– Session A
START TRANSACTION;
UPDATE users SET name='A', updated_at=NOW() WHERE id=1;
DO SLEEP(5);
UPDATE users SET name='A2', updated_at=NOW() WHERE id=2;
– Session B
START TRANSACTION;
UPDATE users SET name='B', updated_at=NOW() WHERE id=2;
DO SLEEP(5);
UPDATE users SET name='B2', updated_at=NOW() WHERE id=1;
```
*Observed result (AWS RDS 10.6.19)*
- App exception says trx X rolled back, while `SHOW ENGINE INNODB STATUS`'s tail line says `WE ROLL BACK TRANSACTION (2)` where `(2)` maps to trx Y.
*Expected result*
- The deadlock block and the app exception should point to the *same* victim transaction (given the same deadlock).
*Data I can attach*
- Full deadlock blocks from AWS RDS and Local (with `trx id`, `MariaDB thread id`, `query id`, timestamps).
- Application exception logs showing which trx was rolled back.
- `SELECT VERSION()`, `SHOW VARIABLES LIKE 'innodb%';`, transaction isolation level, and `my.cnf` excerpts.
- If helpful, logs with `innodb_print_all_deadlocks=ON` to capture every deadlock (not just the latest) and avoid correlation ambiguity.
Thank you very much!
Attachments
Issue Links
- relates to
-
MDEV-31655 Parallel replication deadlock victim preference code errorneously removed
-
- Closed
-