[MDEV-37845] Deadlock log shows "WE ROLL BACK TRANSACTION (2)" but the rolled-back trx (per application exception) is the other one — mismatch between app and SHOW ENGINE INNODB STATUS on 10.6.19 - Jira

XML

Word

Printable

Details

Type: Bug
Status: Confirmed (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.6.19
Fix Version/s: 10.6, 10.11
Component/s: Storage Engine - InnoDB
Labels:
- innodb

Description

Hello all,

We're seeing a deadlock where the victim transaction appears "reversed" between our application exception and the `SHOW ENGINE INNODB STATUS` output.

*Environments*

*AWS RDS :* MariaDB 10.6.19-MariaDB-log
*Local (repro):* 10.2.40-MariaDB-1:10.2.40+maria~bionic-log
Engine: InnoDB, table `users`

*Scenario*
Two sessions update the same table, touching two rows in opposite order (classic cross-update deadlock):

Session A: `UPDATE ... WHERE id=1;` (hold) → later tries `id=2`
Session B: `UPDATE ... WHERE id=2;` (hold) → later tries `id=1`

*What happens on AWS RDS (10.6.19)*

`SHOW ENGINE INNODB STATUS` shows:

`(1) TRANSACTION = trx id X` (e.g., 4677344224)
`(2) TRANSACTION = trx id Y` (e.g., 4677344213)
Tail line: `*** WE ROLL BACK TRANSACTION (2)`
*But* around the same timestamp, the application's deadlock exception indicates the other trx (e.g., 4677344224) was actually rolled back.

*What happens on Local (10.2.40)*

I can reproduce the deadlock and the victim aligns with the tail line — `(2)` is indeed the one that gets rolled back; no mismatch with the app.

*Questions*

1. In 10.6.x, is there any known issue or change where the deadlock monitor might list `(1)/(2)` or print the final line in a way that can make `WE ROLL BACK TRANSACTION (2)` appear to refer to a different victim than the one the application sees?
2. Any known MDEV about *mapping of `(1)/(2)` entries to the actual victim trx id* or about *printing order* within the deadlock block in 10.6.x?
3. Is the most likely explanation that `SHOW ENGINE INNODB STATUS` only shows the *latest* deadlock (and we correlated the app exception to the wrong block)? If yes, is enabling `innodb_print_all_deadlocks=ON` the recommended approach to guarantee 1:1 correlation?
4. What is your recommended best practice to correlate the deadlock victim to the app exception — e.g., matching `MariaDB thread id` with `SELECT CONNECTION_ID()` logged by the app, or any other fields?
5. Are there any victim-selection heuristic differences between 10.2 and 10.6 (outside parallel replication) that could explain an environment-dependent victim?
6. Please confirm the intended semantics: the tail line `*** WE ROLL BACK TRANSACTION (2)` should always mean the transaction listed under the `(2)` section in that same block is the victim (i.e., `(2)` is an ordinal label within the block, not a trx id).

*Repro steps (simplified)*

```sql
– Session A
START TRANSACTION;
UPDATE users SET name='A', updated_at=NOW() WHERE id=1;
DO SLEEP(5);
UPDATE users SET name='A2', updated_at=NOW() WHERE id=2;

– Session B
START TRANSACTION;
UPDATE users SET name='B', updated_at=NOW() WHERE id=2;
DO SLEEP(5);
UPDATE users SET name='B2', updated_at=NOW() WHERE id=1;
```

*Observed result (AWS RDS 10.6.19)*

App exception says trx X rolled back, while `SHOW ENGINE INNODB STATUS`'s tail line says `WE ROLL BACK TRANSACTION (2)` where `(2)` maps to trx Y.

*Expected result*

The deadlock block and the app exception should point to the *same* victim transaction (given the same deadlock).

*Data I can attach*

Full deadlock blocks from AWS RDS and Local (with `trx id`, `MariaDB thread id`, `query id`, timestamps).
Application exception logs showing which trx was rolled back.
`SELECT VERSION()`, `SHOW VARIABLES LIKE 'innodb%';`, transaction isolation level, and `my.cnf` excerpts.
If helpful, logs with `innodb_print_all_deadlocks=ON` to capture every deadlock (not just the latest) and avoid correlation ambiguity.

Thank you very much!

Attachments

Issue Links

relates to

MDEV-31655 Parallel replication deadlock victim preference code errorneously removed

Closed

Activity

People

Assignee:: Kristian Nielsen

Reporter:: pham van thao

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2025-10-12 02:16

Updated:: 2025-10-13 08:23

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.