Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-37845

Deadlock log shows "WE ROLL BACK TRANSACTION (2)" but the rolled-back trx (per application exception) is the other one — mismatch between app and SHOW ENGINE INNODB STATUS on 10.6.19

    XMLWordPrintable

Details

    Description

      Hello all,

      We're seeing a deadlock where the victim transaction appears "reversed" between our application exception and the `SHOW ENGINE INNODB STATUS` output.

      *Environments*

      • *AWS RDS :* MariaDB 10.6.19-MariaDB-log
      • *Local (repro):* 10.2.40-MariaDB-1:10.2.40+maria~bionic-log
      • Engine: InnoDB, table `users`

      *Scenario*
      Two sessions update the same table, touching two rows in opposite order (classic cross-update deadlock):

      • Session A: `UPDATE ... WHERE id=1;` (hold) → later tries `id=2`
      • Session B: `UPDATE ... WHERE id=2;` (hold) → later tries `id=1`

      *What happens on AWS RDS (10.6.19)*

      • `SHOW ENGINE INNODB STATUS` shows:
      • `(1) TRANSACTION = trx id X` (e.g., 4677344224)
      • `(2) TRANSACTION = trx id Y` (e.g., 4677344213)
      • Tail line: `*** WE ROLL BACK TRANSACTION (2)`
      • *But* around the same timestamp, the application's deadlock exception indicates the other trx (e.g., 4677344224) was actually rolled back.

      *What happens on Local (10.2.40)*

      • I can reproduce the deadlock and the victim aligns with the tail line — `(2)` is indeed the one that gets rolled back; no mismatch with the app.

      *Questions*

      1. In 10.6.x, is there any known issue or change where the deadlock monitor might list `(1)/(2)` or print the final line in a way that can make `WE ROLL BACK TRANSACTION (2)` appear to refer to a different victim than the one the application sees?
      2. Any known MDEV about *mapping of `(1)/(2)` entries to the actual victim trx id* or about *printing order* within the deadlock block in 10.6.x?
      3. Is the most likely explanation that `SHOW ENGINE INNODB STATUS` only shows the *latest* deadlock (and we correlated the app exception to the wrong block)? If yes, is enabling `innodb_print_all_deadlocks=ON` the recommended approach to guarantee 1:1 correlation?
      4. What is your recommended best practice to correlate the deadlock victim to the app exception — e.g., matching `MariaDB thread id` with `SELECT CONNECTION_ID()` logged by the app, or any other fields?
      5. Are there any victim-selection heuristic differences between 10.2 and 10.6 (outside parallel replication) that could explain an environment-dependent victim?
      6. Please confirm the intended semantics: the tail line `*** WE ROLL BACK TRANSACTION (2)` should always mean the transaction listed under the `(2)` section in that same block is the victim (i.e., `(2)` is an ordinal label within the block, not a trx id).

      *Repro steps (simplified)*

      ```sql
      – Session A
      START TRANSACTION;
      UPDATE users SET name='A', updated_at=NOW() WHERE id=1;
      DO SLEEP(5);
      UPDATE users SET name='A2', updated_at=NOW() WHERE id=2;

      – Session B
      START TRANSACTION;
      UPDATE users SET name='B', updated_at=NOW() WHERE id=2;
      DO SLEEP(5);
      UPDATE users SET name='B2', updated_at=NOW() WHERE id=1;
      ```

      *Observed result (AWS RDS 10.6.19)*

      • App exception says trx X rolled back, while `SHOW ENGINE INNODB STATUS`'s tail line says `WE ROLL BACK TRANSACTION (2)` where `(2)` maps to trx Y.

      *Expected result*

      • The deadlock block and the app exception should point to the *same* victim transaction (given the same deadlock).

      *Data I can attach*

      • Full deadlock blocks from AWS RDS and Local (with `trx id`, `MariaDB thread id`, `query id`, timestamps).
      • Application exception logs showing which trx was rolled back.
      • `SELECT VERSION()`, `SHOW VARIABLES LIKE 'innodb%';`, transaction isolation level, and `my.cnf` excerpts.
      • If helpful, logs with `innodb_print_all_deadlocks=ON` to capture every deadlock (not just the latest) and avoid correlation ambiguity.

      Thank you very much!

      Attachments

        Issue Links

          Activity

            People

              knielsen Kristian Nielsen
              thaopv pham van thao
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.