h1. MDEV-37974 — Transaction/isolation fuzz testing of the {{lock_rec_insert_check_and_lock()}} fix

h2. Summary

The MDEV-37974 branch (commit {{4d36da565ae}}, _"Avoid bogus deadlock in lock_rec_insert_check_and_lock()"_, on top of {{10.11.17}}) was built as a debug server and exercised with three independent transaction/isolation anomaly oracles — *TROC*, *Fucci*, and *APTrans* — running concurrently against it. The same tools were then run against a *baseline* binary built from the identical tree with only {{storage/innobase/lock/lock0lock.cc}} reverted to its parent, to attribute every finding.

*Result: no isolation regression attributable to the change.* Every class of anomaly the tools reported also appears on the unmodified baseline (and TROC/Fucci reported _more_ on the baseline). The debug server (with {{UNIV_DEBUG}}, {{_GLIBCXX_DEBUG}}, {{SAFE_MUTEX}}) raised *no assertion, signal, or InnoDB invariant failure* during any run. The fix's own deterministic MTR tests pass.

h2. Test environment

|| Item || Value ||
| Server | {{10.11.17-MariaDB-debug}}, branch MDEV-37974 @ {{4d36da565ae}} |
| Build type | {{Debug}} ({{UNIV_DEBUG}}, {{_GLIBCXX_DEBUG}}, {{_GLIBCXX_ASSERTIONS}}, {{SAFE_MUTEX}}, {{SAFEMALLOC}}) |
| Baseline | identical tree, {{lock0lock.cc}} reverted to {{4d36da565ae^}} (the only file in the commit that affects the binary) |
| Tools | TROC, Fucci, APTrans via {{ghcr.io/arcivanov/txn-tester:0.0.2}} |
| Tool params | TROC/Fucci {{MODE=full DURATION=900}}; APTrans {{ISOLATION=all SAMPLE_NUM=5}} |
| Topology | one of each tool, concurrent, distinct databases, over TCP loopback (max contention) |
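
The baseline row can be illustrated with a minimal sketch, assuming the source tree is a git checkout at {{4d36da565ae}}. The helper name {{make_baseline}} and its arguments are hypothetical and not part of the original procedure; the server still has to be rebuilt afterwards.

{code:bash}
# Hypothetical helper: revert one file to the parent of a commit while the
# rest of the work tree stays at the patched revision.
#   $1 = path to the git work tree
#   $2 = commit whose parent supplies the file (4d36da565ae in this report)
#   $3 = file to revert, relative to the work tree root
make_baseline() {
  local tree=$1 commit=$2 file=$3
  git -C "$tree" checkout "${commit}^" -- "$file"
  # List what now differs from the patched commit; only $file is expected.
  git -C "$tree" diff --name-only "$commit"
}
{code}

Usage under these assumptions: {{make_baseline . 4d36da565ae storage/innobase/lock/lock0lock.cc}} should print only that path, which confirms that nothing else separates the two builds.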

h2. Findings overview (patched vs. baseline)

|| Tool || Patched (with fix) || Baseline (no fix) || Verdict ||
| TROC — logged inconsistencies | 0 | 2 | Noise — _fewer_ with the fix |
| Fucci — logged inconsistencies | 1 | 2 | Noise (see below) |
| APTrans — SERIALIZABLE anomalies | 35 | 29 | Oracle noise |
| APTrans — REPEATABLE READ anomalies | 33 | 21 | Oracle noise |
| APTrans — READ COMMITTED anomalies | 11 | 4 | Oracle noise |
| Server assertion / crash (debug build) | none | none | Clean |

Counts vary run-to-run in both directions because all three tools are randomized fuzzers; the qualitative conclusion (same anomaly classes present with and without the change) is what matters.

h2. Triage of each signal

h3. 1. TROC — clean on the patched build

TROC completed its full 900 s run on the patched server with *zero* logged inconsistencies (exit 0). On the baseline it logged two ({{Error: Missing lock}}) plus a tool-internal execution failure when a deadlock interrupted one of its schedules. In this run the patched build was therefore strictly cleaner.

h3. 2. Fucci — one inconsistency, structurally unrelated to the fix

Fucci reported a single {{Inconsistent query result}} ({{CSbugCase:1}}). The implicated schedule:
* contains *only {{UPDATE}} and {{SELECT}}* inside the concurrent transactions — the {{INSERT}}s are single-threaded data setup _before_ any {{BEGIN}};
* mixes a {{REPEATABLE READ}} transaction with a {{READ UNCOMMITTED}} transaction;
* hit a lock-wait *timeout* mid-schedule, after which the oracle compared a partial result.

MDEV-37974 changes *only* {{lock_rec_insert_check_and_lock()}}, the record-locking check on the insert path. The concurrent part of this schedule issues no {{INSERT}}; an {{UPDATE}} can still reach that function when it inserts a new index record (for example after changing an indexed column), but the baseline, which lacks the change, produced the same class of finding (two of them). The finding is therefore pre-existing Fucci noise around {{READ UNCOMMITTED}} (RU) and timeout-truncated schedules, and does not originate from the change.

h3. 3. APTrans — serializability-oracle mismatch, present at all isolation levels

APTrans flagged anomalies under *all three* isolation levels, including *35 under SERIALIZABLE*. InnoDB SERIALIZABLE genuinely serializes, so anomalies reported there are oracle false positives, not server bugs. The mechanism: APTrans drives every transaction with {{START TRANSACTION WITH CONSISTENT SNAPSHOT}} and judges results against a serializability oracle, while InnoDB REPEATABLE READ is snapshot-based (it permits write-skew-style phenomena that a serializability oracle flags). The baseline produced the same anomaly classes at all three levels (29 / 21 / 4 against 35 / 33 / 11 on the patched build) with the same ordering SERIALIZABLE > REPEATABLE READ > READ COMMITTED; the absolute counts differ, consistent with the run-to-run variation of randomized fuzzers. None of the flagged cases exercise the fix's precondition (a granted X record lock on the successor plus a concurrent _waiting_ conflicting lock at {{INSERT}} time).

h2. Positive confirmation — the fix's own deterministic tests pass

Run on the patched debug build:

|| MTR test || Result ||
| {{innodb.mdev_37974}} | {color:green}*pass*{color} |
| {{innodb.lock_delete_updated}} | {color:green}*pass*{color} |
| {{versioning.update}} (timestamp, trx_id, heap, traditional, myisam) | {color:green}*pass* (5/5){color} |

These directly exercise the changed code path, including test 5 (the negative test where the predecessor check must _block_ the optimization and force a lock wait) and the cross-page (infimum) predecessor case.

h2. Conclusion

Across ~30 minutes of randomized concurrent fuzzing per configuration by three independent oracles, plus a controlled patched-vs-baseline differential, *the MDEV-37974 change introduced no detectable transaction-isolation anomaly*. All reported signal classes also appear on the unmodified baseline and are attributable to known tool noise (Fucci's RU/timeout handling; APTrans's serializability oracle vs. InnoDB snapshot RR), not to the {{lock0lock.cc}} change. The debug server never tripped an InnoDB lock-subsystem assertion, and the fix's deterministic MTR tests pass.

h2. Reproduction

{code:bash}
# Build (in-source debug)
cmake --build <build> --target mariadbd mariadb -j16
# Start server on :13306, root/txntest reachable over TCP, then per tool:
OUTPUT_DIR=./out TXN_TESTER_IMAGE=ghcr.io/arcivanov/txn-tester:0.0.2 \
  TOOL=troc   MODE=full DURATION=900 \
  DB_HOST=127.0.0.1 DB_PORT=13306 DB_USER=root DB_PASSWORD=txntest \
  ./run.sh --network host          # TOOL=fucci likewise; TOOL=aptrans ISOLATION=all SAMPLE_NUM=5
{code}
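
The topology in the environment table runs one of each tool concurrently. A minimal sketch of such a launcher follows, reusing only the variables shown in the command above. The function name {{run_all_tools}} and the {{RUN_SH}} override are hypothetical. How {{run.sh}} assigns distinct databases or separates output per tool is not stated in this report and is not modeled here.

{code:bash}
# Hypothetical wrapper: start TROC, Fucci and APTrans concurrently through
# run.sh and print one "tool exit_code" line per tool once each has finished.
# RUN_SH is an assumed override for the launcher path.
run_all_tools() {
  local run_sh=${RUN_SH:-./run.sh}
  # Connection settings shared by all tools; defaults mirror the command above.
  local -a common=(
    "OUTPUT_DIR=${OUTPUT_DIR:-./out}"
    "TXN_TESTER_IMAGE=${TXN_TESTER_IMAGE:-ghcr.io/arcivanov/txn-tester:0.0.2}"
    "DB_HOST=${DB_HOST:-127.0.0.1}"
    "DB_PORT=${DB_PORT:-13306}"
    "DB_USER=${DB_USER:-root}"
    "DB_PASSWORD=${DB_PASSWORD:-txntest}"
  )
  local -a names=(troc fucci aptrans) pids=()
  local i rc

  # Tool-specific parameters as listed in the "Tool params" row.
  env "${common[@]}" TOOL=troc MODE=full DURATION=900 "$run_sh" --network host &
  pids+=("$!")
  env "${common[@]}" TOOL=fucci MODE=full DURATION=900 "$run_sh" --network host &
  pids+=("$!")
  env "${common[@]}" TOOL=aptrans ISOLATION=all SAMPLE_NUM=5 "$run_sh" --network host &
  pids+=("$!")

  # Collect each tool's exit status without aborting on a non-zero one.
  for i in "${!pids[@]}"; do
    rc=0
    wait "${pids[$i]}" || rc=$?
    echo "${names[$i]} $rc"
  done
}
{code}

Passing the settings through {{env}} keeps them out of the calling shell, so the same function can be pointed at the patched and the baseline server by changing only {{DB_PORT}}.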

Exit-code contract of the image: {{0}} = clean, {{100}} = anomaly/inconsistency detected, any other non-zero = infrastructural failure.
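
The contract above can be turned into a small triage helper. This is a minimal sketch: the function name {{classify_exit}} and its verdict strings are illustrative, and it assumes {{run.sh}} propagates the image's exit code unchanged, which this report does not confirm.

{code:bash}
# Map a txn-tester exit code to the verdict described by the contract above.
classify_exit() {
  case "$1" in
    0)   echo clean ;;                   # no anomaly logged
    100) echo anomaly ;;                 # anomaly/inconsistency detected
    *)   echo infrastructure-failure ;;  # any other non-zero code
  esac
}
{code}

For example, {{./run.sh --network host; classify_exit "$?"}} would report {{anomaly}} for a Fucci run that logged an inconsistency, so a wrapper can tell a finding to triage apart from a run that never started properly.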
