[MDEV-26903] Assertion `ctx->trx->state == TRX_STATE_ACTIVE' fails upon concurrent DROP INDEX
Created: 2021-10-25  Updated: 2021-10-26  Resolved: 2021-10-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6, 10.7
Fix Version/s: 10.6.5
Type: Bug
Priority: Major
Reporter: Elena Stepanova
Assignee: Marko Mäkelä
Resolution: Fixed
Votes: 0
Labels: regression

Description

The test case is for reproduction purposes only and is not suitable for the test suite. The test is concurrent and non-deterministic: it fails for me, but the outcome can vary across machines and builds. There is no need to run it with --repeat, because it already contains a loop whose iteration count can be adjusted; alternatively, it can be converted into a single-iteration variant that is run with --repeat. A random grammar or a mysqlslap workload could also be built on the same idea.
Reproducible on 10.6/10.7 debug builds. The failure apparently started happening after this commit:

Comments

Comment by Marko Mäkelä [ 2021-10-26 ]

Unfortunately, I cannot get an rr replay trace of this, although I do get core dumps easily. This looks like a ‘dormant’ bug that was not directly caused by the … With a simple fix, I got the test to fail differently.
After I changed the test to also accept that outcome:
It passed with flying colours in an AddressSanitizer build. My fix is simple:
I think that this error-handling path was previously unreachable, because we would have crashed earlier due to … I suspect that mysql_inplace_alter_table() initiated the rollback because of an MDL upgrade timeout, but I have yet to confirm that and to create a test case:

Comment by Marko Mäkelä [ 2021-10-26 ]

It looks like we really need a deadlock and not a ‘garden-variety’ MDL timeout. I failed to trigger the error with the following:
With an additional patch, I determined that the deadlock was reported by InnoDB:
This crashed for table t2. The instrumentation patch was as follows:
Yes, on a deadlock the transaction would have been rolled back. It looks like the added table locking in … Armed with this knowledge, I tried to come up with a better test case. I was lazy and tried to use debug injection:
Alas, this would hit an error later on a record lock (which is actually impossible, because we would be holding exclusive table locks at that point), and reverting the first hunk of the following patch would make no difference to the outcome.
Finally, injecting a fault already into the table lock acquisition (and adjusting the test to use it) did the trick:
This reproduces the reported assertion failure when the fix to rollback_inplace_alter_table() is reverted. I am not going to modify commit_try_norebuild() in my fix. In fact, I think that DB_LOCK_WAIT_TIMEOUT should also be impossible in that function, ever since …
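The failure mode discussed in these comments can be illustrated with a toy state model. This is a minimal Python sketch, not InnoDB code: every class and function name below is hypothetical, and the state and error names only borrow InnoDB's spelling. It assumes, as the comments describe, that a deadlock reported during the commit-phase table locking causes the transaction to be rolled back, so an error-handling path that still asserts the transaction is active fails afterwards. The tolerant variant is only one possible way to handle this situation; it is not the actual patch, which is not reproduced here.

```python
from enum import Enum


class TrxState(Enum):
    # Toy stand-ins that borrow InnoDB's naming; not the real trx_state_t.
    NOT_STARTED = "TRX_STATE_NOT_STARTED"
    ACTIVE = "TRX_STATE_ACTIVE"


class DbErr(Enum):
    # Toy error codes that borrow InnoDB's naming.
    SUCCESS = "DB_SUCCESS"
    DEADLOCK = "DB_DEADLOCK"


class Trx:
    def __init__(self):
        self.state = TrxState.ACTIVE

    def rollback(self):
        # After a full rollback the transaction is no longer active.
        self.state = TrxState.NOT_STARTED


def lock_tables_for_commit(trx, inject_deadlock):
    """Hypothetical commit-phase table locking with an injected fault."""
    if inject_deadlock:
        # Assumption: the deadlock victim is rolled back before the
        # error code is returned to the caller.
        trx.rollback()
        return DbErr.DEADLOCK
    return DbErr.SUCCESS


def rollback_alter_strict(trx):
    """Error path that assumes nobody has rolled the transaction back yet."""
    assert trx.state == TrxState.ACTIVE, "ctx->trx->state == TRX_STATE_ACTIVE"
    trx.rollback()


def rollback_alter_tolerant(trx):
    """Error path that only rolls back a transaction that is still active."""
    if trx.state == TrxState.ACTIVE:
        trx.rollback()


def drop_index(rollback_fn, inject_deadlock):
    """Hypothetical DROP INDEX commit: lock tables, roll back on failure."""
    trx = Trx()
    err = lock_tables_for_commit(trx, inject_deadlock)
    if err != DbErr.SUCCESS:
        rollback_fn(trx)
    return err, trx.state
```

With inject_deadlock=True, drop_index(rollback_alter_strict, True) raises AssertionError, mirroring the reported failure, while drop_index(rollback_alter_tolerant, True) returns DbErr.DEADLOCK with the transaction already rolled back.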