[MDEV-22889] InnoDB occasionally breaks the isolation of a recovered transaction Created: 2020-06-14 Updated: 2023-04-27 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.2, 10.3, 10.4, 10.5 |
| Fix Version/s: | 10.4, 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | need_rr, recovery, rr-profile-analyzed, transactions | ||
| Description |
|
The test case innodb.innodb_force_recovery_rollback that was added for
What occasionally happens is that the locking SELECT will return nonempty contents for the table. I was not able to repeat it in my local environment, but it did occur on buildbot for several branches and builders. Such breakage of ACID is serious, and we must find and fix the reason. I believe that we must execute something like
so that the server after the restart runs under rr record, so that we will get an execution trace of the failure for debugging. Here is an example of a failure: http://buildbot.askmonty.org/buildbot/builders/kvm-zyp-opensuse150-amd64/builds/3589/steps/mtr/logs/stdio
Apparently, only the Isolation is being violated: the locking read will return all 1,000 records that the incomplete transaction had inserted. It is also theoretically possible that the incomplete transaction that inserted those 1,000 records was unexpectedly committed when the server was killed. |
| Comments |
| Comment by Marko Mäkelä [ 2020-07-30 ] | ||||||||||||||||||||||
|
mleich, sorry, but it seems to me that MDEV-22889.test
The reason for this conflict is that in trx_sys->rw_trx_list there is exactly one transaction, and trx_sys->rw_trx_list->start.recovered holds. Notably, trx_roll_crash_recv_trx=NULL, that is, the transaction is not being rolled back. An attempt to roll back all transactions was made earlier:
The recovery reached the following code, whose intention is to abort the rollback of recovered transactions if shutdown has been initiated:
But, at this point we are still starting up, not shutting down! This means that the fix of I filed this separate bug as | ||||||||||||||||||||||
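To illustrate the kind of failure described in this comment, here is a minimal, self-contained model. It is not the InnoDB code: the enum, the struct, and both predicates are hypothetical names invented for this sketch, and the assumption that the check misfires because it treats the startup phase like shutdown is only one way such a condition could go wrong. It shows how a recovered transaction is left without rollback when an "abort rollback on shutdown" check is also true during startup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lifecycle phases; NOT the actual InnoDB shutdown-state enum.
enum class ServerPhase { STARTING, RUNNING, SHUTTING_DOWN };

// Hypothetical stand-in for a transaction resurrected by crash recovery.
struct RecoveredTrx {
    bool recovered = true;
    bool rolled_back = false;
};

// Assumed flawed predicate: anything that is not RUNNING counts as shutdown,
// so it is also true while the server is still starting up.
inline bool shutdown_initiated_flawed(ServerPhase p) {
    return p != ServerPhase::RUNNING;
}

// Intended predicate: true only once shutdown has really been initiated.
inline bool shutdown_initiated(ServerPhase p) {
    return p == ServerPhase::SHUTTING_DOWN;
}

// Rolls back every recovered transaction unless the predicate reports a
// shutdown. Returns how many transactions were left without rollback.
template <typename Pred>
std::size_t rollback_recovered(std::vector<RecoveredTrx>& trxs,
                               ServerPhase phase, Pred is_shutdown) {
    std::size_t skipped = 0;
    for (auto& trx : trxs) {
        if (is_shutdown(phase)) {
            ++skipped;  // rollback aborted; transaction stays in the list
            continue;
        }
        trx.rolled_back = true;
    }
    return skipped;
}
```

In this model, calling `rollback_recovered` with `ServerPhase::STARTING` and the flawed predicate skips the transaction, which would leave its locks and uncommitted rows in place for the first statement after restart. With the intended predicate the same call rolls the transaction back.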
| Comment by Marko Mäkelä [ 2020-07-30 ] | ||||||||||||||||||||||
|
mleich, in
If either let assigns a nonzero value, that is an error. A nonzero $count is what we are looking for in this ticket. | ||||||||||||||||||||||
| Comment by Matthias Leich [ 2020-07-30 ] | ||||||||||||||||||||||
|
Ok, I was so overwhelmed by the fact that the first SQL statement after a DB server restart could fail because of not getting a lock and |