[MDEV-29438] Recovery or backup of instant ALTER TABLE is incorrect Created: 2022-09-01 Updated: 2022-09-06 Resolved: 2022-09-05 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Backup, Storage Engine - InnoDB |
| Affects Version/s: | 10.5.2, 10.6.0, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10 |
| Fix Version/s: | 10.5.18, 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.10.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | regression-10.5 |
| Issue Links: |
|
| Description |
|
mleich provided rr replay traces where mariadb-backup --prepare fails like this:
I extracted a copy of the page both from the backup and from the server at the logical point of time when the OPT_PAGE_CHECKSUM record was written. Apart from FIL_PAGE_LSN which is excluded from the checksum, the pages differ as follows:
The difference is in a clustered index leaf page record that starts at 0x50c. The DB_TRX_ID=0x70a would be incorrectly recovered as 0x60a. This difference was caught thanks to
The columns of the index seem to be something like the following:
On recovery, the incorrect byte was recovered by the following:
The buf starts at 0x50c (the start of the first column), and the prev_rec at offset 0x63 contains the following:
It is worth noting that right after the incorrect data had been copied, that prev_rec field would have been adjusted to the correct value:
Offset 0x63 is where the page infimum pseudo-record is stored. Because an instant ADD/DROP COLUMN has been executed on this table, the record will not contain the string infimum but something else (NUL bytes followed by the header of the supremum record). I think that we must ‘pessimize’ the implementation of |
| Comments |
| Comment by Marko Mäkelä [ 2022-09-01 ] |
|
The infimum and supremum strings were repurposed in
The insert of the record that is logged incorrectly is actually for the hidden metadata record of a
The ADD COLUMN…FIRST will cause columns to be reordered in the table. For reproducing this bug, I think that it is important that the table contains INT UNSIGNED PRIMARY KEY. Execute something like this in a loop, and concurrently run backup and restore:
| Comment by Marko Mäkelä [ 2022-09-01 ] |
|
My attempt at reproducing this reproduced something else:
The index->n_fields=3 corresponds to (a,DB_TRX_ID,DB_ROLL_PTR) and the n_fields=5 corresponds to the metadata record (a,DB_TRX_ID,DB_ROLL_PTR,blob,b). |
| Comment by Marko Mäkelä [ 2022-09-01 ] |
|
That something else is now |
| Comment by Marko Mäkelä [ 2022-09-01 ] |
|
Finally, I reproduced this without hitting
I may have accidentally reproduced another bug in multi-batch recovery, because I only expected that single page to be corrupted. |
| Comment by Marko Mäkelä [ 2022-09-01 ] |
|
With the following fix, there will be no log apply failure messages:
That is, we would hit |
| Comment by Marko Mäkelä [ 2022-09-01 ] |
|
The reason for the flood of messages for unrelated pages is the following in log_phys_t::apply():
We are flagging the entire log as corrupted merely because a single page is corrupted. This will cause all subsequent invocations of recv_recover_page() to evict pages as corrupted:
It would be better to introduce another return value, say, log_phys_t::APPLIED_CORRUPTED, to indicate that only the page was corrupted. Then there would be no ‘collateral damage’ of claiming unrelated pages as corrupted. The recv_sys.set_corrupt_log() must be reserved for cases where the entire log is corrupted. |
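The proposal can be sketched as follows. This is a simplified model, not the InnoDB code: `apply_status`, `recovery_state` and `handle_apply_result` are hypothetical names, the enumerators other than APPLIED_CORRUPTED are illustrative placeholders, and the `log_corrupted` flag merely stands in for what recv_sys.set_corrupt_log() would record.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical result of applying log records to one page.
// APPLIED_CORRUPTED is the value proposed in the comment above;
// the other enumerators are placeholders for this sketch.
enum class apply_status { APPLIED_NO, APPLIED_YES, APPLIED_CORRUPTED };

// Hypothetical stand-in for the global recovery state.
struct recovery_state
{
  bool log_corrupted = false;              // set only if the whole log is bad
  std::vector<std::uint32_t> evicted_pages; // pages discarded as corrupted
};

// Handle the outcome for a single page. A corrupted page is evicted on its
// own; the global log_corrupted flag is deliberately left untouched, so
// later pages are still recovered normally.
void handle_apply_result(recovery_state &state, std::uint32_t page_no,
                         apply_status status)
{
  if (status == apply_status::APPLIED_CORRUPTED)
    state.evicted_pages.push_back(page_no);
}
```

With this separation, a failure on one page no longer turns every subsequent page into a corruption report, which is the ‘collateral damage’ described above.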
| Comment by Marko Mäkelä [ 2022-09-05 ] |
|
I think that this bug should also affect normal tables where the string infimum is being inserted in the first PRIMARY KEY column as the first record of the table, but I was unable to repeat an OPT_PAGE_CHECKSUM mismatch in such a case. In 10.5, recovery would not invoke buf_pool.corrupted_evict() if it fails to apply some log to a page. That code was added to 10.6.9 in |