[MDEV-27800] upgrade from MariaDB 10.2 to 10.5.13 results in [ERROR] InnoDB: corrupted TRX_NO Created: 2022-02-10 Updated: 2023-12-01 Resolved: 2022-02-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5.13 |
| Fix Version/s: | 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Yakov Kushnirsky | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
upgrade from Mariadb 10.5.10 to 10.5.13 results in [ERROR] InnoDB: corrupted TRX_NO innodb_change_buffering=none does not help to resolve corruption in undo. It does not happen all the times, but sporadically. But it's reproducible on the customer sites. |
| Comments |
| Comment by Marko Mäkelä [ 2022-02-11 ] |
|
Does the database always restart correctly when using 10.5.10? |
| Comment by Marko Mäkelä [ 2022-02-17 ] |
|
In Unfortunately, before MySQL 5.1.48 and MariaDB 5.1.48, InnoDB did not always initialize data fields. The TRX_NO field in undo log pages could actually contain garbage, instead of 0. When implementing I received a copy of an undo log page that trips the sanity check. Instead of being 0, the two most significant bytes of the 64-bit TRX_UNDO_TRX_NO field are ASCII bytes, as is the entire 64-bit field. InnoDB is only prepared to deal with transaction identifiers that fit in 48 bits, and the assumption is that this sequence will never wrap around. If the data files are already corrupted in 10.5.10 (they show transaction identifiers greater or equal to 281474976710656), then the correct course of action would seem to be to rebuild those data files. I can try to see if during recovery we can skip undo pages whose state is TRX_UNDO_CACHED. My intuition suggests that it is not possible. After a shutdown with innodb_fast_shutdown=0 (slow shutdown), all undo log pages should be in this state, and therefore, we can only restore the latest transaction identifier from one of the TRX_UNDO_CACHED pages. What we might be able to do is to ignore the TRX_UNDO_TRX_NO fields in TRX_UNDO_CACHED pages if the value does not fit in 48 bits. I do not think that it would really fix the problem. |
| Comment by Marko Mäkelä [ 2022-02-17 ] |
|
The consistency check was introduced in |
| Comment by Marko Mäkelä [ 2022-02-28 ] |
|
Unfortunately, it looks like if anyone has already upgraded data files that had been originally created before MySQL 5.1.48 or MariaDB 5.1.48, to MariaDB 10.3 or later, then the server may have fast-forwarded to a very large transaction identifier due to garbage contents of the TRX_UNDO_TRX_NO field. If those transaction identifiers are 2⁴⁸ or larger, they cannot be represented by the 6-byte DB_TRX_ID column of the clustered index records. The only way to reset the transaction ID sequence is to rebuild the database from a logical dump. This fix is only helping future upgrades of data files from MariaDB 10.2 or MySQL 5.6 or earlier to MariaDB 10.3 or later. The fix is twofold:
|