Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
InnoDB engine is a full transactional engine and one should not get normally get
corrupted tables. However there are cases outside of InnoDB control that can
generate corrupted tables:
- Hardware, OS or system caching errors that causes data loss within a page or
wrong page to be read. - Wrong configuration, like turning off double write buffering when using
hardware that does not support atomic writes. - Doing backups with cp or rsync on a live database without proper locking.
- Using a backup, like mariabackup or xtrabackup, that does live copying on
data files. The backup program detects conflicting writes to copied blocks
by verifying the checksum of the block and doing a re-read of the block
in case of mismatch. However if the legacy checksums are used, then
the backup can approve of a block that matches any of the old checksum
algorithms and we have seen cases where this can cause a faulty block to
be copied. This can be avoided by setting
innodb_checksum_algorithm=full_crc32 (Default since MariaDB 10.5.0, or Enterprise Server 10.4.6-1). - MariaDB server software errors that cause corrupted pages.
Here is some suggestion to improve current tools and some new tools to consider
for making it easier to find and fix InnoDB problems:
CHECK TABLE
- Addition to QUICK:
- When scanning keys, calculate a checksum of all primary keys.
- Verify that all keys has the same checksum and that it corresponds
to the checksum one got when all rows were scanned. - Addition to EXTENDED
- Do a lookup for secondary key to verify that the corresponding record
exists. - Ensure that CHECK does not crash or print errors to the .err log
if anything wrong is found. Everything important should be in the result
set (if possible).
REPAIR TABLE
- Scan the table file page per page and create a new table based on the
rows found. Skip any rows that cannot be reconstructed. In case of
corrupted blob pages, set the corresponding blob to NULL and generate
an error. - Ensure that REPAIR TABLE is robust against corruptions (no crashes).
ALTER TABLE ... innodb_checksum_algorithm=full_crc32
- This could work the following way (instead of doing a full table copy &
index creation) - Set the 'checksum_type' to 'current or full_crc32'
- Scan all pages in the table and update the the checksums
- Set the 'checksum_type' to 'full_crc32'
Recovery and flagging a table as corrupted
- It would be good it there was a way to flag a table as corrupted and
not usable until CHECK TABLE and possible REPAIR TABLE where run. - Recovery could set this bit when it finds something wrong and then
ignore any future updates to this table (there is not much things
that can be done in this case (except possible trying to do other
updates on other pages and try to get something back?)). - Instead of crashing when something is wrong on a page, InnoDB could
use this to flag the table as unusable to allow people to check and
repair the table without disturbing other users using other tables.
"dump_innodb_page"
Sometimes it is critical when diagnosing a problem to get a copy of the
corrupted page. For this we would need a new tool, "dump_innodb_page".
This should be a command line tool that when one feeds it the information
one gets from InnoDB (Table id, Index id and page number) it would do the
following:
- Connect to the server
- Find out the name of the table and table name and print them on the
stdout - Find out the checksum algorithm and print it out
- Print out the page in a hex format that is easy for a developer to
read. - If the page is encrypted, decrypt it with the encryption keys
supplied by the user.
"update_legacy_innodb_tables"
- This tool would help a user find all tables that uses old checksum
algorithms and change the checksum algorithm to full_crc32.
Documentation:
- Sometimes using the InnoDB change buffer makes it much harder to diagnose
a problem as the page on disk is not complete until the corresponding
change buffer page is applied to the page. This makes it harder to
find and diagnose the cause of a crash
If anyone has problems with MariaDB crashing we should suggest to
turn of the change buffer until the problem is diagnosed and
correct. - Document the procedure of how to change all InnoDB tables to use
full_crc32 so that one can then start using
innodb_checksum_algorithm=strict_full_crc32 (safest checksum mode).