One more message that had been displayed was this one in Datafile::validate_first_page():
  if (page_get_page_no(m_first_page) != 0) {
    /* First page must be number 0 */
    error_txt = "Header page contains inconsistent data";
    goto err_exit;
  }
The first page of the 128KiB .ibd file, which contains some metadata and a page allocation bitmap for the first innodb_page_size pages, differs between two copies of the file as follows:
--- /dev/fd/63 2023-06-26 12:42:39.653103434 +0300
+++ /dev/fd/62 2023-06-26 12:42:39.649103383 +0300
@@ -1,5 +1,5 @@
-000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-000010 00 00 00 07 a3 e6 84 b9 00 08 00 00 00 00 00 00
+000000 15 03 03 00 1a b2 47 d7 7c fe a6 d5 55 2f 46 90
+000010 74 c6 f6 79 51 ca b3 f8 a7 2d 90 58 5f 8c 03 00
 000020 00 00 00 00 02 3a 00 00 02 3a 00 00 00 00 00 00
 000030 00 08 00 00 00 40 00 00 00 15 00 00 00 06 00 00
 000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
@@ -12,15 +12,15 @@
 0000b0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00
 0000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 *
-003ff0 00 00 00 00 00 00 00 00 a3 e6 84 b9 95 e2 81 f4
+003ff0 00 00 00 00 00 00 00 00 ab 7f 1c 69 67 9d 74 a2
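For some reason, the first 32 bytes of the file were overwritten by something that looks like garbage. The 32-bit page number 0 would be stored at offset 4; in the corrupted file those 4 bytes are 0x1ab247d7, which is why the page number check fails. The tablespace identifier, stored at offsets 34 and 38, is 0x023a in both files.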
At offset FIL_PAGE_LSN (0x10) we have the 64-bit log sequence number of the page. It is 0x07a3e684b9 in the correct file, and some garbage in the corrupted file. At the end of the page in the correct file, right before the 32-bit checksum, we have the 32 least significant bits of FIL_PAGE_LSN, that is, 0xa3e684b9. In the corrupted file, those bytes are 0xab7f1c69. Assuming that the corrupted file is newer, its correct LSN must be 0x07ab7f1c69 or more. In any case, the FIL_PAGE_LSN at the start of the corrupted file, 0x74c6f67951cab3f8, does not match the LSN at the end of the page.
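The comparison above can be reproduced mechanically. The following is only a sketch, not InnoDB code: it decodes the big-endian fields at the offsets mentioned above (page number at 4, FIL_PAGE_LSN at 16, tablespace identifier at 34) and compares the low 32 bits of the LSN with the 4 bytes that precede the checksum at the end of the page. The byte arrays are copied from the dump; all names except FIL_PAGE_LSN (summarize_first_page, FirstPageSummary, the constants and the arrays) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offsets as quoted in the discussion. Only FIL_PAGE_LSN is a name taken
// from the text; the other names are hypothetical, chosen for this sketch.
constexpr std::size_t PAGE_NO_OFFSET = 4;    // 32-bit page number
constexpr std::size_t FIL_PAGE_LSN = 16;     // 64-bit log sequence number
constexpr std::size_t SPACE_ID_OFFSET = 34;  // 32-bit tablespace identifier
constexpr std::size_t HEADER_BYTES = 48;     // first three 16-byte dump rows
constexpr std::size_t TRAILER_BYTES = 8;     // low 32 LSN bits, then checksum

// Bytes 0x00..0x2f and 0x3ff8..0x3fff of each copy, taken from the dump.
constexpr std::uint8_t GOOD_HEADER[HEADER_BYTES] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x07, 0xa3, 0xe6, 0x84, 0xb9,
    0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x02, 0x3a, 0x00, 0x00,
    0x02, 0x3a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
constexpr std::uint8_t GOOD_TRAILER[TRAILER_BYTES] = {
    0xa3, 0xe6, 0x84, 0xb9, 0x95, 0xe2, 0x81, 0xf4};
constexpr std::uint8_t BAD_HEADER[HEADER_BYTES] = {
    0x15, 0x03, 0x03, 0x00, 0x1a, 0xb2, 0x47, 0xd7,
    0x7c, 0xfe, 0xa6, 0xd5, 0x55, 0x2f, 0x46, 0x90,
    0x74, 0xc6, 0xf6, 0x79, 0x51, 0xca, 0xb3, 0xf8,
    0xa7, 0x2d, 0x90, 0x58, 0x5f, 0x8c, 0x03, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x02, 0x3a, 0x00, 0x00,
    0x02, 0x3a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
constexpr std::uint8_t BAD_TRAILER[TRAILER_BYTES] = {
    0xab, 0x7f, 0x1c, 0x69, 0x67, 0x9d, 0x74, 0xa2};

// Read a big-endian 32-bit integer.
inline std::uint32_t read_be32(const std::uint8_t* p) {
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Read a big-endian 64-bit integer.
inline std::uint64_t read_be64(const std::uint8_t* p) {
  return (std::uint64_t{read_be32(p)} << 32) | read_be32(p + 4);
}

struct FirstPageSummary {
  std::uint32_t page_no;
  std::uint32_t space_id;
  std::uint64_t lsn;
  std::uint32_t trailer_lsn_low;
  bool lsn_matches_trailer;
};

// header: the first header_len bytes of the page.
// trailer: the last TRAILER_BYTES bytes of the page.
inline FirstPageSummary summarize_first_page(const std::uint8_t* header,
                                             std::size_t header_len,
                                             const std::uint8_t* trailer) {
  assert(header_len >= SPACE_ID_OFFSET + 4);
  FirstPageSummary s;
  s.page_no = read_be32(header + PAGE_NO_OFFSET);
  s.space_id = read_be32(header + SPACE_ID_OFFSET);
  s.lsn = read_be64(header + FIL_PAGE_LSN);
  s.trailer_lsn_low = read_be32(trailer);
  // The trailer should repeat the 32 least significant bits of the LSN.
  s.lsn_matches_trailer =
      static_cast<std::uint32_t>(s.lsn) == s.trailer_lsn_low;
  return s;
}
```

Calling summarize_first_page(GOOD_HEADER, HEADER_BYTES, GOOD_TRAILER) gives page number 0, LSN 0x07a3e684b9 and a matching trailer. For BAD_HEADER and BAD_TRAILER it gives page number 0x1ab247d7 and an LSN whose low 32 bits (0x51cab3f8) do not match the trailer (0xab7f1c69). The tablespace identifier is 0x023a in both cases.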
Something has corrupted the file. Theoretically, it could be anything that has write access to the file system or the block device. I think that it is unlikely that the page would have been corrupted in RAM while it was in the buffer pool of an InnoDB server or mariadb-backup, because right before writing a page to disk, InnoDB would copy the least significant bits of the LSN to the end of the page and compute the page checksum. I would tend to blame the hardware for this. But which hardware would use a 32-byte buffer size? In many processor caches and, I suppose, SDRAM transactions, the block size is 64 bytes.
Error 39 is simply DB_CORRUPTION. The message could simply mean that the checksum of the first page of the file is incorrect. What would innochecksum report on the file?
I would expect mariadb-backup to fail if the checksum of the first page of any data file is incorrect. Could it be the case that the SST script ignored the error?