[MDEV-33295] innodb.doublewrite,64k,strict_crc32 occasionally fails Created: 2024-01-22  Updated: 2024-01-22

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.11
Fix Version/s: 10.11

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: corruption
Environment:

Linux version 6.6.11-amd64 (debian-kernel@lists.debian.org) (gcc-13 (Debian 13.2.0-9) 13.2.0, GNU ld (GNU Binutils for Debian) 2.41.50.20231227) #1 SMP PREEMPT_DYNAMIC Debian 6.6.11-1 (2024-01-14)


Issue Links:
Relates
relates to MDEV-32242 innodb.doublewrite test case always g... Closed

 Description   

I am not yet sure what is to blame here. The following test invocation would occasionally fail:

./mtr --parallel=80 --repeat=10 innodb.doublewrite{,,,,,,,} 

This would fail due to corruption of the tables mysql.innodb_index_stats or mysql.innodb_table_stats, like this:

10.11 35cc4b6c058aeb905e8d802569444bdfce02193a

innodb.doublewrite '64k,innodb,strict_crc32' w30 [ 2 fail ]  Found warnings/errors in server log file!
        Test ended at 2024-01-22 16:32:44
line
2024-01-22 16:32:32 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './mysql/innodb_table_stats.ibd' page [page id: space=1, page number=0]. You may have to recover from a backup.
2024-01-22 16:32:32 0 [ERROR] InnoDB: File './mysql/innodb_table_stats.ibd' is corrupted
2024-01-22 16:32:32 0 [ERROR] InnoDB: Failed to read page 0 from file './mysql/innodb_table_stats.ibd': Page read from tablespace is corrupted.
2024-01-22 16:32:37 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './mysql/innodb_table_stats.ibd' page [page id: space=1, page number=0]. You may have to recover from a backup.
2024-01-22 16:32:37 0 [ERROR] InnoDB: File './mysql/innodb_table_stats.ibd' is corrupted
2024-01-22 16:32:37 0 [ERROR] InnoDB: Failed to read page 0 from file './mysql/innodb_table_stats.ibd': Page read from tablespace is corrupted.
2024-01-22 16:32:42 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './mysql/innodb_table_stats.ibd' page [page id: space=1, page number=0]. You may have to recover from a backup.
2024-01-22 16:32:42 0 [ERROR] InnoDB: File './mysql/innodb_table_stats.ibd' is corrupted
2024-01-22 16:32:42 0 [ERROR] InnoDB: Failed to read page 0 from file './mysql/innodb_table_stats.ibd': Page read from tablespace is corrupted.
^ Found warnings in /dev/shm/10.11/mysql-test/var/30/log/mysqld.1.err

The first page of the data file would look as follows:

od -Ax -t x1 -N 65536 var/log/innodb.doublewrite-64k,innodb,strict_crc32/mysqld.1/data/mysql/innodb_table_stats.ibd 
000000 01 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff
000010 00 00 00 00 00 00 b3 d3 00 08 00 00 00 00 00 00
000020 00 00 00 00 00 01 00 00 00 01 00 00 00 00 00 00
000030 00 04 00 00 00 40 00 00 01 e1 00 00 00 04 00 00
000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
000050 00 01 00 00 00 00 00 9e 00 00 00 00 00 9e 00 00
000060 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
000070 00 00 00 00 00 03 00 00 00 00 ff ff ff ff 00 00
000080 ff ff ff ff 00 00 00 00 00 01 00 00 00 02 00 26
000090 00 00 00 02 00 26 00 00 00 00 00 00 00 00 ff ff
0000a0 ff ff 00 00 ff ff ff ff 00 00 00 00 00 02 aa ff
0000b0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00
0000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
00fff0 00 00 00 00 00 00 00 00 c5 21 ff 07 00 00 b3 d3

The error is that the first 4 bytes are 0x1000000 big-endian instead of a copy of the checksum bytes that are at 0xfff8 near the end of the page.

It is notable that during the test, this table should not be modified. The FIL_PAGE_LSN (in the above case 0xb3d3) is smaller than the LSN after the end of the bootstrap.

I was able to reproduce the failure also by running exclusively the strict_crc32 combination, and always for the 64k page size. But I can’t seem to be able to reproduce it when invoking the following:

while nice ./mtr --repeat=2 --parallel=80 innodb.doublewrite,strict_full_crc32{,,,}{,}{,}; do :; done

The test is running on /dev/shm, and we could get close to exceeding the available RAM. There are a couple of kernel OOM kill messages in the sudo dmesg output, but nothing recently.



 Comments   
Comment by Marko Mäkelä [ 2024-01-22 ]

The following just completed successfully on the same 10.11 revision:

./mtr --repeat=30 --parallel=80 innodb.doublewrite,strict_full_crc32{,,,}{,}{,}

Comment by Marko Mäkelä [ 2024-01-22 ]

For the record, the following did not fail either:

./mtr --repeat=30 --parallel=80 innodb.doublewrite,strict_crc32,64k{,,,}{,}{,}{,,,,}

One hypothetical explanation of the failure would be some race condition around the serve bootstrap in mtr, when the test is being prepared for different innodb_page_size values.

Comment by Marko Mäkelä [ 2024-01-22 ]

I was not able to reproduce any failure with the 3 smaller page sizes (excluding 32k and 64k):

diff --git a/mysql-test/suite/innodb/t/doublewrite.test b/mysql-test/suite/innodb/t/doublewrite.test
index d73009908db..8c5ed5d0a49 100644
--- a/mysql-test/suite/innodb/t/doublewrite.test
+++ b/mysql-test/suite/innodb/t/doublewrite.test
@@ -3,7 +3,7 @@
 --echo # MDEV-32242 innodb.doublewrite test case always is skipped
 --echo #
 
---source include/innodb_page_size.inc
+--source include/innodb_page_size_small.inc
 --source include/not_embedded.inc
 
 --disable_query_log

./mtr --repeat=30 --parallel=120 innodb.doublewrite{,,,,}{,}{,}

Generated at Thu Feb 08 10:37:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.