[MDEV-23653] Server crash: InnoDB: Database page corruption on disk or a failed file read of tablespace metawiki/content page [page id: space=4936, page number=49350] Created: 2020-09-02  Updated: 2021-07-23  Resolved: 2021-07-23

Status: Closed
Project: MariaDB Server
Component/s: Server, Storage Engine - InnoDB
Affects Version/s: 10.4.12, 10.4.13
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Manuel Arostegui Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 1
Labels: None
Environment:

debian


Issue Links:
Relates
relates to MDEV-19978 Page read from tablespace is corrupted Closed

 Description   

We have one slave running 10.4.13 which all of a sudden crashed with:

Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace metawiki/content page [page id: space=4936, page number=49350]. You may have to recover from a backup.
Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
Sep  2 13:58:20 db2120 mysqld[1095]:  len 16384; hex 58252d810000c0c60000c0c50000c0c7000014f062d2a6a045bf00000000000000000000134800323b8c80c9000000003b47
 
 -cut-
 
 
 
Sep  2 13:58:20 db2120 mysqld[1095]: InnoDB: End of page dump
Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Uncompressed page, stored checksum in field1 1478831489, calculated checksums for field1: crc32 1725978419, innodb 1478831489,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 1725978419, calculated checksums for field2: crc32 1725978419, innodb 1725978419, none 3735928559,  page LSN 5360 1657972384, low 4 bytes of LSN at page end 1657972384, page number (if stored to page already) 49350, space id (if created with >= MySQL-4.1.1 and stored already) 4936
Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Page may be an index page where index id is 19838
Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Index 19838 is `PRIMARY` in table `metawiki`.`content`
Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Index 19838 is `PRIMARY` in table `metawiki`.`content`
Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace metawiki/content page [page id: space=4936, page number=49350]. You may have to recover from a backup.
Sep  2 13:58:20 db2120 mysqld[1095]: 2020-09-02 13:58:20 107197705 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):

There are no HW errors on that host or any other trace of OS corruption, so this looks entirely related to InnoDB.
Could it be the same as https://jira.mariadb.org/browse/MDEV-21165 and/or https://jira.mariadb.org/browse/MDEV-19978?

We do use

innodb_checksum_algorithm | crc32 |

It is hard to send a way to reproduce this crash as it happened all of a sudden.
We've assumed the data is corrupted.



 Comments   
Comment by Manuel Arostegui [ 2020-09-02 ]

A few days ago we had a similar crash on another 10.4.13 instance:

Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace dewiki/flaggedimages page [page id: space=80, page number=619577]. You may have to recover from a backup.
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
Aug 26 08:38:55 db1110 mysqld[4559]:  len 16384; hex 6943c56f00097439000974380009743a0000002b7a8481b645bf00000000000000000000005000273b9e809b000000003b2f0002009800990000000000000000000000000000000001110000000000000000000000000000000000000000010002001e696e66696d756d0006000
b000073757072656d756d1f0e2100000010006884e25495556e7465726e65686d656e5f416c62696f6e5f31372e31302e313931372e6a706700000001409ed8000001a407b032303039303631353232313834373469393537346a736c716b7930633436647563347663303732666e636e78781f0e2300000018006a84e254955761725f456e73696
76e5f6f665f4765726d616e795f313930332d313931382e73766700000001409ed8000001a407df3230303930353331313934333432316265616b7165387278647a623768723065336474767a7671376e6a7577781f0e0e00000020005584e2549b416b6b61507269736f6e2e6a706700000001409ed8000001a4081032303037303830353135333
331326f7379773165667477613164686466336e6a7830636e676d75697a766462341f0e0a00040028005184e2549b4261676461642e6a706700000001409ed8000001a4082c323030383032303231373537343536393175696a6d3578666a743873323339336c6e30636e36656d7a616575611f0e1800000030005f84e2549b4261686169735f696
e5f41647269616e6f706c652e6a706700000001409ed8000001a408443230303830373237303634303531396e666b307577617361766d6c3578367535766330326f75766c356d72376f1f0e1200000038005984e2549b42616861756c6c61682d626f726e2e6a706700000001409ed8000001a4086a32303038313231323136343632326533626a7
963386a3737773772776f6c6b62633836646471647036306970631f0e1600000040005d84e2549b42616861756c6c61682d70617373706f72742e6a706700000001409ed8000001a4088a32303035303231333136333434373464646b397273347733753832347038306674766e6a6538333931646e316c1f0e0e00040048005584e2549b4261686
1756c6c6168332e6a706700000001409ed8000001a408ae32303035303231333136333034343930706f6e727278787230697269386c6e64366269397935306f6571786f6c1f0e1c00000050006384e2549b42616861756c6c61685f56657262616e6e756e67737765672e706e6700000001409ed8000001a408ca323030383034323831373137313733767264676b766c7363707a6a786b6f706c7a63796f797439326e786e6d661f0e1900000058006084e2549b42616861756c6c61685f66726f6d5f6d696c6c65722e6a706700000001409ed8000001a408f43230303530393031323033393531316265327339327035747a33367068657a786f7475646d787178696e3974391f0e0900000060005084e2549b4261686a692e6a706700000001409ed8000001a4091b323030373038303531353338343363747a73676e6d676b7865777665796b6469376670663777356d786c6a6d791f0e1000040068005784e2549b436f6d6d6f6e732d6c6f676f2e73766700000001409ed8000001a409323230303730393136303134333239397a683372766a3562737a62656c3933656935366b66636f377136766336751f0e1200000070005984e2549b4561726c795f4261686169732d312e6a706700000001409ed8000001a4095032303037303631353136323231326d7130626b366e34616f7530696a32736861733267317a74783278776d7a361f0e1500000078005c84e2549b4d617a72616968486f757365526f7365732e6a706700000001409ed8000001a409703230303730383035313534353539316379646b336d38747237713668357977716e616e6f776b3131396e3862371f0e2f00000080007684e2549b5072657a696f73695f2d5f5374726164c4835f64696e5f436f6e7374616e74696e6f706f6c2c5f313836382e6a706700000001409ed8000001a4099332303036303932313233303233376a7063366d666565753469706e336d6d78357474786866623478796f7372341f0e1900040088006084e2549b52696476616e2d67617264656e2d626167686461642e6a706700000001409ed8000001a409d0323030373131313431333439343431657a7266757a68657435736531716d6d66663674317a3970346c746c68301f0e1700000090005e84e2549b536872696e652d6f662d42616861756c6c61682e6a706700000001409ed8000001a409f7323030373032323730323034313571707a79326d71357872737a6564617733656c67786a3131736b7a666c70631f0e1400000098005b84e2549b54656872616e2d62616861756c6c61682e6a706700000001409ed8000001a40a1c32303038313231323136343835383972713238316230366468636162653334667661707a6f7a3377796d7375321f0e12000000a0005984e2549b57696b6971756f74652d6c6f676f2e73766700000001409ed8000001a40a3e323030363131323431373338343866713862676f3669746f3077666e756377683564777a706f3479337431776c1f0e11000400a8005884e2549e427261756e66c3a4756c6532332e4a504700000001409ed8000001a40a5e32303130303931363134333431363973727a69303537343963663465346d38317a3131686a77336a72626c32371f0e10000000b0005784e2549e436f6d6d6f6e732d6c6f676f2e73766700000001409ed8000001a40a7d3230303730393136303134333239397a683372766a3562737a62656c3933656935366b66636f377136766336751f0e24000000b8006b84e2549e48657465726f6261736964696f6e2e616e6e6f73756d2e2d2e6c696e647365792e6a706700000001409ed8000001a40a9b323030383032313431353431353830616b3667796961366737676538627a72757035347a6364386f726a3937691f0e25000000c0006c84e2549e48657465726f6261736964696f6e2e616e6e6f73756d322e2d2e6c696e647365792e6a706700000001409ed8000001a40acd32303038303231343135343230396169356a6a6d316c6577313638686e6d72396763666b64327231706a796d321f0e0d000400c8005484e2549e526f74666165756c652e6a706700000001409ed8000001a40b003230303730343133303731373439323869366a3438623635706465736a6764786c6c776471756c3439786c6d351f0e11000000d0005884e2549e5775727a656c73636877616d6d2e6a706700000001409ed8000001a40b1b32303037303431333038353935366a7331733070723978717a663870377930636362723173676b7836646231691f0e1c000000d8006384e254a4466169727974616c655f54726173685f5175657374696f6e2e73766700000001409ed8000001a40b3a3230303830333138303234343330663035636a7270696778753671683561396e6533716e73336c7637367277361f0e10000000e0005784e254a452617069645f64656c6574652e73766700000001409ed8000001a40b6432303130303133303139313430313031696665666276716a3767366f3274706c76386465696a38386b783067341f0e10000400e8005784e254a6436f6d6d6f6e732d6c6f676f2e73766700000001409ed8000001a40b8232303037303931363031343332393:
 
                                                                                                                                                                      p8 7!5 3 2 0 /%- ,A* );' & $ " !d   T   c   R   -             V
Aug 26 08:38:55 db1110 mysqld[4559]:    j   v     c  U z   ;
Aug 26 08:38:55 db1110 mysqld[4559]: InnoDB: End of page dump
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Uncompressed page, stored checksum in field1 1766049135, calculated checksums for field1: crc32 49239442, innodb 1766049135,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 49239442, calculated checksums for field2: crc32 49239442, innodb 49239442, none 3735928559,  page LSN 43 2055504310, low 4 bytes of LSN at page end 2055504310, page number (if stored to page already) 619577, space id (if created with >= MySQL-4.1.1 and stored already) 80
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Page may be an index page where index id is 273
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Index 273 is `PRIMARY` in table `dewiki`.`flaggedimages`
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace dewiki/flaggedimages page [page id: space=80, page number=619577]. You may have to recover from a backup.
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Uncompressed page, stored checksum in field1 1766049135, calculated checksums for field1: crc32 49239442, innodb 1766049135,  page type 17855 == INDEX.none 3735928559, stored checksum in fie
ld2 49239442, calculated checksums for field2: crc32 49239442, innodb 49239442, none 3735928559,  page LSN 43 2055504310, low 4 bytes of LSN at page end 2055504310, page number (if stored to page already) 619577, space id (if created with >= MySQL-4.1.1 and stored already
) 80

Comment by Manuel Arostegui [ 2020-09-03 ]

Just happened on a 10.4.12 instance too

Comment by Marko Mäkelä [ 2020-09-21 ]

marostegui, both page dumps (in hexadecimal) are incomplete. The second dump corresponds to 2693.5 bytes (missing the least significant nibble of the last byte) and it looks like it could be a B-tree page. I converted the hexadecimal digits back to binary by perl:

print unpack ("H*","6943…");

I would need the entire page dump to see in which way it is corrupted. Also the SHOW CREATE TABLE output for the table would help to interpret the records. It looks strange to me that the next-record pointer bytes of each user record that I checked seem to be 0x84e2, which I would think should be invalid for the default innodb_page_size=16k (but I may be missing something). The rest of it looks like the last column of the PRIMARY KEY could be a string, and that there are at least one or two string columns that are not part of the PRIMARY KEY.

Side note: MariaDB 10.4 introduced innodb_checksum_algorithm=full_crc32, but the default setting remained at innodb_checksum_algorithm=crc32. Even the new setting should be compatible with old files; the new checksum algorithm only applies to files that were created while the setting was in effect.

Comment by Manuel Arostegui [ 2020-09-21 ]

Logs are unfortunately gone on both hosts
The tables that crashed were:

1st report:

# mysql metawiki -e "show create table metawiki.content\G"
*************************** 1. row ***************************
       Table: content
Create Table: CREATE TABLE `content` (
  `content_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `content_size` int(10) unsigned NOT NULL,
  `content_sha1` varbinary(32) NOT NULL,
  `content_model` smallint(5) unsigned NOT NULL,
  `content_address` varbinary(255) NOT NULL,
  PRIMARY KEY (`content_id`)
) ENGINE=InnoDB AUTO_INCREMENT=20211532 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED

2nd report:

# mysql dewiki -e "show create table dewiki.flaggedimages\G"
*************************** 1. row ***************************
       Table: flaggedimages
Create Table: CREATE TABLE `flaggedimages` (
  `fi_rev_id` int(11) NOT NULL DEFAULT 0,
  `fi_name` varbinary(255) NOT NULL DEFAULT '',
  `fi_img_timestamp` varbinary(14) DEFAULT NULL,
  `fi_img_sha1` varbinary(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`fi_rev_id`,`fi_name`)
) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED

Comment by Marko Mäkelä [ 2020-09-21 ]

For the 2nd report, I was looking for the next-record pointer at the wrong place. (I was not fully woken up yet.) The bytes 0x84e2 are of course part of the first PRIMARY KEY column, e.g., 0x84e25495 is the first user record on the page. The sign bit is inverted to keep the data memcmp() compatible, so the first fi_rev_id in the page is 0x4e25495 = 81941653. The two next-record bytes are right before the first byte of the key. For the first few records, the data looks OK to me.

By the way, the columns fi_img_timestamp and fi_img_sha1 are being stored as hexadecimal digits in ASCII. Binary storage should halve the size.

This might be a bug in the checksum validation, but I would need the full page dump to confirm that. So, please try to get the full log output next time it occurs.

The new innodb_checksum_algorithm=full_crc32 is much more straightforward: just one CRC-32C checksum over the entire page contents, instead of the exclusive-or of two CRC-32C over two subsets of the page (plus some logic to check if a different checksum algorithm might have been used, which effectively weakens the checksum validation). See also MDEV-17958, MDEV-18529, MDEV-12112, MDEV-12026.

Comment by Manuel Arostegui [ 2020-09-21 ]

Yeah, we re-installed those hosts to reclone their data.
Let's see if it happens again and I will try to get the full log, I didn't notice it was cut.

We can switch to innodb_checksum_algorithm=full_crc32, but we don't create new tables that often anyways.

Comment by Manuel Arostegui [ 2020-10-01 ]

Marko, we got another one, and this is the full dump:

-- The job identifier is 14462.
Sep 30 12:28:39 db2125 mysqld[30220]: 2020-09-30 12:28:39 8 [Note] Slave I/O thread: connected to master 'repl@db2107.codfw.wmnet:3306',replication starts at GTID position '171970567-171970567-390719905,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2108483244,180359271-180359271-191513097,171966574-171966574-2221092918,180359241-180359241-146491165,171966670-171966670-2410812544,180363270-180363270-170'
Sep 30 12:43:10 db2125 mysqld[30220]: 2020-09-30 12:43:10 9 [ERROR] InnoDB: Record in index `namespace_title` of table `zhwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x8000000B),[21]Taxonomy/Acmaeopleura(0x5461786F6E6F6D792F41636D61656F706C65757261),[4] &" (0x012622B1)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x8000000B),[18]Taxonomy/Acmaeidae(0x5461786F6E6F6D792F41636D616569646165),[4]    (0x00B7838E)}
Sep 30 12:43:10 db2125 mysqld[30220]: 2020-09-30 12:43:10 9 [ERROR] InnoDB: Record in index `namespace_title` of table `zhwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x8000000B),[16]Taxonomy/Acartia(0x5461786F6E6F6D792F41636172746961),[4] &# (0x01262385)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x8000000B),[20]Taxonomy/Acariformes(0x5461786F6E6F6D792F4163617269666F726D6573),[4]    (0x00D6A3C3)}
Sep 30 12:43:10 db2125 mysqld[30220]: 2020-09-30 12:43:10 9 [ERROR] InnoDB: Record in index `namespace_title` of table `zhwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x8000000B),[19]Taxonomy/Acartiidae(0x5461786F6E6F6D792F41636172746969646165),[4] &# (0x01262389)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x8000000B),[20]Taxonomy/Acariformes(0x5461786F6E6F6D792F4163617269666F726D6573),[4]    (0x00D6A3C3)}
Sep 30 12:43:10 db2125 mysqld[30220]: 2020-09-30 12:43:10 9 [ERROR] InnoDB: Record in index `namespace_title` of table `zhwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x8000000B),[19]Taxonomy/Acartiella(0x5461786F6E6F6D792F416361727469656C6C61),[4] &$ (0x012624E5)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x8000000B),[20]Taxonomy/Acariformes(0x5461786F6E6F6D792F4163617269666F726D6573),[4]    (0x00D6A3C3)}
Sep 30 12:55:26 db2125 mysqld[30220]: 2020-09-30 12:55:26 0 [Note] InnoDB: Buffer pool(s) load completed at 200930 12:55:26
Sep 30 13:44:02 db2125 mysqld[30220]: 2020-09-30 13:44:02 9 [ERROR] InnoDB: Record in index `linter_page` of table `plwiki`.`linter` was not found on update: TUPLE (info_bits=0, 2 fields): {[4]  Sh(0x001B5368),[4]    (0x00A5BDD7)} at: COMPACT RECORD(info_bits=0, 2 fields): {[4]  Sh(0x001B5368),[4]   D(0x00A5BB44)}
Sep 30 13:46:15 db2125 mysqld[30220]: 2020-09-30 13:46:15 8 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Sep 30 13:46:15 db2125 mysqld[30220]: 2020-09-30 13:46:15 8 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db2107-bin.001729' at position 103635891; GTID position '171970567-171970567-390719905,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2108483244,180359271-180
Sep 30 14:09:26 db2125 mysqld[30220]: 2020-09-30 14:09:26 9 [ERROR] InnoDB: Record in index `cat_pages` of table `idwiki`.`category` was not found on update: TUPLE (info_bits=0, 2 fields): {[4]    (0x80000620),[4]  d (0x00886419)} at: COMPACT RECORD(info_bits=32, 2 fields): {[4]    (0x80000620),[4]  < (0x00873CC5)}
Sep 30 14:39:21 db2125 mysqld[30220]: 2020-09-30 14:39:21 9 [ERROR] InnoDB: Record in index `wl_user_notificationtimestamp` of table `fiwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x0003A1EF),NULL,[4] 3  (0x00339DCE)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x0003A1EF),NULL,[4] 3  (0x00339C13)}
Sep 30 14:59:47 db2125 mysqld[30220]: 2020-09-30 14:59:47 9 [ERROR] InnoDB: Record in index `cat_pages` of table `nowiki`.`category` was not found on update: TUPLE (info_bits=0, 2 fields): {[4]  4m(0x8000346D),[4] lBa(0x006C4261)} at: COMPACT RECORD(info_bits=32, 2 fields): {[4]  4m(0x8000346D),[4] [Iw(0x005B4977)}
Sep 30 15:23:29 db2125 mysqld[30220]: 2020-09-30 15:23:29 8 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Sep 30 15:23:29 db2125 mysqld[30220]: 2020-09-30 15:23:29 8 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db2107-bin.001729' at position 289795744; GTID position '171970567-171970567-390719905,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2108483244,180359271-180
Sep 30 15:35:30 db2125 mysqld[30220]: 2020-09-30 15:35:30 9 [ERROR] InnoDB: Record in index `rd_ns_title` of table `itwiki`.`redirect` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x80000001),[26]Castelli_della_val_di_Susa(0x43617374656C6C695F64656C6C615F76616C5F64695F53757361),[4]  (z(0x0084287A)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x80000001),[33]Castel_di_Sangro_Calcio_1996-1997(0x43617374656C5F64695F53616E67726F5F43616C63696F5F313939362D31393937),[4] ^mT(0x005E6D54)}
Sep 30 16:25:00 db2125 mysqld[30220]: 2020-09-30 16:25:00 8 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Sep 30 16:25:00 db2125 mysqld[30220]: 2020-09-30 16:25:00 8 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db2107-bin.001729' at position 427327501; GTID position '171970567-171970567-390719905,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2108483244,180359271-180
Sep 30 17:05:32 db2125 mysqld[30220]: 2020-09-30 17:05:32 9 [ERROR] InnoDB: Record in index `wl_user_notificationtimestamp` of table `nlwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x0002C2EA),NULL,[4]  " (0x00B022E6)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x0002C2EA),NULL,[4]    (0x00B01D9F)}
Sep 30 17:05:32 db2125 mysqld[30220]: 2020-09-30 17:05:32 9 [ERROR] InnoDB: Record in index `namespace_title` of table `nlwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x80000001),[34]Vrouwen_Electriciteits_Vereeniging(0x56726F7577656E5F456C6563747269636974656974735F56657265656E6967696E67),[4]  " (0x00B022E7)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x80000001),[34]Vrouwen_Electriciteits_Vereeniging(0x56726F7577656E5F456C6563747269636974656974735F56657265656E6967696E67),[4]   D(0x0098EE44)}
Sep 30 17:05:32 db2125 mysqld[30220]: 2020-09-30 17:05:32 9 [ERROR] InnoDB: Record in index `wl_user_notificationtimestamp` of table `nlwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x0002C2EA),NULL,[4]  " (0x00B022E7)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x0002C2EA),NULL,[4]    (0x00B01D9F)}
Sep 30 17:29:09 db2125 mysqld[30220]: 2020-09-30 17:29:09 8 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Sep 30 17:29:09 db2125 mysqld[30220]: 2020-09-30 17:29:09 8 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db2107-bin.001729' at position 564603880; GTID position '171970567-171970567-390719905,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2108483244,180359271-180
Sep 30 18:42:11 db2125 mysqld[30220]: 2020-09-30 18:42:11 8 [Note] Slave: received end packet from server, apparent master shutdown:
Sep 30 18:42:11 db2125 mysqld[30220]: 2020-09-30 18:42:11 8 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db2107-bin.001729' at position 732916829; GTID position '171970567-171970567-390719905,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2108483244,180359271-180
Sep 30 18:45:04 db2125 mysqld[30220]: 2020-09-30 18:45:04 9 [ERROR] InnoDB: Record in index `namespace_title` of table `zhwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x8000000A),[21]Taxonomy/Chinapotamon(0x5461786F6E6F6D792F4368696E61706F74616D6F6E),[4] &#r(0x01262372)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x8000000A),[21]Taxonomy/Chimonanthus(0x5461786F6E6F6D792F4368696D6F6E616E74687573),[4]  ! (0x01162116)}
Sep 30 18:45:04 db2125 mysqld[30220]: 2020-09-30 18:45:04 9 [ERROR] InnoDB: Record in index `namespace_title` of table `zhwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x8000000A),[24]Taxonomy/Canthocamptidae(0x5461786F6E6F6D792F43616E74686F63616D707469646165),[4] &%$(0x01262524)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x8000000A),[20]Taxonomy/Cantharinae(0x5461786F6E6F6D792F43616E74686172696E6165),[4]   m(0x00BDBA6D)}
Sep 30 19:21:48 db2125 mysqld[30220]: 2020-09-30 19:21:48 9 [ERROR] InnoDB: Record in index `wl_user_notificationtimestamp` of table `nowiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x0000FB93),NULL,[4] MS (0x004D53E6)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x0000FB93),NULL,[4] MJ (0x004D4AD9)}
Sep 30 19:46:59 db2125 mysqld[30220]: 2020-09-30 19:46:59 9 [ERROR] InnoDB: Record in index `wl_user_notificationtimestamp` of table `nlwiki`.`watchlist` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]  HI(0x00014849),NULL,[4] 4  (0x00347F1F)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]  HI(0x00014849),NULL,[4] 4  (0x00347F1E)}
Sep 30 20:00:28 db2125 mysqld[30220]: 2020-09-30 20:00:28 8 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Sep 30 20:00:28 db2125 mysqld[30220]: 2020-09-30 20:00:28 8 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db2107-bin.001729' at position 902518257; GTID position '171970567-171970567-390719905,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2108483244,180359271-180
Sep 30 21:02:29 db2125 mysqld[30220]: 2020-09-30 21:02:29 8 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Sep 30 21:02:29 db2125 mysqld[30220]: 2020-09-30 21:02:29 8 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db2107-bin.001729' at position 1010128690; GTID position '171970567-171970567-390719905,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2108483244,180359271-18
Sep 30 21:31:08 db2125 mysqld[30220]: 2020-09-30 21:31:08 9 [ERROR] InnoDB: Record in index `tl_namespace` of table `itwiki`.`templatelinks` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]   <(0x8000033C),[9]Linguaggi(0x4C696E677561676769),[4]    (0x0007F500)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]   <(0x8000033C),[9]Linguaggi(0x4C696E677561676769),[4]    (0x0007F4F9)}
Sep 30 21:36:01 db2125 mysqld[30220]: 2020-09-30 21:36:01 9 [ERROR] InnoDB: Record in index `tl_namespace` of table `itwiki`.`templatelinks` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]   <(0x8000033C),[9]Arguments(0x417267756D656E7473),[4] KZ (0x004B5A1B)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]   <(0x8000033C),[9]Arguments(0x417267756D656E7473),[4] KZ (0x004B5A1A)}
Sep 30 22:21:32 db2125 mysqld[30220]: 2020-09-30 22:21:32 8 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Sep 30 22:21:32 db2125 mysqld[30220]: 2020-09-30 22:21:32 8 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db2107-bin.001730' at position 87230375; GTID position '171970567-171970567-390719905,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2108483244,180359271-1803
Sep 30 22:51:48 db2125 mysqld[30220]: 2020-09-30 22:51:48 9 [ERROR] InnoDB: Record in index `rd_ns_title` of table `thwiki`.`redirect` was not found on update: TUPLE (info_bits=0, 3 fields): {[4]    (0x80000000),[45]                                             (0xE0B8A7E0B8B1E0B894E0B983E0B8ABE0B8A1E0B988E0B89EE0B8B4E0B980E0B8A3E0B899E0B897E0B8A3E0B98C),[4]  , (0x00112CC8)} at: COMPACT RECORD(info_bits=0, 3 fields): {[4]    (0x80000000),[45]                                             (0xE0B8A7E0B8B1E0B894E0B983E0B8ABE0B88DE0B988E0B8ADE0B8B4E0B899E0B897E0B8B2E0B8A3E0B8B2E0B8A1),[4]    (0x000BF88C)}
Sep 30 22:52:02 db2125 mysqld[30220]: 2020-09-30 22:52:02 9 [ERROR] InnoDB: Record in index `linter_page` of table `thwiki`.`linter` was not found on update: TUPLE (info_bits=0, 2 fields): {[4]    (0x000AB8FB),[4] ;= (0x003B3DE1)} at: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x000AB8FB),[4] :  (0x003AB796)}
Sep 30 22:54:16 db2125 mysqld[30220]: 2020-09-30 22:54:16 9 [ERROR] InnoDB: Clustered record for sec rec not found index `linter_page` of table `thwiki`.`linter`
Sep 30 22:54:16 db2125 mysqld[30220]: InnoDB: sec index record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
Sep 30 22:54:16 db2125 mysqld[30220]:  0: len 4; hex 000ab8fb; asc     ;;
Sep 30 22:54:16 db2125 mysqld[30220]:  1: len 4; hex 003ab796; asc  :  ;;
Sep 30 22:54:16 db2125 mysqld[30220]: InnoDB: clust index record PHYSICAL RECORD: n_fields 8; compact format; info bits 0
Sep 30 22:54:16 db2125 mysqld[30220]:  0: len 4; hex 003ab795; asc  :  ;;
Sep 30 22:54:16 db2125 mysqld[30220]:  1: len 6; hex 000c1bf83559; asc     5Y;;
Sep 30 22:54:16 db2125 mysqld[30220]:  2: len 7; hex d300018fc50110; asc        ;;
Sep 30 22:54:16 db2125 mysqld[30220]:  3: len 4; hex 000430cc; asc   0 ;;
Sep 30 22:54:16 db2125 mysqld[30220]:  4: len 4; hex 00000002; asc     ;;
Sep 30 22:54:16 db2125 mysqld[30220]:  5: len 4; hex 00000d81; asc     ;;
Sep 30 22:54:16 db2125 mysqld[30220]:  6: len 4; hex 00000dd8; asc     ;;
Sep 30 22:54:16 db2125 mysqld[30220]:  7: len 30; hex 7b226e616d65223a2263656e746572222c2274656d706c617465496e666f; asc {"name":"center","templateInfo; (total 73 bytes);
Sep 30 22:54:16 db2125 mysqld[30220]: TRANSACTION 52208853465, ACTIVE 0 sec starting index read
Sep 30 22:54:16 db2125 mysqld[30220]: mysql tables in use 1, locked 1
Sep 30 22:54:16 db2125 mysqld[30220]: 2 lock struct(s), heap size 1128, 1 row lock(s)
Sep 30 22:54:16 db2125 mysqld[30220]: MySQL thread id 9, OS thread handle 140403288561408, query id 6496648 Updating
Sep 30 22:54:16 db2125 mysqld[30220]: DELETE /* MediaWiki\Linter\Database::setForPage  */ FROM `linter` WHERE linter_page = 702715
Sep 30 22:54:16 db2125 mysqld[30220]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Sep 30 23:08:12 db2125 mysqld[30220]: 2020-09-30 23:08:12 9 [ERROR] InnoDB: Record in index `cat_pages` of table `nowiki`.`category` was not found on update: TUPLE (info_bits=0, 2 fields): {[4]    (0x8000101C),[4] m  (0x006D8D03)} at: COMPACT RECORD(info_bits=32, 2 fields): {[4]    (0x8000101C),[4] l  (0x006CB006)}
Sep 30 23:08:12 db2125 mysqld[30220]: 2020-09-30 23:08:12 0x7fb23023d700  InnoDB: Assertion failure in file /root/mariadb-10.4.14/storage/innobase/row/row0ins.cc line 218
Sep 30 23:08:12 db2125 mysqld[30220]: InnoDB: Failing assertion: !cursor->index->is_committed()
Sep 30 23:08:12 db2125 mysqld[30220]: InnoDB: We intentionally generate a memory trap.
Sep 30 23:08:12 db2125 mysqld[30220]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Sep 30 23:08:12 db2125 mysqld[30220]: InnoDB: If you get repeated assertion failures or crashes, even
Sep 30 23:08:12 db2125 mysqld[30220]: InnoDB: immediately after the mysqld startup, there may be
Sep 30 23:08:12 db2125 mysqld[30220]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Sep 30 23:08:12 db2125 mysqld[30220]: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
Sep 30 23:08:12 db2125 mysqld[30220]: InnoDB: about forcing recovery.
Sep 30 23:08:12 db2125 mysqld[30220]: 200930 23:08:12 [ERROR] mysqld got signal 6 ;
Sep 30 23:08:12 db2125 mysqld[30220]: This could be because you hit a bug. It is also possible that this binary
Sep 30 23:08:12 db2125 mysqld[30220]: or one of the libraries it was linked against is corrupt, improperly built,
Sep 30 23:08:12 db2125 mysqld[30220]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 30 23:08:12 db2125 mysqld[30220]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Sep 30 23:08:12 db2125 mysqld[30220]: We will try our best to scrape up some info that will hopefully help
Sep 30 23:08:12 db2125 mysqld[30220]: diagnose the problem, but since we have already crashed,
Sep 30 23:08:12 db2125 mysqld[30220]: something is definitely wrong and this may fail.
Sep 30 23:08:12 db2125 mysqld[30220]: Server version: 10.4.14-MariaDB-log
Sep 30 23:08:12 db2125 mysqld[30220]: key_buffer_size=134217728
Sep 30 23:08:12 db2125 mysqld[30220]: read_buffer_size=131072
Sep 30 23:08:12 db2125 mysqld[30220]: max_used_connections=13
Sep 30 23:08:12 db2125 mysqld[30220]: max_threads=2010
Sep 30 23:08:12 db2125 mysqld[30220]: thread_count=17
Sep 30 23:08:12 db2125 mysqld[30220]: It is possible that mysqld could use up to
Sep 30 23:08:12 db2125 mysqld[30220]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4752266 K  bytes of memory
Sep 30 23:08:12 db2125 mysqld[30220]: Hope that's ok; if not, decrease some variables in the equation.
Sep 30 23:08:12 db2125 mysqld[30220]: Thread pointer: 0x7f51f40014f8
Sep 30 23:08:12 db2125 mysqld[30220]: Attempting backtrace. You can use the following information to find out
Sep 30 23:08:12 db2125 mysqld[30220]: where mysqld died. If you see no messages after this, something went
Sep 30 23:08:12 db2125 mysqld[30220]: terribly wrong...
Sep 30 23:08:12 db2125 mysqld[30220]: stack_bottom = 0x7fb23023c698 thread_stack 0x30000
Sep 30 23:08:12 db2125 mysqld[30220]: *** buffer overflow detected ***: /opt/wmf-mariadb104/bin/mysqld terminated
Sep 30 23:08:13 db2125 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
-- Subject: Unit process exited

Comment by Daniel Black [ 2020-10-02 ]

Via marko

One currently suspected work around is to configuring innodb_purge_threads=1. The default was changed to 4 in MariaDB 10.2.2 via MySQL 5.7. Most complaints are related to upgrading to 10.2 or later, which makes sense given that 5.5 and 10.1 are at or near EOL.

Do you have a load generating script that we could run on a local copy of your dumps (https://dumps.wikimedia.org)?

Comment by Manuel Arostegui [ 2020-10-02 ]

Thanks for the comment Daniel.
Our default is indeed innodb_purge_threads=1
Unfortunately we do not have such script

Comment by Marko Mäkelä [ 2020-10-02 ]

marostegui, the dump that you posted is for MDEV-22373. This bug was filed for InnoDB data page corruption.

Since when do you have innodb_purge_threads=1? Was it ever set to something else while the data files existed in InnoDB? If it was set before the data was imported, then that would be an interesting piece of input for MDEV-22373 (proving that the corruption can be triggered even when solely using single-threaded purge).

Comment by Manuel Arostegui [ 2020-10-02 ]

Sorry Marko, so many comments and bugs lately. I will copy it there and comment there too

Comment by Manuel Arostegui [ 2020-12-14 ]

I have posted the reply on MDEV-22373 answering innodb_purge_threads=1 question

Comment by Marko Mäkelä [ 2021-03-22 ]

marostegui, is any further information available on this? Are you still seeing these failures? Was any hardware fault found?

Comment by Manuel Arostegui [ 2021-03-22 ]

We've seen errors like the ones described on the comments above (https://jira.mariadb.org/browse/MDEV-23653?focusedCommentId=167372&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-167372) but we've not seen mysql crashing because of them anymore.
We are checking hosts tables after upgrading them from 10.1 to 10.4 and rebuilding the tables that get index corruption flagged.

Comment by Manuel Arostegui [ 2021-04-26 ]

Not sure how much feedback I can give on this

Comment by Marko Mäkelä [ 2021-07-23 ]

marostegui, I am closing this ticket because this does not appear to be reproducible, and we never encountered anything like that in our internal testing. I double-checked that no complete page dump of a corrupted page was provided in this ticket. Hence, there is not enough information to diagnose any bug in the page checksum calculation.

In 10.4, I would recommend

SET GLOBAL innodb_checksum_algorithm=full_crc32;

to get faster and more secure page checksums. The setting is the default starting with the 10.5 release. It only affects tables that have been created or rebuilt while the setting was in effect. Old-format files will be treated as crc32 (or if you use strict_full_crc32, as strict_crc32).

One thing worth noting is that the definition of innodb_checksum_algorithm=crc32 was slightly changed in MDEV-17958 so that we would no longer accept an incorrect variant of a CRC-32C algorithm that was implemented in MySQL 5.6, MariaDB 10.0, 10.1 on processors using big-endian byte order. That incorrect variant was introduced in MySQL 5.7 and MariaDB 10.2.

If you were using innodb_encrypt_tables, there were some improvements to the way how the checksums are calculated, in MDEV-18529 and some related tickets.

I would assume that you are using unencrypted tables on commodity AMD64 hardware.

I am afraid that this will remain a mystery. Page corruption could remain dormant for a long time, until the page is actually accessed. Maybe some very rarely accessed BLOB contained a corrupted page?

Index corruption (such as MDEV-22373) occurs at a much higher level and cannot be blamed on hardware as easily as page corruption.

Comment by Manuel Arostegui [ 2021-07-23 ]

Thank you Marko

Generated at Thu Feb 08 09:24:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.