Details
Bug
Status: Closed
Major
Resolution: Duplicate
10.5.12
None
Custom Debian Buster Docker container on Debian Buster host OS.
Description
Server version: 10.5.12-MariaDB-1:10.5.12+maria~buster-log
At some point before 2021-10-28 09:13 UTC, the index on the serial column of the certificates table became corrupted. An attempt to insert a new certificate then triggered the following error:
Oct 28 09:13:51 2021-10-28 09:13:51 0x7ff648064700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.12/storage/innobase/btr/btr0cur.cc line 336
This type of insert is one of the most common on this database, and we were unable to identify anything unusual or problematic about this particular one. The database runs in a Galera cluster, and the service reaches the cluster through ProxySQL. When the first node crashed, ProxySQL failed over to the next node and the service retried the insert, which crashed that node as well. This repeated until quorum was lost and the remaining nodes began refusing queries.
Our cluster nodes go through a regular wipe-and-rebuild cycle, so state transfers are relatively common in our cluster. We suspect that is how the same corruption ended up on every node in the cluster.
To diagnose the problem, we disabled all Galera functionality and restored a backup of the database on a single node. Replaying the insert consistently reproduced the crash with the same assertion failure. We have kept this backup so the failure can be reproduced as needed. Running CHECK TABLE certificates also crashed the server, but with the following errors:
2021-10-28 20:36:12 4 [ERROR] InnoDB: In pages [page id: space=30, page number=924] and [page id: space=30, page number=3830] of index `serial` of table `boulder`.`certificates` /* Partition `p_start` */
InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links
2021-10-28 20:36:12 4 [ERROR] InnoDB: Corruption of an index tree: table `boulder`.`certificates` /* Partition `p_start` */ index `serial`, father ptr page no 7545, child page no 924
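For anyone checking their own error logs for this signature, the following is a minimal Python sketch that pulls the tablespace id, page numbers, table, and index out of the two corruption messages quoted above. The regular expressions and the scan() helper are hypothetical; they assume the exact message wording MariaDB 10.5.12 produced here, and other versions may phrase these messages differently.

```python
import re

# Hypothetical patterns, written against the exact wording of the
# MariaDB 10.5.12 messages quoted in this report.
BROKEN_LINKS_RE = re.compile(
    r"In pages \[page id: space=(?P<space>\d+), page number=(?P<page_a>\d+)\] and "
    r"\[page id: space=\d+, page number=(?P<page_b>\d+)\] of index `(?P<index>[^`]+)` "
    r"of table `(?P<db>[^`]+)`\.`(?P<table>[^`]+)`"
)
FATHER_PTR_RE = re.compile(
    r"Corruption of an index tree: table `(?P<db>[^`]+)`\.`(?P<table>[^`]+)`"
    r".*?index `(?P<index>[^`]+)`, father ptr page no (?P<father>\d+), "
    r"child page no (?P<child>\d+)"
)


def scan(lines):
    """Return one dict per line matching either InnoDB corruption message."""
    hits = []
    for line in lines:
        if m := BROKEN_LINKS_RE.search(line):
            # Sibling pages whose FIL_PAGE_NEXT/FIL_PAGE_PREV links disagree;
            # only the first space id is kept (both were 30 in this report).
            hits.append({
                "kind": "broken_links",
                "space": int(m["space"]),
                "pages": (int(m["page_a"]), int(m["page_b"])),
                "index": m["index"],
                "table": f"{m['db']}.{m['table']}",
            })
        elif m := FATHER_PTR_RE.search(line):
            # Parent/child page pair reported for the corrupted index tree.
            hits.append({
                "kind": "father_ptr",
                "table": f"{m['db']}.{m['table']}",
                "index": m["index"],
                "father_page": int(m["father"]),
                "child_page": int(m["child"]),
            })
    return hits
```

Applied to the two [ERROR] lines above, it reports pages 924 and 3830 of index serial in boulder.certificates, plus the father/child page pair 7545/924; unrelated lines are skipped.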
Running OPTIMIZE TABLE was enough to rebuild the corrupted index, after which the previously failing insert succeeded. Our review of the database and logs has not turned up any data loss.
At this point we have no clear idea what caused the index corruption, or whether and when it will recur. We are opening this bug report in the hope that it helps others in the community.
Issue Links
duplicates
MDEV-27053 Crash on assertion failure in btr0cur.cc - apparent index corruption (Closed)