MariaDB Server / MDEV-27053

Crash on assertion failure in btr0cur.cc - apparent index corruption




      Server version: 10.5.12-MariaDB-1:10.5.12+maria~buster-log

      At some point before 2021-10-28 09:13 UTC, the index on the serial column of the certificates table became corrupted. An attempt to insert a new certificate then triggered the following assertion failure:

      Oct 28 09:13:51 2021-10-28 09:13:51 0x7ff648064700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.12/storage/innobase/btr/btr0cur.cc line 336

      Inserts of this type are among the most common on this database, and we could not identify anything unusual or problematic about this specific one. The database runs in a Galera cluster, and the service accesses the cluster via ProxySQL. Once the first node failed, ProxySQL shifted to the next node and the service resent the insert. That crashed the next node as well, and so on, until quorum was lost and nodes began to refuse queries.
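The cascade described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a description of Galera's actual quorum implementation: it assumes equal node weights, no arbitrator, no node rejoining during the incident, and a simplified rule that the survivors of each crash must be a strict majority of the previous membership. The function name and the node count are hypothetical; the report does not state how many nodes the cluster had.

```python
def crashes_until_quorum_loss(node_count):
    """Count node crashes caused by one repeatedly resent crashing insert.

    Simplified model (assumption): after each crash, the surviving nodes
    keep quorum only if they are a strict majority of the membership
    that existed just before the crash.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    alive = node_count
    crashes = 0
    while alive > 0:
        previous = alive
        alive -= 1       # the node that received the insert crashes
        crashes += 1
        if alive * 2 <= previous:
            # Survivors are not a strict majority: quorum is lost and
            # the remaining nodes stop accepting queries.
            break
        # Otherwise the proxy fails over and the client resends the insert.
    return crashes
```

Under these assumptions a hypothetical three-node cluster loses quorum on the second crash, which matches the pattern in the report: each automatic retry of the same insert takes down one more node.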

      Our cluster nodes go through a regular wipe-and-rebuild process, so state transfers are relatively common in our cluster. We suspect that this is how the corruption came to affect all nodes equally.

      To diagnose the problem, we shut off all Galera functionality and brought up a backup of the database on a single node. We re-ran the failing insert and were able to consistently reproduce the crash with the error listed above. We have retained this backup so that we can reproduce the failure as needed. We also ran CHECK TABLE certificates, which crashed the database as well, but with the following error:

      2021-10-28 20:36:12 4 [ERROR] InnoDB: In pages [page id: space=30, page number=924] and [page id: space=30, page number=3830] of index `serial` of table `boulder`.`certificates` /* Partition `p_start` */
      InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links
      2021-10-28 20:36:12 4 [ERROR] InnoDB: Corruption of an index tree: table `boulder`.`certificates` /* Partition `p_start` */ index `serial`, father ptr page no 7545, child page no 924
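When scanning logs from several nodes for the same symptom, it can help to pull the page ids and the index name out of the "In pages ..." line. The following is a minimal sketch that assumes the message layout shown above; the function name and the regular expressions are illustrative and are not part of MariaDB.

```python
import re

# Patterns written against the log line quoted in this report (assumption:
# other server versions may format the message differently).
PAGE_ID = re.compile(r"\[page id: space=(\d+), page number=(\d+)\]")
INDEX_TABLE = re.compile(r"index `([^`]+)` of table `([^`]+)`\.`([^`]+)`")

def parse_broken_link_line(line):
    """Extract (space, page) pairs plus index, schema and table names."""
    pages = [(int(space), int(page)) for space, page in PAGE_ID.findall(line)]
    match = INDEX_TABLE.search(line)
    return {
        "pages": pages,
        "index": match.group(1) if match else None,
        "schema": match.group(2) if match else None,
        "table": match.group(3) if match else None,
    }

SAMPLE = (
    "2021-10-28 20:36:12 4 [ERROR] InnoDB: In pages "
    "[page id: space=30, page number=924] and "
    "[page id: space=30, page number=3830] of index `serial` "
    "of table `boulder`.`certificates` /* Partition `p_start` */"
)

info = parse_broken_link_line(SAMPLE)
print(info["index"], info["pages"])
```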

      We found that running OPTIMIZE TABLE was enough to repair the corrupted index, after which the previously failing insert succeeded. Our review of the database and logs has not found any data loss.
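For anyone scripting the same recovery against a restored backup, the statements could be generated as in the sketch below. It only builds SQL text and leaves execution to whatever client is in use. The helper names are hypothetical, and the trailing CHECK TABLE is an added verification step that the report does not describe; note that CHECK TABLE crashed the server while the index was still corrupted, so it is placed after the rebuild.

```python
def quote_identifier(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

def repair_statements(schema, table):
    """Return the SQL statements for a rebuild followed by a check."""
    target = quote_identifier(schema) + "." + quote_identifier(table)
    return [
        "OPTIMIZE TABLE " + target,  # the step the report found sufficient
        "CHECK TABLE " + target,     # assumption: optional verification
    ]

for statement in repair_statements("boulder", "certificates"):
    print(statement + ";")
```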

      At this point we have no clear idea what caused the index corruption, or whether or when it will recur. We are opening this bug report in the hope that it helps others in the community.





              jplindst Jan Lindström (Inactive)
              sohelpful Daniel Jeffery


