  1. MariaDB Server
  2. MDEV-27054

Crash on assertion failure in btr0cur.cc - apparent index corruption

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.5.12
    • Fix Version/s: N/A
    • Labels:
      None
    • Environment:
      Custom Debian Buster Docker container on Debian Buster host OS.

      Description

      Server version: 10.5.12-MariaDB-1:10.5.12+maria~buster-log

      At some point before 2021-10-28 09:13 UTC, the index on the serial column of the certificates table became corrupted. An attempt to insert a new certificate then triggered the following assertion failure:

      Oct 28 09:13:51 2021-10-28 09:13:51 0x7ff648064700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.12/storage/innobase/btr/btr0cur.cc line 336
      

      This type of entry is one of the most common inserts on this database, and we could not identify anything unusual or problematic about this specific insert. The database runs in a Galera cluster, and the service accesses the cluster via ProxySQL. Once the first node failed, ProxySQL shifted to the next node and the service resent the insert. That crashed the next node as well, and so on, until quorum was lost and the remaining nodes began to refuse queries.

      Our cluster nodes go through a regular wipe-and-rebuild process, so state transfers are relatively common in our cluster. We suspect that this is how the corruption reached every node in the cluster.

      To diagnose, we shut off all Galera functionality and brought up a backup of the DB on a single node. Re-running the failing insert there consistently reproduced the crash with the assertion failure listed above. We have retained this backup so that we can reproduce the failure as needed. We then ran CHECK TABLE certificates, which also crashed the server, but with the following error:

      2021-10-28 20:36:12 4 [ERROR] InnoDB: In pages [page id: space=30, page number=924] and [page id: space=30, page number=3830] of index `serial` of table `boulder`.`certificates` /* Partition `p_start` */
      InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links
      2021-10-28 20:36:12 4 [ERROR] InnoDB: Corruption of an index tree: table `boulder`.`certificates` /* Partition `p_start` */ index `serial`, father ptr page no 7545, child page no 924
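
      For anyone triaging similar messages, the affected pages can be pulled out of the error log mechanically. The following is a minimal sketch, not part of the original report: the helper name extract_page_refs and both regular expressions are hypothetical and were written against the two message shapes quoted above only, so other InnoDB versions or message variants may not match.

```python
import re

# Matches the "[page id: space=N, page number=M]" fragments quoted in the report.
PAGE_ID_RE = re.compile(r"\[page id: space=(\d+), page number=(\d+)\]")

# Matches the "father ptr page no N, child page no M" fragment.
FATHER_CHILD_RE = re.compile(r"father ptr page no (\d+), child page no (\d+)")


def extract_page_refs(log_text):
    """Collect the pages named in InnoDB corruption messages (hypothetical helper)."""
    page_ids = [(int(space), int(page)) for space, page in PAGE_ID_RE.findall(log_text)]
    father_child = [(int(f), int(c)) for f, c in FATHER_CHILD_RE.findall(log_text)]
    return {"page_ids": page_ids, "father_child": father_child}


# The log excerpt from this report, copied verbatim.
SAMPLE = """\
2021-10-28 20:36:12 4 [ERROR] InnoDB: In pages [page id: space=30, page number=924] and [page id: space=30, page number=3830] of index `serial` of table `boulder`.`certificates` /* Partition `p_start` */
InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links
2021-10-28 20:36:12 4 [ERROR] InnoDB: Corruption of an index tree: table `boulder`.`certificates` /* Partition `p_start` */ index `serial`, father ptr page no 7545, child page no 924
"""

if __name__ == "__main__":
    print(extract_page_refs(SAMPLE))
```

      In the excerpt, page 924 appears both in the broken sibling link and as the child in the father/child mismatch, which is the kind of overlap this sketch makes easy to spot.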
      

      We found that running OPTIMIZE TABLE was sufficient to repair the corrupted index, after which the previously failing insert succeeded. Our review of the database and logs shows no apparent data loss.
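
      The repair step can be written down as a short statement sequence. This is a sketch under assumptions, not the exact commands we ran: the helper names quote_identifier and repair_statements are hypothetical, and the trailing CHECK TABLE is an added verification step that the report does not describe. For InnoDB tables, OPTIMIZE TABLE normally rebuilds the table together with its indexes, which would explain why it cleared the broken page links.

```python
def quote_identifier(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def repair_statements(schema, table):
    """Build the statements for a rebuild-then-verify pass (hypothetical helper).

    OPTIMIZE TABLE is the step the report found sufficient; the CHECK TABLE
    afterwards is an assumption added here to confirm the rebuilt index.
    """
    qualified = quote_identifier(schema) + "." + quote_identifier(table)
    return [
        "OPTIMIZE TABLE " + qualified,
        "CHECK TABLE " + qualified,
    ]


if __name__ == "__main__":
    for statement in repair_statements("boulder", "certificates"):
        print(statement + ";")
```

      Because CHECK TABLE crashed the server while the index was still corrupted, it seems safer to try such a sequence on a restored backup first, as we did.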

      At this point we have no clear idea what caused the index corruption, or when or whether it will recur. We're opening this bug report in the hope that it helps others in the community.

              People

              Assignee:
              Unassigned
              Reporter:
              Daniel Jeffery (sohelpful)