Server version: 10.5.12-MariaDB-1:10.5.12+maria~buster-log
At some point before 2021-10-28 09:13 UTC, the index on the serial column of the certificates table became corrupted. An attempt to insert a new certificate caused the following error:
Inserts of this type are among the most common on this database, and we were unable to identify anything unusual or problematic about this specific one. The database runs as a Galera cluster, and the service accesses the cluster via ProxySQL. Once the first node failed, ProxySQL shifted to the next node and the service resent the insert. This crashed that node as well, and so on, until quorum was lost and the remaining nodes began to refuse queries.
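For readers less familiar with this failure mode, the cascade can be sketched as a small simulation. This is an illustrative model only, not our actual setup: the function name simulate_cascade and the three-node cluster size are hypothetical, and quorum is reduced to a simple majority of the previous membership, ignoring details such as node weights and graceful departures.

```python
def simulate_cascade(node_count: int) -> dict:
    """Replay one crashing insert against each node in turn, as a failover proxy would.

    Assumptions (not taken from the report): every node carries the same
    corrupted index, and quorum is a strict majority of the previous membership.
    """
    members = node_count
    crashed = 0
    has_quorum = members > 0
    while has_quorum:
        previous = members
        # The proxy routes the retried insert to the next live node, which crashes.
        members -= 1
        crashed += 1
        # The survivors keep quorum only if they are a strict majority
        # of the membership that existed before this crash.
        has_quorum = members * 2 > previous
    return {"crashed": crashed, "survivors": members, "has_quorum": has_quorum}


if __name__ == "__main__":
    # Hypothetical three-node cluster.
    print(simulate_cascade(3))
```

Under these assumptions, a single retried insert takes down nodes one after another until the survivors can no longer form a majority. This is why a client-side retry combined with proxy failover turns a single-node crash into a cluster-wide outage.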
Our cluster nodes go through a regular wipe and rebuild process. That means that state transfers are relatively common in our cluster, and we suspect that this is how the corruption came to affect all nodes in the cluster equally.
To diagnose the problem, we shut off all Galera functionality and brought up a backup of the DB on a single node. We ran the failing insert and were able to consistently reproduce the crash with the previously listed error. We have retained this backup so that we can reproduce the failure as needed. We then ran CHECK TABLE certificates, which also crashed the database, but with the following error:
We found that running OPTIMIZE TABLE was sufficient to repair the corrupted index, after which we were able to perform the insert that had been failing. Having reviewed our database and logs, we do not appear to have lost any data.
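The repair and a follow-up verification could be scripted roughly as follows. This is a minimal sketch, not the exact procedure we ran: the helper name optimize_and_check is hypothetical, and it assumes a Python DB-API cursor (for example from PyMySQL or mysqlclient) that returns rows as tuples in the usual Table, Op, Msg_type, Msg_text layout. Because CHECK TABLE crashed the server while the index was still corrupted, something like this is best tried against a restored backup rather than a live cluster node.

```python
import re


def optimize_and_check(cursor, table: str) -> bool:
    """Rebuild a table with OPTIMIZE TABLE, then confirm it with CHECK TABLE.

    `cursor` is assumed to be a DB-API cursor whose rows are tuples of
    (Table, Op, Msg_type, Msg_text). Returns True if CHECK TABLE reports
    a status row of OK.
    """
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError(f"unexpected table name: {table!r}")

    cursor.execute(f"OPTIMIZE TABLE `{table}`")
    cursor.fetchall()  # drain the result set before issuing the next statement

    cursor.execute(f"CHECK TABLE `{table}`")
    rows = cursor.fetchall()
    return any(row[2] == "status" and row[3] == "OK" for row in rows)
```

A call such as optimize_and_check(conn.cursor(), "certificates") would return True only if the check passes after the rebuild. The insert that had been failing should still be retried separately.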
At this point we have no clear idea what caused the index corruption, or whether and when it will recur. In the hope that it may help others in the community, we're opening this bug report.