[MDEV-27053] Crash on assertion failure in btr0cur.cc - apparent index corruption Created: 2021-11-15 Updated: 2023-04-11 Resolved: 2023-04-11 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Manipulation - Insert, Galera SST |
| Affects Version/s: | 10.5.12 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Daniel Jeffery | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | corruption |
| Environment: | Custom Debian Buster Docker container on Debian Buster host OS. |
| Description |
|
Server version: 10.5.12-MariaDB-1:10.5.12+maria~buster-log

At some point prior to 2021-10-28 09:13 UTC, the index on the serial column of the certificates table became corrupted. An attempt to insert a new certificate then caused the following error:
This type of entry is one of the most common inserts on this database, and we were unable to identify anything unusual or problematic about this specific insert.

This database exists in a Galera cluster, and the service accesses the cluster via ProxySQL. Once the first node failed, ProxySQL shifted to the next node and the service resent the insert. This killed the next database node as well, and so on, until quorum was lost and the remaining nodes began to refuse queries.

Our cluster nodes go through a regular wipe-and-rebuild process, so state transfers are relatively common in our cluster; we expect that this is how the corruption came to affect all nodes in the cluster equally.

To diagnose, we shut off all Galera functionality and brought up a backup of the DB on a single node. We reran the insert and were able to consistently reproduce the crash with the previously listed error. We have retained this backup so that we can reproduce the failure as needed. We then ran CHECK TABLE certificates, which also crashed the database, but with the following error:
We found that running OPTIMIZE TABLE was sufficient to repair the corrupted index, and we were then able to perform the insert that had been failing. In our review of the database and logs, we do not appear to have suffered any data loss. At this point we have no clear idea what caused the index corruption, or when or whether it will recur. We're opening this bug report in the hope that it may help others in the community. |
| Comments |
| Comment by Marko Mäkelä [ 2021-11-16 ] | ||
|
The assertion fails because corruption of the doubly linked list of sibling pages was noticed. Apparently the same error is reported in a friendlier way by CHECK TABLE. We'd better propagate the error to the client connection instead of crashing the server; that is the topic of a separate ticket.

Normally, such inconsistency between linked pages is prevented by atomic mini-transactions, which combine page locking and logging: any operation that modifies multiple pages is logged as one atomic mini-transaction. If for some reason the ib_logfile0 is not being copied in a consistent way and some logged changes are lost, you can get such corruption. (Note: starting up the server with innodb_force_recovery=6, or without the ib_logfile0 file, is also a recipe for disaster.)

Because Galera has been mentioned, I would primarily suspect an error in the snapshot transfer mechanisms that were used. Any "backup" or "restore" step could potentially introduce this type of corruption. Can you trace the history of this data directory in terms of such steps? | ||
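The sibling-page consistency that this comment describes can be sketched in a few lines. This is a hypothetical illustration in Python, not MariaDB source: each InnoDB index page carries FIL_PAGE_PREV and FIL_PAGE_NEXT pointers, and a CHECK TABLE-style scan can verify that every forward pointer is mirrored by the neighbour's back pointer, reporting problems instead of asserting. The `Page` class and function names here are inventions for the sketch; only the field names and the FIL_NULL sentinel come from InnoDB.

```python
FIL_NULL = 0xFFFFFFFF  # InnoDB's "no sibling" marker


class Page:
    """Hypothetical stand-in for an index page header."""
    def __init__(self, page_no, prev, next_):
        self.page_no = page_no
        self.prev = prev   # FIL_PAGE_PREV
        self.next = next_  # FIL_PAGE_NEXT


def check_sibling_links(pages):
    """Walk the leaf level and verify that every FIL_PAGE_NEXT pointer
    is mirrored by the neighbour's FIL_PAGE_PREV. Returns a list of
    human-readable problems rather than aborting the process."""
    by_no = {p.page_no: p for p in pages}
    problems = []
    for p in pages:
        if p.next == FIL_NULL:
            continue
        nxt = by_no.get(p.next)
        if nxt is None:
            problems.append(
                f"page {p.page_no}: FIL_PAGE_NEXT {p.next} does not exist")
        elif nxt.prev != p.page_no:
            problems.append(
                f"page {p.next}: FIL_PAGE_PREV is {nxt.prev}, "
                f"expected {p.page_no}")
    return problems


# A healthy chain 4 -> 5 -> 6 ...
ok = [Page(4, FIL_NULL, 5), Page(5, 4, 6), Page(6, 5, FIL_NULL)]
# ...and the same chain with page 6's back pointer corrupted.
bad = [Page(4, FIL_NULL, 5), Page(5, 4, 6), Page(6, 99, FIL_NULL)]

print(check_sibling_links(ok))   # []
print(check_sibling_links(bad))  # one problem reported for page 6
```

Losing even one mini-transaction's log records during a copy can leave one pointer of such a pair updated and the other stale, which is exactly the inconsistency this check would flag.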
| Comment by Lex Neva [ 2021-11-16 ] | ||
|
Hi, I work with Dan and was involved with this incident. This is a really interesting theory! I guess I had just assumed that the corruption happened and then the state transfer spread it to the rest of the cluster. It hadn't occurred to me that a state transfer could be the source of the corruption.

We can't precisely trace the origin of this data directory, but I can tell you a couple of things about it. This data directory has been passed around from node to node, because each of our nodes is wiped and re-installed from the ground up every two weeks. We have always used mariabackup-based state transfers. Over the life of this database, no node has been restored from an external backup and joined the cluster. It is easily possible that corruption was introduced during an SST and then passed to the rest of the nodes through subsequent SSTs.

Based on your theory, we're wondering if we should switch to the mysqldump or rsync SST method. We care quite a bit about the integrity of the data in this cluster, and much less about the speed of SSTs. Do you feel that mariabackup is risky? Could the data itself be at risk, and not just the indices? | ||
| Comment by Lex Neva [ 2021-11-16 ] | ||
|
And if a mariabackup state transfer is risky, maybe Galera should run CHECK TABLE on all tables after the SST completes, just to be safe? | ||
| Comment by Marko Mäkelä [ 2021-11-18 ] | ||
|
I am not deeply familiar with the Galera cluster or its snapshot transfers (SST), but I know InnoDB.

Theoretically speaking, the mysqldump method should be the safest, but it would also likely be the slowest. The rsync method is inherently risky if the source can theoretically be modified during the transfer. I suppose that this method would be protected by the setting innodb_disallow_writes, which causes some problems of its own.

The mariabackup a.k.a. mariadb-backup method is what we would recommend for backups. It is possible that you are hitting some bug in it. It is also possible that the Galera SST scripting around it is ignoring some error and letting corruption propagate.

A repeatable test case that mimics your workload (similar table structures and DML operations) would be very much appreciated. | ||
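For reference, the SST method under discussion is selected per node in the server configuration. A minimal sketch, with placeholder credentials (the variable names are real Galera settings; the values are examples only):

```ini
[mysqld]
# Safest but likely slowest: a logical dump, which sidesteps
# physical page-copy bugs entirely.
# wsrep_sst_method = mysqldump

# Physical copy; risky if the donor can be modified mid-transfer.
# wsrep_sst_method = rsync

# The method in use when this corruption appeared.
wsrep_sst_method = mariabackup

# mariabackup-based SSTs authenticate against the donor node;
# the user and password here are placeholders.
wsrep_sst_auth = sst_user:sst_password
```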
| Comment by Lex Neva [ 2021-11-18 ] | ||
|
Thanks, that agrees with my impression that mysqldump is the safest. We have been using mariabackup for backups as well as SSTs. One of our learnings from this incident is that we also need to take regular mysqldump backups, because this corruption was propagated into our backups.

A repeatable test case is not feasible, I'm afraid. The exact circumstances that introduced the corruption are unknown, and we have no clear idea when or how the corruption was introduced. The only thing we do know is that our query pattern did not change, nor did our cluster management / SST patterns. | ||
| Comment by Marko Mäkelä [ 2021-11-19 ] | ||
|
I believe that with a correct implementation of the necessary and sufficient locks, the rsync snapshot method should be the safest and fastest. But currently, we do not seem to be there. | ||
| Comment by Marko Mäkelä [ 2022-05-16 ] | ||
|
The error message about corrupted FIL_PAGE_NEXT or FIL_PAGE_PREV links is written to the error log by CHECK TABLE. The assertion that failed and caused the server crash is in the function btr_cur_latch_leaves().
I have rewritten that function in a later change. When it comes to the root cause of this corruption, I believe that it should be addressed by the fix of a related bug. | ||
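The behavioural difference at stake here can be illustrated abstractly. The sketch below is hypothetical Python, not the actual btr_cur_latch_leaves() C++ code: it contrasts an assertion that aborts the whole process when a sibling link is broken with a rewrite that reports corruption to the caller, so that only the offending statement fails. All names in it are inventions for illustration.

```python
class CorruptedIndexError(Exception):
    """Stands in for an InnoDB "table is corrupted" error code."""


def latch_leaves_asserting(page_prev_link, expected_page_no):
    # Old behaviour: a failed assertion kills the entire server,
    # taking every client connection down with it.
    assert page_prev_link == expected_page_no, "sibling link corrupted"
    return True


def latch_leaves_propagating(page_prev_link, expected_page_no):
    # Rewritten behaviour: surface the corruption as an error that the
    # caller can turn into a failed statement, keeping the server up.
    if page_prev_link != expected_page_no:
        raise CorruptedIndexError(
            f"FIL_PAGE_PREV is {page_prev_link}, "
            f"expected {expected_page_no}")
    return True


try:
    latch_leaves_propagating(99, 5)
except CorruptedIndexError as e:
    print(f"statement fails, server survives: {e}")
```

The same corrupted link that previously triggered the fatal assertion now produces an error the client can see, which matches the earlier comment about propagating the error to the client connection instead of crashing.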
| Comment by Marko Mäkelä [ 2022-06-07 ] | ||
|
If wsrep_sst_method=mariabackup was used, this corruption might be explained by a separately reported bug. |