[MDEV-28167] MariaDB Core Dump / [ERROR] InnoDB: Corruption of an index tree Created: 2022-03-24 Updated: 2022-05-01 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Galera SST |
| Affects Version/s: | 10.5.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Michael Landin | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Environment: |
Compiled from source on FreeBSD 12.3 |
||
| Description |
|
My mariadb instance(s) crashed with the following error:
I tried opening the core file with gdb to get more insight, but did not really get any clues:
I guess SIGABRT is in libc? |
| Comments |
| Comment by Sergei Golubchik [ 2022-03-30 ] | |
|
gdb complained that
Are you sure the executable matches the core file? | |
| Comment by Marko Mäkelä [ 2022-03-30 ] | |
|
InnoDB invokes abort() when encountering a fatal error. Two known sources of secondary index corruption are the InnoDB change buffer (see | |
| Comment by Michael Landin [ 2022-03-30 ] | |
|
@sergei - I created a new build box with the same version of mariadb (with debug symbols) to correctly debug the core file. That is the cause of this error. | |
| Comment by Marko Mäkelä [ 2022-03-30 ] | |
|
michbsd, are you sure that it was a reproducible build? | |
| Comment by Michael Landin [ 2022-03-30 ] | |
|
Euh.. ? But, I guess you answered my question in your earlier comment - InnoDB threw a FATAL error that invoked abort() - and that is why we got a core. Now, I need to try to understand what could have caused the FATAL error - no schema changes or other potentially breaking things were happening at the moment. We were just doing normal INSERT/UPDATE/SELECT statements when the corruption of the index occurred. | |
| Comment by Marko Mäkelä [ 2022-03-30 ] | |
|
michbsd, if you build something from source code without using an equivalent environment and tools that were used for building an executable, you will likely not get an executable that has the same addresses. For example, source code file names (with full paths) may be embedded in the executable. If the source code directory name or the build directory name differs from the original build, you could already have lost. That is what reproducible builds are about. A stack trace will appear corrupted if the top of the stack does not match the libraries or executables. On GNU/Linux, the typical cause of this is when an executable and core dump are copied to a different system where libc.so differs from the one that generated the core dump. Here, the more likely cause for corrupted stack traces could be that the mariadbd executable differs. Can you reproduce the crash with your self-built executable, to get a proper stack trace? | |
| Comment by Michael Landin [ 2022-03-30 ] | |
|
Lol.. I am not able to reproduce the crash on the production system. I have no idea what caused it (that is what I am trying to figure out). But regardless, I would say the focus should be on these lines: "[ERROR] InnoDB: Corruption of an index tree: table", right? Those caused InnoDB to invoke abort(). Or did I miss something? | |
| Comment by Marko Mäkelä [ 2022-03-30 ] | |
|
The fatal message is output when the internal links of a table are found to be corrupted. CHECK TABLE without QUICK should exercise this code. You could start by executing CHECK TABLE on every InnoDB table. Once you have identified the corrupted table, the schema of that table would be helpful to know. I do not have any idea what could cause this type of corruption in normal circumstances. Abnormal circumstances could include the following:
| |
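The suggestion above to run CHECK TABLE (without QUICK) on every InnoDB table could be scripted roughly as follows. This is only a sketch under assumptions: the helper names `quote_identifier`, `check_table_statements` and `failed_tables` are hypothetical, the list of (schema, table) pairs is assumed to come from elsewhere (for example a query on information_schema.TABLES filtered by ENGINE = 'InnoDB'), and the result rows are assumed to follow the usual CHECK TABLE column layout of Table, Op, Msg_type, Msg_text. It does not connect to a server; it only builds the statements and filters results that were already fetched.

```python
from typing import Iterable, List, Tuple


def quote_identifier(name: str) -> str:
    """Quote an identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def check_table_statements(tables: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one CHECK TABLE statement (no QUICK option) per (schema, table)."""
    return [
        f"CHECK TABLE {quote_identifier(schema)}.{quote_identifier(table)};"
        for schema, table in tables
    ]


def failed_tables(rows: Iterable[Tuple[str, str, str, str]]) -> List[str]:
    """Return tables that CHECK TABLE did not report as healthy.

    Each row is assumed to be (Table, Op, Msg_type, Msg_text).
    A table is flagged if any row has Msg_type 'error', or a 'status'
    row whose text is not 'OK'. Notes and warnings are ignored here.
    """
    bad: List[str] = []
    for table, _op, msg_type, msg_text in rows:
        flagged = msg_type == "error" or (msg_type == "status" and msg_text != "OK")
        if flagged and table not in bad:
            bad.append(table)
    return bad
```

The generated statements can be pasted into the mariadb client or executed through any connector; the flagged table names are the ones whose schema would be worth attaching to this report.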
| Comment by Michael Landin [ 2022-03-30 ] | |
|
Thank you for the insights. I restarted mariadb with the --innodb-force-recovery=2 flag; after that I ran OPTIMIZE on the database showing problems, and then everything was fine again. I do not believe it was faulty hardware (I run Galera on top of my MariaDB, and the change caused all 3 cluster nodes to core dump with the same error). | |
| Comment by Marko Mäkelä [ 2022-03-30 ] | |
|
Thank you for clarifying that Galera snapshot transfers are involved. I think that there have been some problems with that, and also some recent fixes by sysprg. FreeBSD does not support asynchronous I/O (or at least we do not implement any API for that), and therefore my remarks in Which Galera snapshot transfer method do you use? | |
| Comment by Michael Landin [ 2022-03-30 ] | |
|
I use rsync |