[MDEV-19743] Crash while reorganizing an index page Created: 2019-06-12 Updated: 2020-08-25 Resolved: 2019-11-15 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.3.14 |
| Fix Version/s: | 10.3.17, 10.4.7 |
| Type: | Bug | Priority: | Major |
| Reporter: | Geoff Montee (Inactive) | Assignee: | Marko Mäkelä |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
A user saw the following crash:
There seem to be two different errors here. The first error is:
This first error seems to occur here: https://github.com/MariaDB/server/blob/mariadb-10.3.14/storage/innobase/btr/btr0btr.cc#L1706
This first error was also seen in
The second error is:
This second error seems to occur here: https://github.com/MariaDB/server/blob/mariadb-10.3.14/storage/innobase/page/page0page.cc#L612
Some of the messages immediately prior to the crash indicate that a large Galera Cluster transaction was executed:
Note that "Failed to report last committed" sounds scary, but it is pretty harmless. See MDEV-17550. |
| Comments |
| Comment by Marko Mäkelä [ 2019-06-13 ] | |
|
The error messages immediately preceding the intentional crash complain that the payload size of the index page changed during a reorganize operation. A reorganize should only do ‘garbage collection’; it should not affect the total size of the records on the page. It is possible that the page was already corrupted when the reorganize was initiated. I remember seeing this a lot during the development of
I am assigning this to Matthias for the creation of a test case. | |
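To make that invariant concrete, here is a minimal Python sketch of a toy page model. It assumes, for illustration only, that the page tracks its payload size with two bookkeeping counters (a heap top and a count of garbage bytes) rather than by summing its records. The names `ToyPage`, `reorganize` and `PageCorruptedError` are hypothetical and do not correspond to InnoDB's actual data structures or functions; the sketch only shows why a reorganize that copies the live records into a fresh page must end up with the same payload size, so that a mismatch points to a page that was inconsistent before the reorganize started.

```python
from dataclasses import dataclass, field


class PageCorruptedError(Exception):
    """Raised when a reorganize would change the page's payload size."""


@dataclass
class ToyPage:
    # Hypothetical model for illustration; not InnoDB's on-disk format.
    records: list = field(default_factory=list)  # live records, in key order
    heap_top: int = 0  # bytes allocated from the record heap so far
    garbage: int = 0   # bytes of purged records not yet reclaimed

    def data_size(self) -> int:
        # Payload size as implied by the bookkeeping counters.
        return self.heap_top - self.garbage

    def insert(self, rec: bytes) -> None:
        self.records.append(rec)
        self.heap_top += len(rec)

    def purge(self, index: int) -> None:
        # The freed space becomes garbage; the heap is not compacted yet.
        rec = self.records.pop(index)
        self.garbage += len(rec)


def reorganize(page: ToyPage) -> ToyPage:
    """Copy the live records into a fresh, compact page.

    This is pure garbage collection, so the payload size must not change.
    A mismatch means the counters and the records already disagreed.
    """
    before = page.data_size()
    fresh = ToyPage()
    for rec in page.records:
        fresh.insert(rec)
    after = fresh.data_size()
    if before != after:
        raise PageCorruptedError(f"payload size changed: {before} -> {after}")
    return fresh
```

For example, inserting records of 4, 6 and 2 bytes and purging the middle one leaves a payload of 6 bytes, which `reorganize()` preserves while dropping the 6 bytes of garbage. If the garbage counter had been left inconsistent with the records, the check would raise instead of silently producing a page with a different payload.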
| Comment by Bernard Grymonpon [ 2019-06-17 ] | |
|
I just found this case, and it seems similar to what we encountered and tried to describe in https://jira.mariadb.org/browse/MDEV-19783 (I linked both cases). | |
| Comment by Marko Mäkelä [ 2019-08-23 ] | |
|
I have a plausible explanation of this bug in | |
| Comment by Marko Mäkelä [ 2019-11-12 ] | |
|
I believe that I may have found an explanation of why mleich failed to reproduce this corruption with recent 10.3 or 10.4. Note: upgrading to a fixed version will not fix existing corruption. I am afraid that the corruption can only be fixed by restoring the table from a logical or physical backup. You might try your luck with
Even if it did not crash, some data (in particular, the contents of any instantly added columns) could still be corrupted. | |
| Comment by Marko Mäkelä [ 2019-11-15 ] | |
|
I think that this report duplicates |