[MDEV-28122] OPTIMIZE TABLE crash Created: 2022-03-18 Updated: 2024-01-15 Resolved: 2023-10-16 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table, Storage Engine - InnoDB |
| Affects Version/s: | 10.6.0, 10.6.7, 10.11.4 |
| Fix Version/s: | 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Axel Dörfler | Assignee: | Thirunarayanan Balathandayuthapani |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | crash, regression |
| Environment: |
Windows Server 2022 Standard |
||
| Issue Links: |
|
| Description |
|
The InnoDB table file is 144 GB; the #sql-alter-2964-a.ibd file is 56 GB in size, which might very well be its final size; about 700 million rows had been deleted from the table. There are currently 174 GB free on the data device (with both files in place). I received the following crash:
|
| Comments |
| Comment by Alice Sherepa [ 2022-03-18 ] |
|
probably the same bug as |
| Comment by Marko Mäkelä [ 2022-03-22 ] |
|
The latest trace for |
| Comment by Axel Dörfler [ 2022-03-23 ] |
|
The server has garbled its boot disk to the point where it could not be repaired, and it has now been restored from a backup. Although the server uses RAID-1 and ECC memory and did not log anything suspicious, there is a good chance that this is a hardware error. If that is true, the crash could well have been caused by it, too. I have now resurrected the server's predecessor, and a SHOW TABLE STATUS shows this:
But this was before I deleted more than half the table (its oldest entries). |
| Comment by Axel Dörfler [ 2022-03-23 ] |
|
I will try the same procedure on the old server, and see how it goes there... |
| Comment by Marko Mäkelä [ 2023-09-19 ] |
|
In both 10.6.7 and 10.11.4, the crash occurs here in row_log_table_apply_op():
I now realize that we do not check for mrec > mrec_end until some steps later. In another case we do have a proper check for end-of-buffer:
Also in ROW_T_UPDATE we are potentially reading after the end of the buffer. In row_log_apply_op() (which is used during online CREATE INDEX) we have proper overflow checks in place. |
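The kind of end-of-buffer checking discussed in this comment can be illustrated with a minimal, standalone sketch. This is not the InnoDB code: the record layout (a one-byte opcode followed by one or two length-prefixed fields), the `REC_*` opcodes, `ApplyResult`, `skip_field()` and `apply_one()` are all hypothetical names invented for the illustration. The point it shows is that the opcode byte and every following field are validated against the end of the buffer before they are read, and that the caller's position only advances when a whole record was available.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical opcodes, loosely modelled on the ROW_T_* record types named above.
enum RecType : std::uint8_t { REC_INSERT = 0x41, REC_UPDATE = 0x42, REC_DELETE = 0x43 };

// Outcome of trying to apply one record from the buffer.
enum class ApplyResult { Applied, NeedMoreData, Corrupted };

// Skip one length-prefixed field (a 1-byte length, then that many bytes).
// Returns false and leaves p unchanged if the field is not fully inside [p, end).
bool skip_field(const std::uint8_t*& p, const std::uint8_t* end)
{
  if (p >= end) return false;                        // no room for the length byte
  const std::size_t len = *p;
  if (len > static_cast<std::size_t>(end - p) - 1) return false;  // payload truncated
  p += 1 + len;
  return true;
}

// Try to apply one record starting at pos. pos advances only on success, so a
// truncated record can be retried once more of the log has been read.
ApplyResult apply_one(const std::uint8_t*& pos, const std::uint8_t* end)
{
  const std::uint8_t* p = pos;
  if (p >= end) return ApplyResult::NeedMoreData;    // check BEFORE reading the opcode
  switch (*p++) {
  case REC_INSERT:
  case REC_DELETE:
    if (!skip_field(p, end)) return ApplyResult::NeedMoreData;
    break;
  case REC_UPDATE:
    // Assumed layout for the sketch: old key followed by the new row.
    if (!skip_field(p, end) || !skip_field(p, end)) return ApplyResult::NeedMoreData;
    break;
  default:
    return ApplyResult::Corrupted;
  }
  pos = p;
  return ApplyResult::Applied;
}
```

Returning a "need more data" status instead of reading on is one possible design; it keeps the decision about refilling the buffer with the caller.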
| Comment by Marko Mäkelä [ 2023-09-19 ] |
|
I failed to notice that there actually is a check right before the switch statement:
So, the check after case ROW_T_DELETE: is actually redundant, and the reason for this crash remains a mystery until we have a reproducible test case. |
| Comment by Marko Mäkelä [ 2023-10-13 ] |
|
The check that I quoted had been removed as part of […]. I think that we just need to put back the check right before the switch(*mrec++) statement, like this:
It would be great to have a test for this in our regression test suite. |
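One way to exercise this class of bug outside the server is a truncation harness: feed a parser every prefix of a valid log and verify that it never claims to have consumed more bytes than it was given. The sketch below is a standalone illustration under that assumption, not a test for the server's own regression suite. `survives_all_truncations()` and the `Parser` callable (taking a begin and end pointer and returning the number of bytes consumed) are hypothetical names introduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical harness. Parser is any callable taking (begin, end) and
// returning how many bytes of [begin, end) it consumed.
template <typename Parser>
bool survives_all_truncations(const std::vector<std::uint8_t>& log, Parser parse)
{
  for (std::size_t n = 0; n <= log.size(); ++n) {
    // Copy the prefix into a buffer of exactly n bytes, so that a memory
    // checker such as AddressSanitizer can flag any read past its end.
    std::vector<std::uint8_t> prefix(
        log.begin(), log.begin() + static_cast<std::ptrdiff_t>(n));
    const std::uint8_t* begin = prefix.data();
    const std::size_t consumed = parse(begin, begin + n);
    if (consumed > n) return false;   // parser ran past the truncated end
  }
  return true;
}
```

Run under a memory checker, such a loop also catches overreads that happen not to change the return value.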