[MDEV-18807] Slave crash with index corruption during UPDATE Created: 2019-03-04  Updated: 2023-05-12  Resolved: 2023-05-12

Status: Closed
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB
Affects Version/s: 10.3.12
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Assen Totin (Inactive) Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: None
Environment:

RHEL-7.6 using MariaDB RPMs


Attachments: File mariadb.log    

 Description   

A slave crashed on a trivial, single-line UPDATE from the master. Queries like this one had been executed successfully thousands of times before.

RHEL runs in a VMware VM with storage over FC/SAN, so no immediate hardware failure is suspected. The VM runs with 8 GB RAM and usually has fewer than 100 concurrent connections; no OOM actions were logged by the OS. While there are multiple databases active, no other MariaDB or OS errors were logged before the crash (e.g., an exhausted max open files limit). The VM is restarted on a weekly basis.

The immediate error messages were:

InnoDB: Page old data size 270 new data size 402, page old max ins size 15980 new max ins size 15848
Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.12/storage/innobase/btr/btr0cur.cc line 4879
Failing assertion: page_zip || optim_err != DB_UNDERFLOW
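
As a quick sanity check on the first message, the two pairs of numbers can be compared: the page's data size grows by the same number of bytes that the maximum insert size shrinks. The sketch below only does that arithmetic on the quoted line. The regular expression and the `size_deltas` helper are hypothetical, written for this one message, and are not taken from the MariaDB sources.

```python
import re

# Log line copied verbatim from the report above.
LOG_LINE = (
    "InnoDB: Page old data size 270 new data size 402, "
    "page old max ins size 15980 new max ins size 15848"
)

# Hypothetical pattern for this single message format (an assumption,
# not something defined by MariaDB).
PATTERN = re.compile(
    r"Page old data size (?P<old_data>\d+) new data size (?P<new_data>\d+), "
    r"page old max ins size (?P<old_ins>\d+) new max ins size (?P<new_ins>\d+)"
)


def size_deltas(line):
    """Return (data size growth, max insert size shrinkage) in bytes.

    Returns None when the line does not match the expected format.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    values = {name: int(text) for name, text in match.groupdict().items()}
    return (
        values["new_data"] - values["old_data"],
        values["old_ins"] - values["new_ins"],
    )


print(size_deltas(LOG_LINE))  # (132, 132)
```

Both differences come out to 132 bytes, so the two figures in the message are at least consistent with each other. This says nothing about the root cause of the assertion.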

System log messages are attached.

After this point MariaDB was unable to restart at all, which required a backup/restore from the master.

I could not find anything similar in Jira, hence opening this for possible consideration.



 Comments   
Comment by Elena Stepanova [ 2019-03-04 ]

Do I understand correctly that it's the error which happens after the initial crash, upon an attempt to restart the server?
Could you please provide a bigger portion of the log, which covers the initial crash as well? (Preferably everything from the last successful server startup, up to and including the initial crash, the attempt to restart, and the failure upon restart.)
Thanks.

Comment by Assen Totin (Inactive) [ 2019-03-04 ]

Hi, Elena,

Negative, the messages I gave and the attached log are from the first server crash, during the processing of the received UPDATE query. I don't have anything else, as the crash was several days ago but went unnoticed since there were other slaves to cover for it. It is a production environment, so no debug symbols were loaded, the server's debug level is kept low, etc. I was able to check that the problem did not result from a VM migration, as there was none at the time of the crash (which was my hope to still link the case to a "hardware" issue).

All subsequent crashes logged virtually the same message and a very similar, if not identical, stack trace, pointing to some file corruption.

I don't really have big expectations that this particular crash can be chased down; I opened the ticket more to serve as a hint in case anybody else finds themselves in a similar situation and searches Jira. The place where the error occurred and the stack trace make me think this particular crash is more or less "random" and came as a side effect of some other bug, so that it was not related to the actual query, but to the overall system state at that moment. That said, feel free to lower the priority or reclassify this bug as you see fit.

Comment by Marko Mäkelä [ 2023-04-14 ]

This could be a duplicate of MDEV-19916. If you ever executed ALTER TABLE…ADD COLUMN, it is the most likely cause. Is it?
