MariaDB Server / MDEV-27949

[crash] Unable to find a record to delete-mark

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version: 10.5.12
    • Fix Version: N/A
    • Environment: Debian Bullseye on Kobol Helios4
      SoC: Marvell Armada 380/385

      MariaDB 1:10.5.12-0+deb11u1 armhf

    Description

      This morning mysqld crashed with the error below.

      Feb 25 01:49:56 ains mysqld: 2022-02-25 1:49:56 0 [ERROR] InnoDB: Unable to find a record to delete-mark
      Feb 25 01:49:56 ains mysqld: InnoDB: tuple DATA TUPLE: 3 fields;
      Feb 25 01:49:56 ains mysqld: 0: len 8; hex 8000000000000003; asc ;;
      Feb 25 01:49:56 ains mysqld: 1: len 32; hex 3636643739393733366632656563343732393865323738393734366661393930; asc 66d799736f2eec47298e2789746fa990;;
      Feb 25 01:49:56 ains mysqld: 2: len 8; hex 8000000000052c42; asc ,B;;
      Feb 25 01:49:56 ains mysqld:
      Feb 25 01:49:56 ains mysqld: InnoDB: record PHYSICAL RECORD: n_fields 3; compact format; info bits 0
      Feb 25 01:49:56 ains mysqld: 0: len 8; hex 8000000000000003; asc ;;
      Feb 25 01:49:56 ains mysqld: 1: len 30; hex 363664373939373336663265656334373239386532373839373436666139; asc 66d799736f2eec47298e2789746fa9; (total 32 bytes);
      Feb 25 01:49:56 ains mysqld: 2: len 8; hex 8000000000052c01; asc , ;;
      Feb 25 01:49:56 ains mysqld: 2022-02-25 1:49:56 0 [ERROR] InnoDB: page [page id: space=242, page number=447] (161 records, index id 954).
      Feb 25 01:49:56 ains mysqld: 2022-02-25 1:49:56 0 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

      Any attempt to interact with this specific database table after that failed.

      Restarting only made things worse, as the process crashed with signal 11. Part of the restart log is below.

      Feb 25 08:32:10 ains mysqld: 2022-02-25 8:32:10 0 [ERROR] InnoDB: Unable to decompress ./nextcloud/oc_filecache.ibd[page id: space=242, page number=447]
      Feb 25 08:32:10 ains mysqld: 220225 8:32:10 [ERROR] mysqld got signal 11 ;

      What is needed to resolve this issue and have MariaDB (and as a result Nextcloud) online again?

      Activity

            monty Michael Widenius added a comment -

            How was the database created?

            What it means is that your data on disk is somehow "logically faulty".
            The reason could be one of the following:

            • Wrong configuration (disabling the doublewrite buffer or disabling syncing to disk)
            • Faulty disk/memory
            • Copying a live database with cp, copy, or rsync to another machine
            • A bug in InnoDB. This is the least likely scenario, as the InnoDB code is very well tested.
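The misconfiguration item above can be checked directly. A minimal sketch, assuming a local server reachable over the default socket; the variable names are real InnoDB settings, the rest is illustrative:

```shell
#!/bin/sh
# Durability settings whose non-default values can leave pages
# "logically faulty" after a crash. Safe values are noted below.
safe_doublewrite=ON   # innodb_doublewrite: torn-write protection for pages
safe_flush=1          # innodb_flush_log_at_trx_commit: sync redo log per commit
# On the affected host (assumes local socket and sufficient privileges):
#   mysql -Nse "SELECT @@innodb_doublewrite, @@innodb_flush_log_at_trx_commit"
# Compare the result against the safe values:
echo "safe: innodb_doublewrite=$safe_doublewrite innodb_flush_log_at_trx_commit=$safe_flush"
```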

            The error:
            Feb 25 08:32:10 ains mysqld: 2022-02-25 8:32:10 0 [ERROR] InnoDB: Unable to decompress ./nextcloud/oc_filecache.ibd[page id: space=242, page number=447]

            indicates that a block on disk is corrupt. If this is a virtual disk, I would suspect a bug in its
            caching layer.

            To fix the issue, you can try to mysqldump all tables and restore them on another system/directory.
            You can start mysqld with one of the InnoDB recovery options to try to get past the above error.
            See https://mariadb.com/kb/en/innodb-recovery-modes/ for how to do that.
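The dump-and-restore route described above can be sketched as follows. The recovery level, paths, and credentials are assumptions; start at level 1 and raise it only if the server still fails to start, since higher levels are increasingly destructive:

```shell
#!/bin/sh
# Stage an innodb_force_recovery option in an extra config file, then
# (commented out, since they need the affected host) start the server
# with it and dump everything for a restore elsewhere.
CNF=$(mktemp)
cat > "$CNF" <<'EOF'
[mysqld]
innodb_force_recovery = 1
EOF
# mysqld --defaults-extra-file="$CNF" &
# mysqldump --all-databases --single-transaction --result-file=all.sql
# # restore into a fresh datadir or on another system:
# mysql < all.sql
grep -q 'innodb_force_recovery' "$CNF" && staged=yes
echo "recovery option staged: $staged"
rm -f "$CNF"
```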

            robje RobJE added a comment -

            The "unable to decompress" message appeared after the first crash at Feb 25 01:49:56.

            The initial crash message is the first 12 lines in the attached log.

            This database has been "running" on this physical machine for quite some time (over a year).

            Though hard to completely exclude, faulty disk or memory are IMO not the cause:

            • machine is a physical machine
            • machine has 2 disk raid-1 setup
            • machine was shipped August 2019, which is not that old
            • also no logs indicating bad RAM

            That leaves faulty configuration (although it has not been touched for weeks/months) or a bug in InnoDB.

            robje RobJE added a comment -

            Removing the ./nextcloud/oc_filecache.ibd file allowed MariaDB to start.
            After deleting and re-creating the oc_filecache table I'm up and running.

            Still interested in what caused the initial error and crash.
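The workaround described in this comment can be reconstructed roughly as below. The paths come from the log above; the Nextcloud occ commands for recreating and repopulating oc_filecache are assumptions that vary by Nextcloud version, and the whole sequence should only be run with a backup in hand:

```shell
#!/bin/sh
# Sketch of the "remove the .ibd, then recreate the table" workaround.
DATADIR=/var/lib/mysql          # assumed default datadir
TABLE=oc_filecache
IBD="$DATADIR/nextcloud/$TABLE.ibd"
# 1. systemctl stop mariadb
# 2. mv "$IBD" "$IBD.bak"       # quarantine; keep a copy for analysis
# 3. systemctl start mariadb    # server now starts without the bad page
# 4. mysql nextcloud -e "DROP TABLE IF EXISTS $TABLE;"  # may warn about the missing file
# 5. sudo -u www-data php occ maintenance:repair        # Nextcloud recreates schema
# 6. sudo -u www-data php occ files:scan --all          # repopulate the file cache
echo "would quarantine $IBD"
```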


            marko Marko Mäkelä added a comment -

            I believe that MDEV-26917 can be a consequence of this error message. The subsequent crash that you experienced has likely been fixed by MDEV-13542.

            What would be highly interesting to me is how to reproduce the corruption of the change buffer (InnoDB: Unable to find a record to delete-mark). It could be specific to ROW_FORMAT=COMPRESSED, like the subsequent crash definitely was.
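Since both crashes may be specific to ROW_FORMAT=COMPRESSED, affected tables can be enumerated from information_schema and, after a verified backup, rebuilt in an uncompressed format. A hedged sketch; the query is standard, the conversion loop is illustrative and assumes default credentials:

```shell
#!/bin/sh
# List InnoDB tables still using the COMPRESSED row format, which the
# comments above associate with these crashes.
FIND_SQL="SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE ENGINE='InnoDB' AND ROW_FORMAT='Compressed'"
# mysql -Nse "$FIND_SQL" | while read -r db tbl; do
#   # rebuilds each table uncompressed; do this only after a backup
#   mysql -e "ALTER TABLE \`$db\`.\`$tbl\` ROW_FORMAT=DYNAMIC"
# done
echo "$FIND_SQL"
```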


            marko Marko Mäkelä added a comment -

            In MDEV-27765 I posted a possible cause of this bug. MDEV-30009 was fixed in MariaDB Server 10.5.19, but that fix would not heal any dormant corruption in the change buffer.

            Would recent versions of MariaDB Server 10.6 avoid the crash?


            marko Marko Mäkelä added a comment -

            robje, let me try to clarify what sort of feedback I am requesting.

            Before MDEV-13542 and some similar bugs were fixed in MariaDB Server 10.6, InnoDB would very easily crash when encountering any form of corruption. Those fixes are not feasible to apply to older major versions of MariaDB Server, because they depend on some heavy refactoring that was done in the 10.6 branch.

            In the attached log file mariadb_crash.txt there is no stack trace of the crash, so I can’t know where the signal 11 (SIGSEGV, segmentation violation, typically an attempt to dereference a null pointer) would have been raised. I would like to know how MariaDB Server 10.6.14 would behave on this corrupted data. If it would crash, I would like to see the stack trace so that the crash can be prevented.

            The log records in Description might actually be a sign of two bugs: a bug in the InnoDB change buffer (see MDEV-27765), and something specific to ROW_FORMAT=COMPRESSED tables, which OpenCloud or NextCloud used to enable by default. For ROW_FORMAT=COMPRESSED tables, I recently implemented a partial fix of MDEV-30882. The user who provided the data for reproducing that bug also provided data for another bug related to ROW_FORMAT=COMPRESSED page overflow. I have not yet filed that bug, because I wanted to reproduce it first.

            robje RobJE added a comment -

            I'm not sure I can help with this.

            MariaDB was moved to a virtual machine on Intel and upgraded to a later MariaDB version. All the logs that were available have been included in this bug report.

            The crash happened once. I'm not sure how reproducible it was/is.


            serg Sergei Golubchik added a comment -

            robje, thanks. Then let's do the following: we'll keep this bug open for a month. If you experience another crash, please add a comment with the info; maybe you'll even have the logs. After a month we'll close it, on the assumption that the latest MariaDB version has this crash fixed. Even then, if you experience another crash, add a comment and we'll reopen the issue.


            People

              Assignee: marko Marko Mäkelä
              Reporter: robje RobJE
              Votes: 0
              Watchers: 5

