MariaDB Server / MDEV-27949

[crash] Unable to find a record to delete-mark

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version: 10.5.12
    • Fix Version: N/A
    • Environment: Debian Bullseye on Kobol Helios4
      SoC: Marvell Armada 380/385

      MariaDB 1:10.5.12-0+deb11u1 armhf

    Description

      This morning mysqld crashed with the error below.

      Feb 25 01:49:56 ains mysqld: 2022-02-25 1:49:56 0 [ERROR] InnoDB: Unable to find a record to delete-mark
      Feb 25 01:49:56 ains mysqld: InnoDB: tuple DATA TUPLE: 3 fields;
      Feb 25 01:49:56 ains mysqld: 0: len 8; hex 8000000000000003; asc ;;
      Feb 25 01:49:56 ains mysqld: 1: len 32; hex 3636643739393733366632656563343732393865323738393734366661393930; asc 66d799736f2eec47298e2789746fa990;;
      Feb 25 01:49:56 ains mysqld: 2: len 8; hex 8000000000052c42; asc ,B;;
      Feb 25 01:49:56 ains mysqld:
      Feb 25 01:49:56 ains mysqld: InnoDB: record PHYSICAL RECORD: n_fields 3; compact format; info bits 0
      Feb 25 01:49:56 ains mysqld: 0: len 8; hex 8000000000000003; asc ;;
      Feb 25 01:49:56 ains mysqld: 1: len 30; hex 363664373939373336663265656334373239386532373839373436666139; asc 66d799736f2eec47298e2789746fa9; (total 32 bytes);
      Feb 25 01:49:56 ains mysqld: 2: len 8; hex 8000000000052c01; asc , ;;
      Feb 25 01:49:56 ains mysqld: 2022-02-25 1:49:56 0 [ERROR] InnoDB: page [page id: space=242, page number=447] (161 records, index id 954).
      Feb 25 01:49:56 ains mysqld: 2022-02-25 1:49:56 0 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

      Any attempt to interact with this specific database table after that failed.

      Restarting only made things worse, as the process crashed with signal 11. Part of the restart log is below.

      Feb 25 08:32:10 ains mysqld: 2022-02-25 8:32:10 0 [ERROR] InnoDB: Unable to decompress ./nextcloud/oc_filecache.ibd[page id: space=242, page number=447]
      Feb 25 08:32:10 ains mysqld: 220225 8:32:10 [ERROR] mysqld got signal 11 ;

      What is needed to resolve this issue and have MariaDB (and as a result Nextcloud) online again?

      Activity

            monty Michael Widenius added a comment -

            How was the database created?

            What it means is that your data on disk is somehow "logically faulty".
            The reason could be one of the following:

            • Wrong configuration (disabling the doublewrite buffer or disabling syncing to disk)
            • Faulty disk/memory
            • Copying a live database with cp, copy, or rsync to another machine
            • A bug in InnoDB. This is the least likely scenario, as the InnoDB code is very well tested.
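The misconfiguration item above can be checked directly. A minimal sketch, assuming a local server reachable over the default socket; the variable names are real InnoDB settings, the rest is illustrative:

```shell
#!/bin/sh
# Durability settings whose non-default values can leave pages
# "logically faulty" after a crash. Safe values are noted below.
safe_doublewrite=ON   # innodb_doublewrite: torn-write protection for pages
safe_flush=1          # innodb_flush_log_at_trx_commit: sync redo log per commit
# On the affected host (assumes local socket and sufficient privileges):
#   mysql -Nse "SELECT @@innodb_doublewrite, @@innodb_flush_log_at_trx_commit"
# Compare the result against the safe values:
echo "safe: innodb_doublewrite=$safe_doublewrite innodb_flush_log_at_trx_commit=$safe_flush"
```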

            The error:
            Feb 25 08:32:10 ains mysqld: 2022-02-25 8:32:10 0 [ERROR] InnoDB: Unable to decompress ./nextcloud/oc_filecache.ibd[page id: space=242, page number=447]

            indicates that a block on disk is corrupt. If this is a virtual disk, I would suspect a bug in its
            caching layer.

            To fix the issue, you can try to mysqldump all tables and restore them on another system/directory.
            You can start mysqld with one of the InnoDB recovery options to try to get past the above error.
            See https://mariadb.com/kb/en/innodb-recovery-modes/ for how to do that.
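The dump-and-restore route described above can be sketched as follows. The recovery level, paths, and credentials are assumptions; start at level 1 and raise it only if the server still fails to start, since higher levels are increasingly destructive:

```shell
#!/bin/sh
# Stage an innodb_force_recovery option in an extra config file, then
# (commented out, since they need the affected host) start the server
# with it and dump everything for a restore elsewhere.
CNF=$(mktemp)
cat > "$CNF" <<'EOF'
[mysqld]
innodb_force_recovery = 1
EOF
# mysqld --defaults-extra-file="$CNF" &
# mysqldump --all-databases --single-transaction --result-file=all.sql
# # restore into a fresh datadir or on another system:
# mysql < all.sql
grep -q 'innodb_force_recovery' "$CNF" && staged=yes
echo "recovery option staged: $staged"
rm -f "$CNF"
```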

            robje RobJE added a comment -

            The "unable to decompress" message appeared after the first crash at Feb 25 01:49:56.

            The initial crash message is the first 12 lines in the attached log.

            This database has been "running" on this physical machine for quite some time (over a year).

            Though hard to completely exclude, faulty disk or memory are IMO not the cause:

            • machine is a physical machine
            • machine has 2 disk raid-1 setup
            • machine was shipped August 2019, which is not that old
            • also no logs indicating bad RAM

            That leaves faulty configuration (although it has not been touched for weeks/months) or a bug in InnoDB.

            robje RobJE added a comment -

            Removing the ./nextcloud/oc_filecache.ibd file allowed MariaDB to start.
            After deleting and re-creating the oc_filecache table I'm up and running.

            Still interested in what caused the initial error and crash.
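The workaround described in this comment can be reconstructed roughly as below. The paths come from the log above; the Nextcloud occ commands for recreating and repopulating oc_filecache are assumptions that vary by Nextcloud version, and the whole sequence should only be run with a backup in hand:

```shell
#!/bin/sh
# Sketch of the "remove the .ibd, then recreate the table" workaround.
DATADIR=/var/lib/mysql          # assumed default datadir
TABLE=oc_filecache
IBD="$DATADIR/nextcloud/$TABLE.ibd"
# 1. systemctl stop mariadb
# 2. mv "$IBD" "$IBD.bak"       # quarantine; keep a copy for analysis
# 3. systemctl start mariadb    # server now starts without the bad page
# 4. mysql nextcloud -e "DROP TABLE IF EXISTS $TABLE;"  # may warn about the missing file
# 5. sudo -u www-data php occ maintenance:repair        # Nextcloud recreates schema
# 6. sudo -u www-data php occ files:scan --all          # repopulate the file cache
echo "would quarantine $IBD"
```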


            marko Marko Mäkelä added a comment -

            I believe that MDEV-26917 can be a consequence of this error message. The subsequent crash that you experienced has likely been fixed by MDEV-13542.

            What would be highly interesting to me is how to reproduce the corruption of the change buffer (InnoDB: Unable to find a record to delete-mark). It could be specific to ROW_FORMAT=COMPRESSED, like the subsequent crash definitely was.
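Since both crashes may be specific to ROW_FORMAT=COMPRESSED, affected tables can be enumerated from information_schema and, after a verified backup, rebuilt in an uncompressed format. A hedged sketch; the query is standard, the conversion loop is illustrative and assumes default credentials:

```shell
#!/bin/sh
# List InnoDB tables still using the COMPRESSED row format, which the
# comments above associate with these crashes.
FIND_SQL="SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE ENGINE='InnoDB' AND ROW_FORMAT='Compressed'"
# mysql -Nse "$FIND_SQL" | while read -r db tbl; do
#   # rebuilds each table uncompressed; do this only after a backup
#   mysql -e "ALTER TABLE \`$db\`.\`$tbl\` ROW_FORMAT=DYNAMIC"
# done
echo "$FIND_SQL"
```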


            marko Marko Mäkelä added a comment -

            In MDEV-27765 I posted a possible cause of this bug. MDEV-30009 was fixed in MariaDB Server 10.5.19, but that fix would not heal any dormant corruption in the change buffer.

            Would recent versions of MariaDB Server 10.6 avoid the crash?


            marko Marko Mäkelä added a comment -

            robje, let me try to clarify what sort of feedback I am requesting.

            Before MDEV-13542 and some similar bugs were fixed in MariaDB Server 10.6, InnoDB would very easily crash when encountering any form of corruption. Those fixes are not feasible to apply to older major versions of MariaDB Server, because they depend on some heavy refactoring that was done in the 10.6 branch.

            In the attached log file mariadb_crash.txt there is no stack trace of the crash, so I can’t know where the signal 11 (SIGSEGV, segmentation violation, typically an attempt to dereference a null pointer) would have been raised. I would like to know how MariaDB Server 10.6.14 would behave on this corrupted data. If it would crash, I would like to see the stack trace so that the crash can be prevented.

            The log records in Description might actually be a sign of two bugs: a bug in the InnoDB change buffer (see MDEV-27765), and something specific to ROW_FORMAT=COMPRESSED tables, which OpenCloud or NextCloud used to enable by default. For ROW_FORMAT=COMPRESSED tables, I recently implemented a partial fix of MDEV-30882. The user who provided the data for reproducing that bug also provided data for another bug related to ROW_FORMAT=COMPRESSED page overflow. I have not yet filed that bug, because I wanted to reproduce it first.

            robje RobJE added a comment -

            I'm not sure I can help with this.

            MariaDB was moved to a virtual machine on Intel and upgraded to a later MariaDB version. All the logs that were available have been included in this bug report.

            The crash happened once. I'm not sure how reproducible it was/is.


            serg Sergei Golubchik added a comment -

            robje, thanks. Then let's do the following: we'll keep this bug open for a month. If you experience another crash, please add a comment with the info; maybe you'll even have the logs. After a month we'll close it, on the assumption that the latest MariaDB version has this crash fixed. Even then, if you experience another crash, add a comment and we'll reopen the issue.


            People

              Assignee: marko Marko Mäkelä
              Reporter: robje RobJE
              Votes: 0
              Watchers: 5

