MDEV-29938

InnoDB: Assertion failure in btr0pcur.cc line 532

Details

    Description

      The issue appeared after an update to 10.4.25.

      While running mysqldump for backups (it recurses over all the databases present in the dataset), mysqld (a member of a galera-26.4.12 cluster) started to crash with symptoms like this:

      [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=16968115, page number=3268], should be [page id: space=1668204, page number=37092]
      [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=16968115, page number=3268], should be [page id: space=1668204, page number=37092]
      [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=16968115, page number=3268], should be [page id: space=1668204, page number=37092]
      

      followed by:

      0x7e5748b7d640  InnoDB: Assertion failure in file /var/tmp/portage/dev-db/mariadb-10.4.26/work/mysql/storage/innobase/btr/btr0pcur.cc line 532
      InnoDB: Failing assertion: btr_page_get_prev(next_page) == btr_pcur_get_block(cursor)->page.id.page_no()
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      

      and exiting with signal 6.
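
      For reference, here is a minimal standalone sketch (not InnoDB source code; the helper names read_be32, page_id_matches and next_page_links_back are made up for illustration) of the two consistency checks that fail above, assuming only the standard InnoDB page header layout: the page number is stored big-endian at byte offset 4 (FIL_PAGE_OFFSET), the previous and next sibling page numbers at offsets 8 and 12 (FIL_PAGE_PREV, FIL_PAGE_NEXT), and the tablespace id at offset 34 (FIL_PAGE_SPACE_ID):

      /* Illustration only: field offsets follow the standard InnoDB page
         header layout; the function names are invented for this sketch. */
      #include <cstdint>

      static uint32_t read_be32(const unsigned char *p)
      {
        return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16)
             | (uint32_t(p[2]) << 8) | uint32_t(p[3]);
      }

      /* The "Space id and page no stored in the page" check: the identity
         stored inside the page must match the location it was read from. */
      static bool page_id_matches(const unsigned char *page,
                                  uint32_t expected_space_id,
                                  uint32_t expected_page_no)
      {
        return read_be32(page + 34) == expected_space_id  /* FIL_PAGE_SPACE_ID */
            && read_be32(page + 4) == expected_page_no;   /* FIL_PAGE_OFFSET */
      }

      /* The btr0pcur.cc line 532 assertion: index pages on each level form a
         doubly linked list, so when a cursor moves to the next page, that
         page's FIL_PAGE_PREV must point back to the page the cursor left. */
      static bool next_page_links_back(const unsigned char *next_page,
                                       uint32_t current_page_no)
      {
        return read_be32(next_page + 8) == current_page_no; /* FIL_PAGE_PREV */
      }

      If a page image from another tablespace ends up at a given file offset, the first check produces the "Space id and page no stored in the page" errors; if an index page's sibling pointers no longer agree, the second check is the assertion that kills the server.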

      This doesn't happen on every backup run: after dumping the whole dataset successfully 4-5 times, a crash like this happens.
      The server has enough free memory to dump the data. It happened on different servers; however, two conditions were the same: the MariaDB version and a running mysqldump (backup).

      The behavior looks like a serious bug. The whole log is attached.

      Is there any possible workaround for this issue?

          Activity

            marko Marko Mäkelä added a comment -

            The crash should have been fixed in MDEV-13542.

            The cause of the corruption is somewhat of a mystery. It appears that a data page from the wrong tablespace has been written to a file. We added a debug assertion to catch an incorrect write, but it has never fired in our internal testing. Up to MariaDB Server 10.4, the check in fil_io() looks like this:

            	ut_ad(!req_type.is_write()
            	      || page_id.space() == SRV_LOG_SPACE_FIRST_ID
            	      || !fil_is_user_tablespace_id(page_id.space())
            	      || offset == page_id.page_no() * zip_size);
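
            Purely as an illustration of the last condition (this is not MariaDB code; the function name and the 16 KiB default page size are assumptions made for the sketch): a write to a user tablespace page must land at the file offset implied by its page number.

            	#include <cassert>
            	#include <cstdint>

            	/* Hypothetical stand-in for the quoted check, assuming an uncompressed
            	   tablespace with the default innodb_page_size of 16384 bytes. */
            	static bool write_offset_is_sane(uint64_t offset, uint32_t page_no,
            	                                 uint32_t page_size = 16384)
            	{
            		return offset == uint64_t(page_no) * page_size;
            	}

            	int main()
            	{
            		/* Page 37092 (a page number from the errors above) belongs at
            		   37092 * 16384 = 607715328 bytes into its data file; writing
            		   it anywhere else puts the page in the wrong place. */
            		assert(write_offset_is_sane(607715328ULL, 37092));
            		assert(!write_offset_is_sane(607715328ULL, 3268));
            		return 0;
            	}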
            

            In MDEV-23855 (MariaDB Server 10.5.7), the function was renamed to fil_space_t::io(), and it looks like this assertion was removed. We never encountered the symptoms nor a failure of this assertion in our internal testing up to then.

            Are you using innodb_encrypt_tables or scrubbing? The latter was broken until MDEV-8139 was fixed in MariaDB Server 10.5.5. If you are using neither of these, I would be keen to shift the blame to a bug in the file system or in the underlying storage.

            euglorg Eugene added a comment - edited

            Hello Marko,

            Neither innodb_encrypt_tables nor scrubbing is used.

            This is not the underlying storage (the same hardware is used on all the servers, and none of them has issues until you try to run a backup with mysqldump). I moved the backup role across the servers and the behavior is the same: MariaDB is stable while you are not performing backups, but when backups are performed, the node crashes within a few days, always while dumping the data.
            And the query reported to be executing at the moment of the crash is usually some custom SELECT (not the mysqldump itself).

            I'll try to replace xfs with ext4fs and see whether the behavior changes...

            Just in case: this behavior was also noticed on a node running kernel 5.10, so it is not the kernel. And it never happened before mariadb-10.4.25.


            marko Marko Mäkelä added a comment -

            euglorg, if you are using mariadb-backup or file system snapshots for backups, then I believe that there are bugs in it, at least MDEV-29943 and MDEV-21403. MariaDB 10.5 and later releases are not affected by those, thanks to the new redo log record format (MDEV-12353) and the rewritten recovery logic. Possibly we can also thank the I/O layer refactoring for this.

            You should be aware that once something is corrupted, the corruption does not usually heal by itself. Physical backups (as opposed to logical SQL dumps) will propagate such corruption. Sometimes, corruption can be healed by rebuilding the table, for example, by OPTIMIZE TABLE. I would recommend that you rebuild all your data from a SQL dump, to get into a known good state.

            Since you mention Galera, there are some known problems with its snapshot transfer, both with wsrep_sst_method=mariabackup (see the above recovery bugs) and with wsrep_sst_method=rsync. If you ever initiated Galera SST with wsrep_sst_method=rsync (the default setting) from a source node that is older than 10.4.25, a likely explanation of the corruption is that writes were actually not blocked during the snapshot transfer. Even after the snapshot transfer was refactored in MDEV-24845, some failures still occasionally occur on debug builds in our internal testing.

            euglorg Eugene added a comment -

            Hello Marko.

            Thank you for the advice.
            We never use different versions (even minor releases) in the cluster; we always try to keep them the same.
            The corruption was logged for different tables. But it seems you were right and there is a problem with running MariaDB on xfs: since the change to ext4fs (a week ago) no corruption has been listed (the donor was the very same one while performing SST) and no assertion failure has been logged. If it stays stable for one more week, I would say this is in fact a second problem indicating that MariaDB is not fully compatible with xfs (the first is compression: the very same dataset consumes 3 TB on ext4fs but 4 TB on xfs).
            However, as ext4fs is starting to be treated as legacy by modern distros (and thus will require a module to be loaded or the kernel to be recompiled in the near future), there are two more questions:

            • what filesystem is recommended for use with MariaDB (other than ext4fs), and
            • how to get details on what is happening while running on xfs? This should be treated as a bug in either MariaDB or xfs (or both?).

            People

              marko Marko Mäkelä
              euglorg Eugene