Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.4.25, 10.4.26
Environment: Gentoo Linux (kernel 5.18 and newer), AMD EPYC 7451 CPU
Description
The issue appeared after updating to 10.4.25.
While running mysqldump for a backup (which recurses over all databases in the dataset), mysqld (a member of a galera-26.4.12 cluster) started crashing with symptoms like these:
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=16968115, page number=3268], should be [page id: space=1668204, page number=37092]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=16968115, page number=3268], should be [page id: space=1668204, page number=37092]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=16968115, page number=3268], should be [page id: space=1668204, page number=37092]
followed by:
0x7e5748b7d640 InnoDB: Assertion failure in file /var/tmp/portage/dev-db/mariadb-10.4.26/work/mysql/storage/innobase/btr/btr0pcur.cc line 532
InnoDB: Failing assertion: btr_page_get_prev(next_page) == btr_pcur_get_block(cursor)->page.id.page_no()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
after which mysqld exits with signal 6 (SIGABRT).
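Both failures come from consistency checks on InnoDB page headers. The following is a rough sketch of what each check compares, not InnoDB's actual code: it assumes an uncompressed, unencrypted page held in a byte vector, and `Frame`, `PageId`, `page_id_matches()` and `next_links_back()` are hypothetical names. Only the header offsets and the big-endian encoding are taken from InnoDB's page format (fil0fil.h).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offsets in the InnoDB FIL page header (see fil0fil.h).
constexpr std::size_t kFilPageOffset = 4;    // FIL_PAGE_OFFSET: page number
constexpr std::size_t kFilPagePrev = 8;      // FIL_PAGE_PREV: previous page on the same B-tree level
constexpr std::size_t kFilPageSpaceId = 34;  // FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID: tablespace id

// Hypothetical: one page frame as raw bytes.
using Frame = std::vector<std::uint8_t>;

// Header integers are stored big-endian (InnoDB reads them with mach_read_from_4()).
std::uint32_t read_be32(const Frame& frame, std::size_t pos) {
  assert(pos + 4 <= frame.size());
  return (std::uint32_t{frame[pos]} << 24) | (std::uint32_t{frame[pos + 1]} << 16) |
         (std::uint32_t{frame[pos + 2]} << 8) | std::uint32_t{frame[pos + 3]};
}

// Hypothetical: the (space, page) pair that was requested.
struct PageId {
  std::uint32_t space;
  std::uint32_t page_no;
};

// Roughly the check behind "Space id and page no stored in the page ... should be ...":
// the frame's own header must name the page that was requested.
bool page_id_matches(const Frame& frame, PageId expected) {
  return read_be32(frame, kFilPageSpaceId) == expected.space &&
         read_be32(frame, kFilPageOffset) == expected.page_no;
}

// Roughly the btr0pcur.cc assertion: pages on one B-tree level form a doubly
// linked list, so the successor's FIL_PAGE_PREV must point back to the page
// number the cursor is currently positioned on.
bool next_links_back(std::uint32_t current_page_no, const Frame& next) {
  return read_be32(next, kFilPagePrev) == current_page_no;
}
```

Fed the values from the log, a frame whose header reads space 16968115, page 3268 fails `page_id_matches()` for a request of space 1668204, page 37092, which is the mismatch reported above.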
This does not happen on every backup run: the whole dataset is dumped successfully 4-5 times, and then a crash like this occurs.
The server has enough free memory to dump the data. The crash has happened on different servers; in every case two conditions were the same: the MariaDB version and a running mysqldump (backup).
This looks like a serious bug. The full log is attached.
Is there any possible workaround for this issue?
Issue Links
- relates to
  - MDEV-13542 Crashing on a corrupted page is unhelpful (Closed)
  - MDEV-19871 Add page id matching check in innochecksum tool (Closed)
  - MDEV-21109 Table corruption not detected with CHECK TABLE or innochecksum, only with mariabackup (Closed)
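MDEV-19871 added a page-id matching check to innochecksum. The idea can be approximated offline by walking a tablespace file page by page and comparing each page's header with its position in the file. This is a minimal sketch, not innochecksum itself: `find_misplaced_pages()` is a hypothetical name, and it assumes an uncompressed, unencrypted .ibd file with the default 16 KiB page size, a caller-supplied tablespace id, and a file that is not being modified while it is read. All-zero pages are assumed to be allocated but never written, and are skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, convenient for feeding in-memory test data
#include <string>
#include <vector>

// Byte offsets in the InnoDB FIL page header (see fil0fil.h).
constexpr std::size_t kFilPageOffset = 4;    // FIL_PAGE_OFFSET: page number
constexpr std::size_t kFilPageSpaceId = 34;  // FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID: tablespace id

// Reads a big-endian 32-bit header field.
std::uint32_t be32_at(const std::string& page, std::size_t pos) {
  assert(pos + 4 <= page.size());
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < 4; ++i)
    v = (v << 8) | static_cast<unsigned char>(page[pos + i]);
  return v;
}

// Hypothetical scanner: returns the positions of pages whose header names a
// different tablespace or page number than the slot they occupy in the file.
// A trailing partial page (if any) is ignored.
std::vector<std::uint32_t> find_misplaced_pages(std::istream& in, std::uint32_t space_id,
                                                std::size_t page_size = 16384) {
  assert(page_size > kFilPageSpaceId + 4);
  std::vector<std::uint32_t> misplaced;
  std::string page(page_size, '\0');
  for (std::uint32_t page_no = 0;
       in.read(&page[0], static_cast<std::streamsize>(page_size)); ++page_no) {
    // Assumption: an all-zero page was never written, so there is nothing to check.
    if (std::all_of(page.begin(), page.end(), [](char c) { return c == '\0'; }))
      continue;
    if (be32_at(page, kFilPageOffset) != page_no ||
        be32_at(page, kFilPageSpaceId) != space_id)
      misplaced.push_back(page_no);
  }
  return misplaced;
}
```

The function accepts any `std::istream`, so it can be pointed at a `std::ifstream` opened in binary mode on a copy of the file. Where available, innochecksum from the matching MariaDB release is preferable, since it knows that release's actual page format.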
The crash should have been fixed in MDEV-13542. The cause of the corruption is somewhat of a mystery. It appears that a data page from the wrong tablespace has been written to a file. We added a debug assertion to catch an incorrect write, but that assertion has never fired in our internal testing. Up to MariaDB Server 10.4, the check in fil_io() looks like this:
ut_ad(!req_type.is_write()
|| page_id.space() == SRV_LOG_SPACE_FIRST_ID
|| !fil_is_user_tablespace_id(page_id.space())
|| offset == page_id.page_no() * zip_size);
In MDEV-23855 (MariaDB Server 10.5.7), the function was renamed to fil_space_t::io(), and it looks like this assertion was removed. Up to then, we had never encountered these symptoms, nor a failure of this assertion, in our internal testing. Are you using innodb_encrypt_tables or scrubbing? The latter was broken until MDEV-8139 was fixed in MariaDB Server 10.5.5. If you are using neither of these, I would be inclined to blame a bug in the file system or in the underlying storage.
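The 10.4 assertion quoted above boils down to one invariant: a write to a user tablespace must land at byte offset page_no * page size. The sketch below restates it as an ordinary function and adds a hypothetical stricter guard that also inspects the frame being written, because a write that lands at the right offset but carries another page's contents passes the offset check alone. Assumptions: the exemptions for the redo log and non-user tablespaces are collapsed into one `user_tablespace` flag, `zip_size` from the quoted code is modeled as the physical page size in bytes, and `write_offset_ok()` and `write_frame_ok()` are hypothetical names, not MariaDB functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offsets in the InnoDB FIL page header (see fil0fil.h); big-endian.
constexpr std::size_t kFilPageOffset = 4;    // FIL_PAGE_OFFSET
constexpr std::size_t kFilPageSpaceId = 34;  // FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID

std::uint32_t be32(const std::vector<std::uint8_t>& frame, std::size_t pos) {
  assert(pos + 4 <= frame.size());
  return (std::uint32_t{frame[pos]} << 24) | (std::uint32_t{frame[pos + 1]} << 16) |
         (std::uint32_t{frame[pos + 2]} << 8) | std::uint32_t{frame[pos + 3]};
}

// Core of the 10.4 debug check. Assumption: every exemption (redo log,
// system/undo/temporary spaces) is folded into `user_tablespace`, and the
// quoted `zip_size` is modeled as `physical_page_size` in bytes.
bool write_offset_ok(bool is_write, bool user_tablespace, std::uint64_t offset,
                     std::uint32_t page_no, std::uint32_t physical_page_size) {
  return !is_write || !user_tablespace ||
         offset == std::uint64_t{page_no} * physical_page_size;
}

// Hypothetical stricter guard: the frame being written must carry the same
// (space, page_no) in its own header as the target it is written to.
bool write_frame_ok(std::uint32_t space, std::uint32_t page_no,
                    const std::vector<std::uint8_t>& frame) {
  return be32(frame, kFilPageSpaceId) == space && be32(frame, kFilPageOffset) == page_no;
}
```

With the page ids from this report, a write of a frame whose header says space 16968115, page 3268 to offset 37092 * 16384 in space 1668204 satisfies `write_offset_ok()` but is rejected by `write_frame_ok()`.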