[MDEV-30397] InnoDB crash due to DB_FAIL reported for a corrupted page

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL)
Fix Version/s: 10.7.8, 11.0.1, 10.6.13, 10.8.8, 10.9.6, 10.10.4, 10.11.3
Component/s: Storage Engine - InnoDB
Labels: None
Environment: Windows 10 Pro 22H2, 64-bit, build 19045.2364

Description

I can connect to MariaDB with HeidiSQL and MySqlConnector, but when I try to view table data (all tables are InnoDB) or run a query, the MariaDB service crashes.

I get a popup: "SQL Error (2013): Lost connection to the MySQL server during query."

From the .err log:

2023-01-12 17:33:30 3 [ERROR] [FATAL] InnoDB: Unknown error Failed, retry may succeed
230112 17:33:30 [ERROR] mysqld got exception 0x80000003 ;

Attachments


- DX-APP06.err (5 kB, uploaded 2023-01-12 22:47 by Richard Green)
- my.ini (0.2 kB, uploaded 2023-01-12 22:47 by Richard Green)
- mysqld.dmp (77 kB, uploaded 2023-01-12 22:47 by Richard Green)

Issue Links

relates to:

- MDEV-13542 Crashing on a corrupted page is unhelpful (Closed)
- MDEV-30598 Mariadb crashes (Closed)

Activity


Marko Mäkelä added a comment - 2023-01-19 15:01

I do not see how the change buffer code could return DB_FAIL up to the caller. The cause of this crash remains a mystery.

Marko Mäkelä added a comment - 2023-02-08 15:36

I can reproduce this behaviour on 10.6 by modifying buf_LRU_free_page() so that it would attempt to evict modified pages of temporary tables. There obviously is some flaw with that modification, but it would reproduce this error in a couple of tests that exercise temporary tables. In other words, we are attempting some operation on a corrupted page, getting DB_FAIL and not handling it in a consistent fashion (I suppose, by initiating a rollback).
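
To make this failure mode concrete, here is a minimal C++ sketch, assuming a caller whose error handler knows how to recover from reported corruption but treats any unexpected code as fatal. The enum values reuse InnoDB names, but handle_error() and the action type are hypothetical and do not reproduce InnoDB's actual handler. The log line in the description, "Unknown error Failed, retry may succeed", appears to be InnoDB's text for DB_FAIL reaching exactly such a default branch.

```cpp
// Hypothetical subset of InnoDB error codes, for illustration only.
enum dberr_t { DB_SUCCESS, DB_FAIL, DB_PAGE_CORRUPTED, DB_CORRUPTION };

// What a statement-level error handler might decide to do (hypothetical).
enum class action { proceed, rollback, abort_server };

// A caller that can recover from reported corruption, but treats any code
// it does not expect (such as a leaked DB_FAIL) as fatal.
action handle_error(dberr_t err) {
  switch (err) {
  case DB_SUCCESS:
    return action::proceed;
  case DB_PAGE_CORRUPTED:
  case DB_CORRUPTION:
    return action::rollback;      // report to the client and roll back
  default:
    return action::abort_server;  // analogous to "[FATAL] InnoDB: Unknown error ..."
  }
}
```

With this shape, a corrupted page reported as DB_PAGE_CORRUPTED leads to a rollback, while the same condition reported as DB_FAIL falls into the fatal default branch.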

Marko Mäkelä added a comment - 2023-02-08 16:05

In my case (caused by a buggy code change that I am working on), we are reading a page that has been filled by NUL bytes, and returning DB_FAIL due to that. Other corruption detected by that function results in a different error code:

diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
index 5c81e34856b..7b18906f395 100644
--- a/storage/innobase/buf/buf0buf.cc
+++ b/storage/innobase/buf/buf0buf.cc
@@ -3600,7 +3600,7 @@ dberr_t buf_page_t::read_complete(const fil_node_t &node)
     else if (read_id == page_id_t(0, 0))
     {
       /* This is likely an uninitialized (all-zero) page. */
-      err= DB_FAIL;
+      err= DB_PAGE_CORRUPTED;
       goto release_page;
     }
     else if (!node.space->full_crc32() &&
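
The identity check touched by this diff can be sketched as follows, assuming a page frame whose header stores the page number and the tablespace ID as big-endian 32-bit fields at byte offsets 4 and 34 (our reading of the InnoDB page header; treat these offsets as an assumption). check_page_identity() and its zero_page_err parameter are hypothetical; the parameter lets one sketch show the behaviour both before (DB_FAIL) and after (DB_PAGE_CORRUPTED) the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum dberr_t { DB_SUCCESS, DB_FAIL, DB_PAGE_CORRUPTED };

// Assumed header offsets of the page number and tablespace ID fields.
constexpr std::size_t HDR_PAGE_NO = 4;
constexpr std::size_t HDR_SPACE_ID = 34;

// Decode a big-endian 32-bit field from the page frame.
static std::uint32_t read_be32(const std::vector<unsigned char>& f,
                               std::size_t off) {
  return (std::uint32_t{f[off]} << 24) | (std::uint32_t{f[off + 1]} << 16) |
         (std::uint32_t{f[off + 2]} << 8) | std::uint32_t{f[off + 3]};
}

// Hypothetical simplification of the page identity check after a read.
dberr_t check_page_identity(const std::vector<unsigned char>& frame,
                            std::uint32_t expected_space,
                            std::uint32_t expected_page,
                            dberr_t zero_page_err) {
  if (frame.size() < HDR_SPACE_ID + 4)
    return DB_PAGE_CORRUPTED;  // too short to hold a header at all
  const std::uint32_t space = read_be32(frame, HDR_SPACE_ID);
  const std::uint32_t page = read_be32(frame, HDR_PAGE_NO);
  if (space == expected_space && page == expected_page)
    return DB_SUCCESS;         // the page is the one we asked for
  if (space == 0 && page == 0)
    return zero_page_err;      // likely an uninitialized (all-zero) page
  return DB_PAGE_CORRUPTED;    // some other page identity: corruption
}
```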

Marko Mäkelä added a comment - 2023-02-08 16:22

The DB_FAIL return value was added to that function in MDEV-13542. I would not call this bug a regression of that, but an omission of that fix. Before MDEV-13542, we would typically simply crash due to the corrupted page.

Marko Mäkelä added a comment - 2023-02-09 07:58

The special return value is needed so that fil_aio_callback() can avoid reporting an error when read-ahead is covering an unallocated page. The error code needs to be mapped for synchronous reads, in buf_read_page_low().
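
A minimal sketch of that split, with hypothetical function names standing in for the roles of fil_aio_callback() and buf_read_page_low(): the asynchronous read-ahead path silently tolerates DB_FAIL, while the synchronous path converts it into a corruption error before returning it to the caller. Mapping to DB_PAGE_CORRUPTED is an assumption borrowed from the diff in the earlier comment; the actual fix may use a different code.

```cpp
enum dberr_t { DB_SUCCESS, DB_FAIL, DB_PAGE_CORRUPTED };

// Asynchronous (read-ahead) completion: DB_FAIL means the read-ahead window
// covered an unallocated (all-zero) page, which is expected, so only other
// errors are worth reporting. Hypothetical stand-in for fil_aio_callback().
bool should_report_async(dberr_t err) {
  return err != DB_SUCCESS && err != DB_FAIL;
}

// Synchronous read: the caller explicitly requested this page, so an
// all-zero page is corruption. DB_FAIL is mapped so that it never reaches
// upper layers. Hypothetical stand-in for the mapping in buf_read_page_low().
dberr_t map_sync_read_error(dberr_t err) {
  return err == DB_FAIL ? DB_PAGE_CORRUPTED : err;
}
```

Keeping DB_FAIL as an internal signal and translating it only at the synchronous boundary lets read-ahead stay quiet about unallocated pages without leaking an unexpected error code to callers.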

People

Assignee: Marko Mäkelä

Reporter: Richard Green

Votes: 0

Watchers: 5

Dates

Created: 2023-01-12 22:48

Updated: 2023-03-02 09:43

Resolved: 2023-02-16 14:04


Project: MariaDB Server
