MariaDB Server / MDEV-30397

InnoDB crash due to DB_FAIL reported for a corrupted page

Details

    Description

      I can connect to MariaDB with HeidiSQL and MySqlConnector, but when I try to view table data (all tables use InnoDB) or run a query, the MariaDB service crashes.

      I get a popup: "SQL Error (2013): Lost connection to the MySQL server during query."

      From the .err log:

      2023-01-12 17:33:30 3 [ERROR] [FATAL] InnoDB: Unknown error Failed, retry may succeed
      230112 17:33:30 [ERROR] mysqld got exception 0x80000003 ;

      Attachments

        1. DX-APP06.err
          5 kB
          Richard Green
        2. my.ini
          0.2 kB
          Richard Green
        3. mysqld.dmp
          77 kB
          Richard Green


          Activity

            Marko Mäkelä added a comment:

            I do not see how the change buffer code could return DB_FAIL up to the caller. The cause of this crash remains a mystery.

            Marko Mäkelä added a comment:

            I can reproduce this behaviour on 10.6 by modifying buf_LRU_free_page() so that it attempts to evict modified pages of temporary tables. That modification is obviously flawed, but it reproduces this error in a couple of tests that exercise temporary tables. In other words, we are attempting an operation on a corrupted page, getting DB_FAIL, and not handling it consistently (which, I suppose, would mean initiating a rollback).

            Marko Mäkelä added a comment:

            In my case (caused by a buggy code change that I am working on), we are reading a page that is filled with NUL bytes and returning DB_FAIL because of that. Other corruption detected by that function results in a different error code:

            diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
            index 5c81e34856b..7b18906f395 100644
            --- a/storage/innobase/buf/buf0buf.cc
            +++ b/storage/innobase/buf/buf0buf.cc
            @@ -3600,7 +3600,7 @@ dberr_t buf_page_t::read_complete(const fil_node_t &node)
                 else if (read_id == page_id_t(0, 0))
                 {
                   /* This is likely an uninitialized (all-zero) page. */
            -      err= DB_FAIL;
            +      err= DB_PAGE_CORRUPTED;
                   goto release_page;
                 }
                 else if (!node.space->full_crc32() &&
            

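            The following minimal C++ sketch makes the all-zero case concrete. It models the classification that the `else if (read_id == page_id_t(0, 0))` branch performs after a read. This is not InnoDB code: `read_status`, `classify_read()` and `read_be32()` are hypothetical. The header offsets are also an assumption, taken from the classic InnoDB FIL page header layout: page number at byte 4 and tablespace ID at byte 34, both stored as 4-byte big-endian integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical outcome codes; NOT InnoDB's dberr_t.
enum class read_status { success, fail_uninitialized, page_corrupted };

// Decode a 4-byte big-endian field from the page frame.
static std::uint32_t read_be32(const unsigned char *p)
{
  return std::uint32_t{p[0]} << 24 | std::uint32_t{p[1]} << 16 |
         std::uint32_t{p[2]} << 8 | std::uint32_t{p[3]};
}

// Assumed FIL header offsets (classic InnoDB layout).
constexpr std::size_t PAGE_NO_OFFSET = 4;   // page number field
constexpr std::size_t SPACE_ID_OFFSET = 34; // tablespace ID field

// Compare the page ID stored in the frame with the one that was requested.
read_status classify_read(const unsigned char *frame,
                          std::uint32_t expected_space,
                          std::uint32_t expected_page)
{
  const std::uint32_t space = read_be32(frame + SPACE_ID_OFFSET);
  const std::uint32_t page = read_be32(frame + PAGE_NO_OFFSET);
  if (space == expected_space && page == expected_page)
    return read_status::success;
  if (space == 0 && page == 0)
    return read_status::fail_uninitialized; // likely an all-zero page
  return read_status::page_corrupted;       // any other ID mismatch
}
```

            The ID match is checked first, so that page 0 of tablespace 0 is not misclassified as uninitialized. This follows the `if`/`else if` order visible in the diff.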

            Marko Mäkelä added a comment:

            The DB_FAIL return value was added to that function in MDEV-13542. I would not call this bug a regression from that change, but rather an omission in that fix. Before MDEV-13542, we would typically just crash on the corrupted page.

            Marko Mäkelä added a comment:

            The special return value is needed so that fil_aio_callback() can avoid reporting an error when read-ahead covers an unallocated page. For synchronous reads, the error code needs to be mapped in buf_read_page_low().
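            The following minimal sketch illustrates the split described in this comment, under stated assumptions. `dberr`, `should_report_async()` and `map_sync_read_error()` are hypothetical names, not InnoDB functions. Mapping DB_FAIL to a page-corruption code for synchronous reads is also an assumption: the comment only says the code "needs to be mapped". The target chosen here mirrors the experimental patch to buf_page_t::read_complete(), which returns DB_PAGE_CORRUPTED instead.

```cpp
#include <cassert>

// Hypothetical subset of error codes; the real dberr_t has many more values.
enum class dberr { success, fail, page_corrupted };

// Asynchronous read-ahead (the fil_aio_callback() side): a "fail" result
// for an unallocated page is expected and should not be reported.
bool should_report_async(dberr err)
{
  return err != dberr::success && err != dberr::fail;
}

// Synchronous read (the buf_read_page_low() side): never let the internal
// "fail" code escape to a caller that cannot handle it; report corruption.
dberr map_sync_read_error(dberr err)
{
  return err == dberr::fail ? dberr::page_corrupted : err;
}
```

            The design choice is to confine the benign code to the one path that knows how to interpret it. It is then translated at the boundary where synchronous callers would otherwise receive it.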

            People

              Marko Mäkelä (marko)
              Richard Green (rgsilver)