MariaDB Server / MDEV-32552

Write-ahead logging is broken for freed pages

Details

    Description

      MDEV-15528 implemented an optimization: pages that have been marked as freed will not be written back to data files. Unfortunately, this optimization violates the write-ahead logging protocol, because we fail to ensure that all log up to the freeing of the page has been durably written before we elide the write of the data page.

      If InnoDB is killed before the log for freeing a page was durably written, crash recovery could fail because it would read an older version of the data page and attempt to apply log records that are for a newer version of the page.
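
      The window described above can be illustrated with a toy model. This is only a sketch, not InnoDB code: `ToyRedoLog`, `wal_rule_holds()` and `free_record_lost_on_crash()` are hypothetical names made up for this example, and the log is reduced to two counters (the end of the in-memory log and the durable prefix).

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = std::uint64_t;

// Toy redo log (hypothetical, not an InnoDB type): records are appended in
// memory and become durable only when flush_up_to() is called.
struct ToyRedoLog {
  lsn_t end_lsn = 0;      // LSN of the latest appended record
  lsn_t durable_lsn = 0;  // everything up to this LSN survives a kill

  lsn_t append() { return ++end_lsn; }

  void flush_up_to(lsn_t lsn) {
    if (lsn > end_lsn) lsn = end_lsn;          // cannot flush unwritten log
    if (lsn > durable_lsn) durable_lsn = lsn;  // durable prefix only grows
  }
};

// Write-ahead logging rule: a page change may be persisted, or its write
// skipped for good, only once the log up to the page's LSN is durable.
inline bool wal_rule_holds(const ToyRedoLog& log, lsn_t page_lsn) {
  return log.durable_lsn >= page_lsn;
}

// Returns true if a kill right after eliding the page write would lose the
// log record that freed the page.
inline bool free_record_lost_on_crash(bool flush_log_first) {
  ToyRedoLog log;
  log.flush_up_to(log.append());         // LSN 1: earlier change, durable
  const lsn_t free_lsn = log.append();   // LSN 2: record that frees the page
  if (flush_log_first) log.flush_up_to(free_lsn);
  // The page write is elided here, and then the server is killed.
  return !wal_rule_holds(log, free_lsn);
}
```

      In this model, `free_record_lost_on_crash(false)` corresponds to the reported problem: the write is elided while the durable log still ends before the record that freed the page.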

      We can observe some occasional recovery test failures that could be explained by this, such as the following:

      10.6 56c9b0bca0576985c31f20b46dcb060a

      atomic.alter_table 'innodb'              w1 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2023-01-10 11:25:39
      line
      2023-01-10 11:24:24 0 [ERROR] InnoDB: Corrupted page [page id: space=228, page number=0] of datafile './test/t1.ibd' could not be found in the doublewrite buffer.
      

      The scenario that I have in mind would be fixed by two changes: making buf_page_free() mark the freed block as modified in the mini-transaction, and making buf_flush_page() check that everything up to the FIL_PAGE_LSN of the page has been durably written to the redo log. After the first change, that field would be updated by buf_flush_note_modification(), which is invoked by mtr_t::commit().
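
      The two steps of the proposed fix could look roughly like the following simplified model. It is a sketch under stated assumptions, not the actual patch: `ToyLog`, `ToyPage`, `toy_free_page()` and `toy_flush_page()` are hypothetical stand-ins, and `page_lsn` merely plays the role of FIL_PAGE_LSN.

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = std::uint64_t;

// Hypothetical stand-in for the redo log: in-memory end and durable prefix.
struct ToyLog {
  lsn_t end_lsn = 0;
  lsn_t durable_lsn = 0;
  lsn_t append() { return ++end_lsn; }
  void write_up_to(lsn_t lsn) {
    if (lsn > end_lsn) lsn = end_lsn;
    if (lsn > durable_lsn) durable_lsn = lsn;
  }
};

// Hypothetical stand-in for a buffer pool page.
struct ToyPage {
  lsn_t page_lsn = 0;  // plays the role of FIL_PAGE_LSN
  bool freed = false;
  bool dirty = false;
};

enum class FlushResult { Written, WriteElided };

// Step 1: freeing counts as a modification, so committing the (modelled)
// mini-transaction stamps the page with the LSN of the free record.
inline void toy_free_page(ToyLog& log, ToyPage& page) {
  page.freed = true;
  page.dirty = true;
  page.page_lsn = log.append();
}

// Step 2: before writing the page or eliding the write, make sure the log
// is durable up to the page LSN.
inline FlushResult toy_flush_page(ToyLog& log, ToyPage& page) {
  if (log.durable_lsn < page.page_lsn) log.write_up_to(page.page_lsn);
  page.dirty = false;
  return page.freed ? FlushResult::WriteElided : FlushResult::Written;
}
```

      With both steps in place, the write of a freed page is still skipped, but only after the record that freed it has become durable.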

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            marko Marko Mäkelä added a comment - In 10.6, buf_page_free() invokes mtr->memo_push(block, MTR_MEMO_PAGE_X_MODIFY) ever since MDEV-29374 was fixed. But the check in buf_page_t::flush() is missing.
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2023-10-23 14:29:41.0 2023-10-23 14:29:41.161
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5.23 [ 29012 ]
            Fix Version/s 10.6.16 [ 29014 ]
            Fix Version/s 10.10.7 [ 29018 ]
            Fix Version/s 10.11.6 [ 29020 ]
            Fix Version/s 11.0.4 [ 29021 ]
            Fix Version/s 11.1.3 [ 29023 ]
            Fix Version/s 11.2.2 [ 29035 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            People

              Assignee: Marko Mäkelä
              Reporter: Marko Mäkelä