Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34830

LSN in the future is not being treated as serious corruption

Details

    Description

      InnoDB crash recovery is suppressing error messages about FIL_PAGE_LSN being in the future (recv_lsn_checks_on) and instead implementing a partly duplicated logic (recv_max_page_lsn). At least ever since the recovery was refactored in MDEV-29911, we’d better always enable these checks.

      Furthermore, InnoDB fails to treat a page as corrupted when it has a valid checksum but the FIL_PAGE_LSN is too large. This could lead to the database becoming increasingly more corrupted.

      Attachments

        Issue Links

          Activity

            In a multi-batch recovery (typically, when innodb_buffer_pool_size is reduced between the time the server was killed and restarted, as in the test innodb.recovery_memory), we will start reading pages and applying log before the final LSN will be known. In this case, we will need to maintain something similar to recv_max_page_lsn so that at the end of the recovery, we can check if any page LSN was in the future.

            A single-batch recovery would be fine with a straightforward strict check:

            diff --git a/mysql-test/suite/innodb/t/recovery_memory.test b/mysql-test/suite/innodb/t/recovery_memory.test
            index 06101377f10..dbebe7c3091 100644
            --- a/mysql-test/suite/innodb/t/recovery_memory.test
            +++ b/mysql-test/suite/innodb/t/recovery_memory.test
            @@ -37,7 +37,7 @@ let $restart_parameters=--innodb_buffer_pool_size=5242880 --debug_dbug=+d,ibuf_i
             }
             if (!$have_debug) {
             --echo SET DEBUG_DBUG="+d,ib_log_checkpoint_avoid_hard";
            -let $restart_parameters=--innodb_buffer_pool_size=5242880;
            +let $restart_parameters=--innodb_buffer_pool_size=15242880;
             }
             CREATE TABLE t1(f1 INT NOT NULL)ENGINE=InnoDB;
             INSERT INTO t1 SELECT * FROM seq_1_to_65536;
            

            marko Marko Mäkelä added a comment - In a multi-batch recovery (typically, when innodb_buffer_pool_size is reduced between the time the server was killed and restarted, as in the test innodb.recovery_memory ), we will start reading pages and applying log before the final LSN will be known. In this case, we will need to maintain something similar to recv_max_page_lsn so that at the end of the recovery, we can check if any page LSN was in the future. A single-batch recovery would be fine with a straightforward strict check: diff --git a/mysql-test/suite/innodb/t/recovery_memory.test b/mysql-test/suite/innodb/t/recovery_memory.test index 06101377f10..dbebe7c3091 100644 --- a/mysql-test/suite/innodb/t/recovery_memory.test +++ b/mysql-test/suite/innodb/t/recovery_memory.test @@ -37,7 +37,7 @@ let $restart_parameters=--innodb_buffer_pool_size=5242880 --debug_dbug=+d,ibuf_i } if (!$have_debug) { --echo SET DEBUG_DBUG="+d,ib_log_checkpoint_avoid_hard"; -let $restart_parameters=--innodb_buffer_pool_size=5242880; +let $restart_parameters=--innodb_buffer_pool_size=15242880; } CREATE TABLE t1(f1 INT NOT NULL)ENGINE=InnoDB; INSERT INTO t1 SELECT * FROM seq_1_to_65536;

            I realized that before recovery starts to read data pages (including the doublewrite buffer), it will have updated log_sys.lsn to the recovered LSN, a.k.a. recv_sys.scanned_lsn. Therefore, the FIL_PAGE_LSN validation can be simplified even further.

            marko Marko Mäkelä added a comment - I realized that before recovery starts to read data pages (including the doublewrite buffer), it will have updated log_sys.lsn to the recovered LSN, a.k.a. recv_sys.scanned_lsn . Therefore, the FIL_PAGE_LSN validation can be simplified even further.

            A failure of the test innodb.doublewrite hints that MDEV-25909 may have to be fixed. That is, fil_ibd_load() had better not attempt any doublewrite recovery before all the log has been parsed.

            #6  0x0000563393d5de45 in recv_dblwr_t::find_page (this=0x563394c3dff0 <recv_sys+2416>, page_id={m_id = 0x500000000}, space=0x0, tmp_buf=0x0) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4846
            #7  0x0000563393d5e506 in recv_dblwr_t::restore_first_page (this=0x563394c3dff0 <recv_sys+2416>, space_id=0x5, name=0x5633a0a28490 "./test/t1.ibd", file={m_file = 0x9})
                at /mariadb/10.6/storage/innobase/log/log0recv.cc:4941
            #8  0x000056339405c154 in Datafile::validate_for_recovery (this=0x7fff056d2d00) at /mariadb/10.6/storage/innobase/fsp/fsp0file.cc:463
            #9  0x000056339403c98b in fil_ibd_load (space_id=0x5, filename=0x7fff056d31d0 "./test/t1.ibd", space=@0x7fff056d3188: 0x0) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:2576
            #10 0x0000563393d4d367 in fil_name_process (name=0x7f03ec200131 "./test/t1.ibd", len=0xd, space_id=0x5, ftype=FILE_MODIFY, lsn=0x1256e, store=STORE_YES) at /mariadb/10.6/storage/innobase/log/log0recv.cc:1316
            #11 0x0000563393d5457f in recv_sys_t::parse (this=0x563394c3d680 <recv_sys>, checkpoint_lsn=0x12441, store=0x7fff056d535c, apply=0x1) at /mariadb/10.6/storage/innobase/log/log0recv.cc:2898
            #12 0x0000563393d5a1ee in recv_scan_log_recs (store=0x7fff056d535c, log_block=0x7f03ecb00a00 "", checkpoint_lsn=0x12441, start_lsn=0x12400, end_lsn=0x13000, contiguous_lsn=0x7fff056d5538, 
                group_scanned_lsn=0x563395657108 <log_sys+584>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4066
            #13 0x0000563393d5a626 in recv_group_scan_log_recs (checkpoint_lsn=0x12441, contiguous_lsn=0x7fff056d5538, last_phase=0x0) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4128
            #14 0x0000563393d5cce5 in recv_recovery_from_checkpoint_start (flush_lsn=0x12441) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4572
            #15 0x0000563393ed0a89 in srv_start (create_new_db=0x0) at /mariadb/10.6/storage/innobase/srv/srv0start.cc:1485
            

            marko Marko Mäkelä added a comment - A failure of the test innodb.doublewrite hints that MDEV-25909 may have to be fixed. That is, fil_ibd_load() had better not attempt any doublewrite recovery before all the log has been parsed. #6 0x0000563393d5de45 in recv_dblwr_t::find_page (this=0x563394c3dff0 <recv_sys+2416>, page_id={m_id = 0x500000000}, space=0x0, tmp_buf=0x0) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4846 #7 0x0000563393d5e506 in recv_dblwr_t::restore_first_page (this=0x563394c3dff0 <recv_sys+2416>, space_id=0x5, name=0x5633a0a28490 "./test/t1.ibd", file={m_file = 0x9}) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4941 #8 0x000056339405c154 in Datafile::validate_for_recovery (this=0x7fff056d2d00) at /mariadb/10.6/storage/innobase/fsp/fsp0file.cc:463 #9 0x000056339403c98b in fil_ibd_load (space_id=0x5, filename=0x7fff056d31d0 "./test/t1.ibd", space=@0x7fff056d3188: 0x0) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:2576 #10 0x0000563393d4d367 in fil_name_process (name=0x7f03ec200131 "./test/t1.ibd", len=0xd, space_id=0x5, ftype=FILE_MODIFY, lsn=0x1256e, store=STORE_YES) at /mariadb/10.6/storage/innobase/log/log0recv.cc:1316 #11 0x0000563393d5457f in recv_sys_t::parse (this=0x563394c3d680 <recv_sys>, checkpoint_lsn=0x12441, store=0x7fff056d535c, apply=0x1) at /mariadb/10.6/storage/innobase/log/log0recv.cc:2898 #12 0x0000563393d5a1ee in recv_scan_log_recs (store=0x7fff056d535c, log_block=0x7f03ecb00a00 "", checkpoint_lsn=0x12441, start_lsn=0x12400, end_lsn=0x13000, contiguous_lsn=0x7fff056d5538, group_scanned_lsn=0x563395657108 <log_sys+584>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4066 #13 0x0000563393d5a626 in recv_group_scan_log_recs (checkpoint_lsn=0x12441, contiguous_lsn=0x7fff056d5538, last_phase=0x0) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4128 #14 0x0000563393d5cce5 in recv_recovery_from_checkpoint_start (flush_lsn=0x12441) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4572 #15 0x0000563393ed0a89 in srv_start (create_new_db=0x0) at /mariadb/10.6/storage/innobase/srv/srv0start.cc:1485

            I ended up refactoring the doublewrite recovery but not touching the logic that is the topic of MDEV-25909. All recovered pages would be written back to data files in buf_dblwr_t::recover() while we know the final recovered LSN. Before that point, we may consult the doublewrite buffer so that we can determine the tablespace ID and flags (page size) of files whose first page happens to be corrupted. For that purpose, I think that it is acceptable to refer to a possibly too new version in the doublewrite buffer. As a bonus in those code paths, if a page had been read into recv_sys.dblwr.pages, we may skip reading the pages from the data files.

            marko Marko Mäkelä added a comment - I ended up refactoring the doublewrite recovery but not touching the logic that is the topic of MDEV-25909 . All recovered pages would be written back to data files in buf_dblwr_t::recover() while we know the final recovered LSN. Before that point, we may consult the doublewrite buffer so that we can determine the tablespace ID and flags (page size) of files whose first page happens to be corrupted. For that purpose, I think that it is acceptable to refer to a possibly too new version in the doublewrite buffer. As a bonus in those code paths, if a page had been read into recv_sys.dblwr.pages , we may skip reading the pages from the data files.

            One more interesting anomaly, caught by some fault injection tests of the doublewrite buffer, is that fil_node_t::read_page0() was not validating the page checksum or FIL_PAGE_LSN at all, that is, InnoDB could happily open a file whose first page is corrupted. Okay, there is some validation in the Datafile class that is duplicating part of the logic, but I’m not sure if it would be invoked outside crash recovery.

            marko Marko Mäkelä added a comment - One more interesting anomaly, caught by some fault injection tests of the doublewrite buffer, is that fil_node_t::read_page0() was not validating the page checksum or FIL_PAGE_LSN at all, that is, InnoDB could happily open a file whose first page is corrupted. Okay, there is some validation in the Datafile class that is duplicating part of the logic, but I’m not sure if it would be invoked outside crash recovery.

            marko Thanks for the change. I have added some notes in the pull request.

            debarun Debarun Banerjee added a comment - marko Thanks for the change. I have added some notes in the pull request.

            The 10.11 version still needs to be reviewed. It is slightly different due to the changed redo log file format (MDEV-14425) and the removal of FIL_PAGE_FILE_FLUSH_LSN writes (MDEV-27199).

            marko Marko Mäkelä added a comment - The 10.11 version still needs to be reviewed. It is slightly different due to the changed redo log file format ( MDEV-14425 ) and the removal of FIL_PAGE_FILE_FLUSH_LSN writes ( MDEV-27199 ).

            10.11 changes look fine to me.

            debarun Debarun Banerjee added a comment - 10.11 changes look fine to me.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.