MariaDB Server / MDEV-31827

InnoDB multi-batch recovery stops prematurely

Details

    Description

      In the final testing of MDEV-31791, I observed the following failure:

      11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

      innodb.recovery_memory 'innodb,release'  w4 [ 89 fail ]  Found warnings/errors in server log file!
              Test ended at 2023-08-02 12:21:21
      line
      2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt!
      2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
      2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=0] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
      2023-08-02 12:21:18 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      

      A larger excerpt of the server error log indicates that we actually scanned more of the log, but for some reason failed to fully apply the final recovery batch and stopped prematurely at LSN 418951162 instead of 421869561:

      2023-08-02 12:21:18 0 [Note] InnoDB: Multi-batch recovery needed at LSN 418950921
      2023-08-02 12:21:18 0 [Note] InnoDB: End of log at LSN=421869561
      2023-08-02 12:21:18 0 [Note] InnoDB: To recover: LSN 418950921/421869561; 106 pages
      2023-08-02 12:21:18 0 [ERROR] InnoDB: We scanned the log up to 418951162. A checkpoint was at 418061736 and the maximum LSN on a database page was 421016108. It is possible that the database is now corrupt!
      2023-08-02 12:21:18 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 15548 row operations to undo
      2023-08-02 12:21:18 0 [Note] InnoDB: Trx id counter is 652795
      2023-08-02 12:21:18 0 [Note] InnoDB: Opened 3 undo tablespaces
      2023-08-02 12:21:18 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
      2023-08-02 12:21:18 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
      2023-08-02 12:21:18 0 [ERROR] InnoDB: Page [page id: space=1, page number=88] log sequence number 421016108 is in the future! Current system log sequence number 418966369.
      ...
      2023-08-02 12:21:19 0 [Note] InnoDB: log sequence number 418951162; transaction id 652796
      

      This might be a regression due to MDEV-29911.

    Attachments

      MDEV-31827-data.tar.xz

    Activity

            The LSN check in buf_page_check_lsn() is guarded both by the function parameter check_lsn and by the global variable recv_lsn_checks_on. That variable would be set in recv_recovery_from_checkpoint_start(), after parsing the log.
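
            A minimal standalone sketch of that two-level guard, using the LSN values from the error log above (the names check_lsn and recv_lsn_checks_on match the source; the rest is illustrative, not the actual buf0buf.cc code):

            // Illustrative model only: the real check lives in buf0buf.cc.
            // It demonstrates that the "page LSN in the future" check is a
            // no-op until recv_lsn_checks_on is set after log parsing.
            #include <cstdint>
            #include <cstdio>

            typedef uint64_t lsn_t;

            static bool recv_lsn_checks_on= false; // set only after parsing
            static lsn_t current_lsn= 418966369;   // log_sys.lsn (example)

            // Returns true if the page LSN check fires ("in the future").
            static bool page_lsn_in_future(bool check_lsn, lsn_t page_lsn)
            {
              if (!check_lsn || !recv_lsn_checks_on)
                return false; // both guards must be on for the check to run
              return page_lsn > current_lsn;
            }

            int main()
            {
              const lsn_t page_lsn= 421016108; // FIL_PAGE_LSN of page [1:88]
              recv_lsn_checks_on= true; // recv_recovery_from_checkpoint_start()
              if (page_lsn_in_future(true, page_lsn))
                std::printf("page LSN %llu is in the future (current %llu)\n",
                            (unsigned long long) page_lsn,
                            (unsigned long long) current_lsn);
              return 0;
            }

            Had recovery correctly advanced log_sys.lsn to the end of the log (421869561), the same page LSN 421016108 would have passed this check; that is exactly the premature stop reported above.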

            In MDEV-29911, the parsing of the log was changed, and recv_recovery_from_checkpoint_start() may no longer assign the latest LSN to log_sys.lsn. In the core dump that I am analyzing, I see log_sys.write_lsn=recv_sys.lsn=272313244, which corresponds to the LSN at which we noticed that multi-batch recovery would be needed. The end of the log was parsed at 276035576.

            I have yet to evaluate whether this also affects the 10.6 version of MDEV-29911, or only 10.9 and later versions.

            marko Marko Mäkelä added a comment

            The recovery fails nondeterministically. An attempt to recover a copy of the failed data directory happened to work just fine, at least under rr record:

            2023-08-02 16:43:13 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
            2023-08-02 16:43:13 0 [Note] InnoDB: Completed initialization of buffer pool
            2023-08-02 16:43:13 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
            2023-08-02 16:43:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=271813827
            2023-08-02 16:43:13 0 [Note] InnoDB: Multi-batch recovery needed at LSN 272340327
            2023-08-02 16:43:13 0 [Note] InnoDB: End of log at LSN=276035576
            2023-08-02 16:43:13 0 [Note] InnoDB: To recover: LSN 272340327/276035576; 278 pages
            2023-08-02 16:43:14 0 [Note] InnoDB: To recover: LSN 273245127/276035576; 54 pages
            2023-08-02 16:43:14 0 [Note] InnoDB: To recover: LSN 274181976/276035576; 55 pages
            2023-08-02 16:43:14 0 [Note] InnoDB: To recover: LSN 275118954/276035576; 54 pages
            2023-08-02 16:43:14 0 [Note] InnoDB: To recover: LSN 276008436/276035576; 59 pages
            2023-08-02 16:43:14 0 [Note] InnoDB: To recover: 7 pages
            2023-08-02 16:43:14 0 [Warning] InnoDB: Cannot change innodb_undo_tablespaces=0 because previous shutdown was not with innodb_fast_shutdown=0
            …
            2023-08-02 16:43:14 0 [Note] InnoDB: log sequence number 276035576; transaction id 535920
            

            In the rr replay of that execution, I see that recv_sys.lsn (which would be assigned to log_sys.lsn before recv_lsn_checks_on is set) was last updated to the correct value here:

            #0  recv_sys_t::parse<recv_buf, true> (
                this=this@entry=0x55dd04a86c40 <recv_sys>, 
                l=@0x7ffffe498a68: {ptr = 0x7f29b0eab3f8 ""}, 
                if_exists=if_exists@entry=true)
                at /mariadb/11/storage/innobase/include/log0log.h:313
            #1  0x000055dd04733452 in recv_sys_t::parse_mtr<true> (if_exists=true)
                at /mariadb/11/storage/innobase/log/log0recv.cc:2996
            #2  recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=true)
                at /mariadb/11/storage/innobase/log/log0recv.cc:3007
            #3  0x000055dd04724f98 in recv_scan_log (last_phase=last_phase@entry=true)
                at /mariadb/11/storage/innobase/log/log0recv.cc:4009
            #4  0x000055dd04725d07 in recv_recovery_from_checkpoint_start ()
                at /mariadb/11/storage/innobase/log/log0recv.cc:4506
            #5  0x000055dd047ba383 in srv_start (create_new_db=false)
                at /mariadb/11/storage/innobase/srv/srv0start.cc:1513
            

            It looks like we were reading the log incorrectly, hitting a premature EOF while re-reading it. In the core dump, I saw recv_sys.lsn=272313244 and recv_sys.scanned_lsn=276035576. I will rerun the test with an additional assertion, hoping to catch this earlier.

            marko Marko Mäkelä added a comment

            The attached MDEV-31827-data.tar.xz triggers a failure of the assertion added by the following patch:

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 6f58be6ed35..07887e082c9 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -4662,6 +4662,7 @@ dberr_t recv_recovery_from_checkpoint_start()
             	}
             
             	mysql_mutex_lock(&recv_sys.mutex);
            +	ut_a(recv_sys.scanned_lsn == recv_sys.lsn);
             	recv_sys.apply_log_recs = true;
             	recv_no_ibuf_operations = false;
             	ut_d(recv_no_log_write = srv_operation == SRV_OPERATION_RESTORE
            

            If the datadir is located in /dev/shm and therefore the "fake PMEM" is being used, the dataset will recover fine as far as InnoDB is concerned:

            10.9 8760fe11b89450d56269ed062b36b4238d45707f

            2023-08-02 17:31:00 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
            2023-08-02 17:31:00 0 [Note] InnoDB: Completed initialization of buffer pool
            2023-08-02 17:31:00 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
            2023-08-02 17:31:00 0 [Note] InnoDB: Opened 3 undo tablespaces
            2023-08-02 17:31:00 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
            2023-08-02 17:31:00 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=72811914
            2023-08-02 17:31:00 0 [Note] InnoDB: Multi-batch recovery needed at LSN 73324548
            2023-08-02 17:31:00 0 [Note] InnoDB: End of log at LSN=73844162
            2023-08-02 17:31:00 0 [Note] InnoDB: To recover: LSN 73324548/73844162; 328 pages
            2023-08-02 17:31:00 0 [Note] InnoDB: To recover: LSN 73808457/73844162; 315 pages
            2023-08-02 17:31:00 0 [ERROR] InnoDB: The change buffer is corrupted or has been removed on upgrade to MariaDB 11.0 or later
            2023-08-02 17:31:00 0 [Note] InnoDB: Opened 3 undo tablespaces
            2023-08-02 17:31:00 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
            2023-08-02 17:31:00 0 [Note] InnoDB: To recover: 5 pages
            2023-08-02 17:31:00 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
            

            marko Marko Mäkelä added a comment

            This patch did the trick:

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 6f58be6ed35..d8d976f5e67 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -4147,7 +4147,8 @@ static bool recv_scan_log(bool last_phase)
                 if (recv_sys.is_corrupt_log())
                   break;
             
            -    if (recv_sys.offset < log_sys.get_block_size())
            +    if (recv_sys.offset < log_sys.get_block_size() &&
            +        recv_sys.lsn == recv_sys.scanned_lsn)
                   goto got_eof;
             
                 if (recv_sys.offset > buf_size / 4 ||
            

            The failure scenario was a premature goto got_eof when the very first mini-transaction at the "rewind LSN" wrapped around the circular ib_logfile0 to byte offset 12288. The mini-transaction started 2048-4 bytes before the end of ib_logfile0: 2048 bytes were read from the end of the file into the parsing buffer, and after parsing up to the end of the file, recv_sys.offset was 4. Because 4 is less than the 512-byte block size, the old condition alone declared end-of-file, even though parsing (recv_sys.lsn) had not yet caught up with the end of the scanned log (recv_sys.scanned_lsn).
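
            To make the EOF condition concrete, here is a minimal standalone simulation (not InnoDB code; only the numbers 512, 2048-4, 4, 12288 and the two LSNs come from the analysis above, everything else is illustrative):

            // Illustrative model only; this is not the InnoDB source. It
            // replays the state right after the 2048-byte tail of
            // ib_logfile0 was read and parsed: 4 bytes of a wrapped-around
            // record remain buffered.
            #include <cstddef>
            #include <cstdint>
            #include <cstdio>

            typedef uint64_t lsn_t;

            int main()
            {
              const size_t block_size= 512;  // log_sys.get_block_size()
              size_t offset= 4;              // recv_sys.offset: leftover bytes
              lsn_t lsn= 272313244;          // recv_sys.lsn: parse position
              lsn_t scanned_lsn= 276035576;  // recv_sys.scanned_lsn: log end

              // Old condition: a short leftover alone was treated as EOF.
              const bool old_eof= offset < block_size;
              // Fixed condition: additionally require that parsing caught up
              // with scanning; otherwise keep reading (from byte offset
              // 12288, where the circular file wraps around).
              const bool new_eof= offset < block_size && lsn == scanned_lsn;

              std::printf("old EOF test: %s at LSN %llu (premature)\n",
                          old_eof ? "stop" : "continue",
                          (unsigned long long) lsn);
              std::printf("new EOF test: %s (resume at the wrap-around)\n",
                          new_eof ? "stop" : "continue");
              return 0;
            }

            Once parsing reaches scanned_lsn with fewer than block_size bytes left over, both conditions hold and got_eof is taken exactly as before the patch.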

            After applying that patch (and retaining all of the above-mentioned assertions), while testing on /run/user/$UID (tmpfs with pwrite() and pread() based ib_logfile0 access), I only got a timing-related failure that does not affect correctness:

            11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

            innodb.recovery_memory 'innodb,release'  w79 [ 41 fail ]  Found warnings/errors in server log file!
                    Test ended at 2023-08-02 18:21:00
            line
            2023-08-02 18:20:55 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync): 0. 480 OS file reads, 430 OS file writes, 0 OS fsyncs.
            

            I restarted the test with innodb_log_file_size=5m (half the mtr default size), which should make any remaining failure easier to reproduce. I got the same non-failure (only the buffer pool warning) again:

            11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

            innodb.recovery_memory 'innodb,release'  w34 [ 20 fail ]  Found warnings/errors in server log file!
                    Test ended at 2023-08-02 18:40:52
            line
            2023-08-02 18:40:48 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync): 0. 595 OS file reads, 432 OS file writes, 0 OS fsyncs.
            

            I temporarily suppressed that warning (it should eventually be investigated and fixed), hoping to get 100×100 successful test runs with regular file I/O on RAM disk. We should have the results in 30 minutes.

            marko Marko Mäkelä added a comment

            The following test, with the patch and the suppressed warning, was successful on /run/user/$UID (the shell brace expansion {,,,,,,,,,}{,,,,,,,,,} repeats the test name 10×10 = 100 times; combined with --repeat=100, that amounts to the 10,000 test runs reported below):

            ./mtr --mysqld=--innodb-log-file-size=5m --parallel=100 --repeat=100 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}
            

            11.2 e81fa345020ec6a067583db6a7019d6404b26f93 with patches

            …
            innodb.recovery_memory 'innodb,release'  w11 [ 100 pass ]  14533
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 158268.156 of 1651 seconds executing testcases
             
            Completed: All 10000 tests were successful.
            

            marko Marko Mäkelä added a comment

            People

              Assignee: marko Marko Mäkelä
              Reporter: marko Marko Mäkelä