Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32971

Assertion !recv_sys.is_corrupt_fs() failed on recovery

Details

    Description

      An encryption test failed on a mandatory builder:

      10.5 c8346c0bacfdbe3fd61c67becff9934e75e08ed3

      encryption.innodb-redo-nokeys 'ctr,innodb' w12 [ fail ]
      ...
      2023-12-08  7:46:08 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=53205,1435867
      2023-12-08  7:46:08 0 [ERROR] InnoDB: Recovery cannot access file ./test/t1.ibd (tablespace 5)
      2023-12-08  7:46:08 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
      mariadbd: /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/log/log0recv.cc:3510: dberr_t recv_recovery_from_checkpoint_start(lsn_t): Assertion `!recv_sys.found_corrupt_fs' failed.
      

      This is reproducible also when attempting to start up 10.6 on a copy of the data directory (data.tar.xz).
      The code path that outputs the error message when processing a FILE_MODIFY record will also set recv_sys.found_corrupt_fs. This is triggered by a fil_ibd_load() return value FIL_LOAD_INVALID because of missing encryption information:

      	if (crypt_data && !crypt_data->is_key_found()) {
      		crypt_data->~fil_space_crypt_t();
      		ut_free(crypt_data);
      		return FIL_LOAD_INVALID;
      	}
      

      In 10.6 the code is different:

      	if (crypt_data && !fil_crypt_check(crypt_data, filename)) {
      		return FIL_LOAD_INVALID;
      	}
      

      All we need to do is to relax the too strict debug assertion:

      diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
      index d52a62cec99..05120871b0a 100644
      --- a/storage/innobase/log/log0recv.cc
      +++ b/storage/innobase/log/log0recv.cc
      @@ -3507,7 +3507,7 @@ recv_recovery_from_checkpoint_start(lsn_t flush_lsn)
       	recv_group_scan_log_recs(checkpoint_lsn, &contiguous_lsn, false);
       	/* The first scan should not have stored or applied any records. */
       	ut_ad(recv_sys.pages.empty());
      -	ut_ad(!recv_sys.found_corrupt_fs);
      +	ut_ad(!recv_sys.found_corrupt_fs || !srv_force_recovery);
       
       	if (srv_read_only_mode && recv_needed_recovery) {
       		mysql_mutex_unlock(&log_sys.mutex);
      

      In this way, the server startup will fail gracefully, just like it is expected by the test:

      10.5 c8346c0bacfdbe3fd61c67becff9934e75e08ed3 with patch

      2023-12-08 10:47:50 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=53205,1435867
      2023-12-08 10:47:50 0 [ERROR] InnoDB: Recovery cannot access file ./test/t1.ibd (tablespace 5)
      2023-12-08 10:47:50 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
      2023-12-08 10:47:50 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT at 1435867 between the checkpoint 53205 and the end 1501184.
      2023-12-08 10:47:50 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1480] with error Generic error
      2023-12-08 10:47:50 0 [Note] InnoDB: Starting shutdown...
      2023-12-08 10:47:51 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      

      In 10.8, the log format was changed and the logic around FILE_CHECKPOINT was rewritten in MDEV-14425. We would still invoke fil_name_process() on the FILE_MODIFY records, and recv_sys.found_corrupt_fs could be set for a similar reason, but there is no assertion failure in recv_recovery_from_checkpoint_start() after the initial call to recv_scan_log(). In our CI systems, I only found this type of failure on the 10.5 and 10.6 branches.

      Attachments

        Issue Links

          Activity

            There are no comments yet on this issue.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.