  1. MariaDB Server
  2. MDEV-32971

Assertion !recv_sys.is_corrupt_fs() failed on recovery

Details

    Description

      An encryption test failed on a mandatory builder:

      10.5 c8346c0bacfdbe3fd61c67becff9934e75e08ed3

      encryption.innodb-redo-nokeys 'ctr,innodb' w12 [ fail ]
      ...
      2023-12-08  7:46:08 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=53205,1435867
      2023-12-08  7:46:08 0 [ERROR] InnoDB: Recovery cannot access file ./test/t1.ibd (tablespace 5)
      2023-12-08  7:46:08 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
      mariadbd: /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/log/log0recv.cc:3510: dberr_t recv_recovery_from_checkpoint_start(lsn_t): Assertion `!recv_sys.found_corrupt_fs' failed.
      

      This is also reproducible when starting 10.6 on a copy of the data directory (data.tar.xz).
      The code path that outputs the error message when processing a FILE_MODIFY record also sets recv_sys.found_corrupt_fs. Here it is triggered when fil_ibd_load() returns FIL_LOAD_INVALID because of missing encryption information:

      	if (crypt_data && !crypt_data->is_key_found()) {
      		crypt_data->~fil_space_crypt_t();
      		ut_free(crypt_data);
      		return FIL_LOAD_INVALID;
      	}
      

      In 10.6 the code is different:

      	if (crypt_data && !fil_crypt_check(crypt_data, filename)) {
      		return FIL_LOAD_INVALID;
      	}
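The path described above, from a missing key to the recovery flag, can be sketched as a small standalone model. This is only an illustration under stated assumptions: `LoadStatus`, `CryptData`, `RecvState`, `load_tablespace()` and `process_file_modify()` are hypothetical stand-ins, not MariaDB types or functions, and the real fil_ibd_load() and FILE_MODIFY handling do considerably more.

```cpp
#include <cstdio>

namespace sketch {

// Hypothetical stand-in for the fil_ibd_load() result; only two outcomes modeled.
enum class LoadStatus { Ok, Invalid };

// Hypothetical stand-in for fil_space_crypt_t: only "is the key available?".
struct CryptData { bool key_found; };

// Hypothetical stand-in for the part of recv_sys that matters here.
struct RecvState { bool found_corrupt_fs = false; };

// Mirrors only the quoted check: an encrypted tablespace whose key
// cannot be found is reported as an invalid load.
inline LoadStatus load_tablespace(const CryptData* crypt_data) {
  if (crypt_data && !crypt_data->key_found) return LoadStatus::Invalid;
  return LoadStatus::Ok;
}

// Assumed shape of the FILE_MODIFY handling: an invalid load is
// reported and the corruption flag is set. Returns false on failure.
inline bool process_file_modify(RecvState& recv, const CryptData* crypt_data,
                                const char* name, unsigned space_id) {
  if (load_tablespace(crypt_data) == LoadStatus::Invalid) {
    std::fprintf(stderr, "Recovery cannot access file %s (tablespace %u)\n",
                 name, space_id);
    recv.found_corrupt_fs = true;
    return false;
  }
  return true;
}

}  // namespace sketch
```

Calling `process_file_modify()` with a `CryptData{false}` leaves `found_corrupt_fs` set, which is the state the debug assertion later trips over.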
      

      All we need to do is relax the overly strict debug assertion:

      diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
      index d52a62cec99..05120871b0a 100644
      --- a/storage/innobase/log/log0recv.cc
      +++ b/storage/innobase/log/log0recv.cc
      @@ -3507,7 +3507,7 @@ recv_recovery_from_checkpoint_start(lsn_t flush_lsn)
       	recv_group_scan_log_recs(checkpoint_lsn, &contiguous_lsn, false);
       	/* The first scan should not have stored or applied any records. */
       	ut_ad(recv_sys.pages.empty());
      -	ut_ad(!recv_sys.found_corrupt_fs);
      +	ut_ad(!recv_sys.found_corrupt_fs || !srv_force_recovery);
       
       	if (srv_read_only_mode && recv_needed_recovery) {
       		mysql_mutex_unlock(&log_sys.mutex);
      

      This way, server startup fails gracefully, as the test expects:

      10.5 c8346c0bacfdbe3fd61c67becff9934e75e08ed3 with patch

      2023-12-08 10:47:50 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=53205,1435867
      2023-12-08 10:47:50 0 [ERROR] InnoDB: Recovery cannot access file ./test/t1.ibd (tablespace 5)
      2023-12-08 10:47:50 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
      2023-12-08 10:47:50 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT at 1435867 between the checkpoint 53205 and the end 1501184.
      2023-12-08 10:47:50 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1480] with error Generic error
      2023-12-08 10:47:50 0 [Note] InnoDB: Starting shutdown...
      2023-12-08 10:47:51 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
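The relaxed condition in the patch reads as an implication: the flag may be set after the first scan only when innodb_force_recovery is 0. A minimal sketch that compares the two conditions at compile time follows; `old_condition()` and `new_condition()` are hypothetical helper names that merely copy the expressions inside the two ut_ad() calls.

```cpp
namespace sketch {

// Expression from the removed line: ut_ad(!recv_sys.found_corrupt_fs)
constexpr bool old_condition(bool found_corrupt_fs) {
  return !found_corrupt_fs;
}

// Expression from the added line:
// ut_ad(!recv_sys.found_corrupt_fs || !srv_force_recovery)
constexpr bool new_condition(bool found_corrupt_fs, unsigned force_recovery) {
  return !found_corrupt_fs || !force_recovery;
}

// Flag set, innodb_force_recovery=0: the reported case.
// The old condition fails, the new one holds.
static_assert(!old_condition(true), "old assertion rejects the flag");
static_assert(new_condition(true, 0), "new assertion tolerates the flag");

// Flag set while forcing recovery: still treated as a bug by the new condition.
static_assert(!new_condition(true, 1), "flag with force recovery is rejected");

// Flag clear: both conditions hold regardless of innodb_force_recovery.
static_assert(old_condition(false), "no flag, old holds");
static_assert(new_condition(false, 0) && new_condition(false, 1),
              "no flag, new holds");

}  // namespace sketch
```

Only the combination "flag set and innodb_force_recovery=0" changes behavior, which matches the failing test run above.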
      

      In 10.8, the log format was changed and the logic around FILE_CHECKPOINT was rewritten in MDEV-14425. We would still invoke fil_name_process() on the FILE_MODIFY records, and recv_sys.found_corrupt_fs could be set for a similar reason, but there is no assertion failure in recv_recovery_from_checkpoint_start() after the initial call to recv_scan_log(). In our CI systems, I only found this type of failure on the 10.5 and 10.6 branches.

            People

              marko Marko Mäkelä