MariaDB Server / MDEV-12103

Reduce the time of looking for MLOG_CHECKPOINT during crash recovery

Details

    Description

      We should fix MySQL Bug #80788 in MariaDB 10.2.

      When I made the InnoDB crash recovery more robust by implementing WL#7142, I also introduced an extra redo log scan pass that could be avoided.
      Before MariaDB 10.2 is released as GA, we are free to change the redo log format and add extra information to the redo log checkpoint page, so that the extra scan can be avoided.
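To make the cost of the extra pass concrete, here is a minimal C++ sketch of a toy log model. It is not InnoDB code: the record kinds (`RecType`) and the functions `two_pass_reads` and `one_pass_reads` are hypothetical. The sketch only counts how many records each scheme reads when the MLOG_CHECKPOINT marker has to be located before the records after the checkpoint can be trusted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record kinds for this sketch only; real redo records are richer.
enum class RecType { FILE_NAME, PAGE_OP, CHECKPOINT };

// Two-pass scheme (simplified): pass 1 reads from the checkpoint LSN until the
// CHECKPOINT marker is found; pass 2 rereads everything from the checkpoint
// LSN and parses it. Returns the total number of record reads.
std::size_t two_pass_reads(const std::vector<RecType>& log) {
  std::size_t reads = 0;
  for (RecType t : log) {  // pass 1: locate the marker
    ++reads;
    if (t == RecType::CHECKPOINT) break;
  }
  reads += log.size();  // pass 2: read and parse all records again
  return reads;
}

// One-pass scheme: parse and buffer records while looking for the marker,
// so no record is read twice. Reports whether the marker was seen.
std::size_t one_pass_reads(const std::vector<RecType>& log,
                           std::vector<RecType>& buffered,
                           bool& found_checkpoint) {
  std::size_t reads = 0;
  found_checkpoint = false;
  for (RecType t : log) {
    ++reads;
    if (t == RecType::CHECKPOINT) found_checkpoint = true;
    buffered.push_back(t);  // keep the parsed record for later application
  }
  return reads;
}
```

For a five-record log with the marker in third position, the two-pass scheme reads 8 records and the one-pass scheme reads 5. The catch with buffering in a single pass is that a page record can be parsed before the MLOG_FILE_NAME record for its tablespace has been seen, which is what the comments below deal with.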


          Activity

            marko Marko Mäkelä added a comment - bb-10.2-marko

            jplindst Jan Lindström (Inactive) added a comment - ok to push after considering the comment about removing the error message; there should be a mechanism to detect that two or more redo log files do not form one consistent redo log.

            The individual redo log files form one logical redo log file, as if the files had been catenated together. I am afraid that we cannot easily extend the consistency checks. In the long term, I would like to have a single log file only. Starting with 10.2 (MDEV-12061 Allow innodb_log_files_in_group=1) we can use a single file.
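A minimal sketch of what "as if the files had been catenated" means, assuming (for illustration only) that every file has the same size and begins with a fixed-size header that is not part of the logical log. `FileOffset`, `logical_to_physical` and the sizes used below are hypothetical and do not reproduce InnoDB's actual layout or constants. Because the redo log is used circularly, the logical offset wraps around the combined data area.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical position of a logical log byte inside the file group.
struct FileOffset {
  std::uint64_t file_no;  // which file (0-based)
  std::uint64_t offset;   // byte offset inside that file, header included
};

// Map a logical offset (headers excluded) to a file and a physical offset,
// treating the data areas of n_files equally sized files as one log.
FileOffset logical_to_physical(std::uint64_t logical, std::uint64_t file_size,
                               std::uint64_t hdr_size, std::uint64_t n_files) {
  assert(file_size > hdr_size && n_files > 0);
  const std::uint64_t data_per_file = file_size - hdr_size;
  logical %= data_per_file * n_files;  // circular log: wrap around
  return FileOffset{logical / data_per_file,
                    hdr_size + logical % data_per_file};
}
```

With 1000-byte files, 100-byte headers and two files, logical offset 899 is the last byte of file 0 (physical offset 999), logical offset 900 maps to file 1 at physical offset 100, and logical offset 1800 wraps back to file 0 at offset 100.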

            In this innodb.innodb_bug59641 test failure, it is clear that the logic needs some revision. I can occasionally reproduce the failure locally by running a few of the same preceding tests on the same instance:

            ./mtr --no-reorder innodb.innodb_bug52663 innodb.innodb_bug53290 innodb.innodb_bug53592 innodb.innodb_bug54044 innodb.innodb_bug56143 innodb.innodb_bug56716 innodb.innodb_bug57252 innodb.innodb_bug57255 innodb.innodb_bug57904 innodb.innodb_bug59410 innodb.innodb_bug59641
            

            The following should fix it:

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index b0e0652470b..218e1367e83 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -1156,6 +1156,7 @@ recv_parse_or_apply_log_rec_body(
             		ut_d(page_type = fil_page_get_type(page));
             	} else if (apply
             		   && !is_predefined_tablespace(space_id)
            +		   && recv_sys->scanned_lsn >= recv_sys->mlog_checkpoint_lsn
             		   && recv_spaces.find(space_id) == recv_spaces.end()) {
             		ib::fatal() << "Missing MLOG_FILE_NAME or MLOG_FILE_DELETE"
             			" for redo log record " << type << " (page "
            

            I think that we need something more to ensure that we will catch tablespaces that are entered into recv_sys->addr_hash but missing from recv_spaces. Also a test for this kind of redo log corruption will be needed.

            marko Marko Mäkelä added the comment above.
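The patched condition can be read as a standalone predicate. This is a hedged sketch with hypothetical names (`missing_file_name_is_fatal`, `lsn_t`, `known_spaces`), not the code in log0recv.cc; `lsn` stands for whichever LSN is compared against mlog_checkpoint_lsn (the diff uses recv_sys->scanned_lsn). The sketch assumes that MLOG_FILE_NAME records for tablespaces modified since the checkpoint may be written just before the MLOG_CHECKPOINT record, so a missing name only proves corruption once that point has been reached.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using lsn_t = std::uint64_t;  // hypothetical alias for this sketch

// Hypothetical restatement of the patched condition: should a page record for
// space_id be a fatal "Missing MLOG_FILE_NAME or MLOG_FILE_DELETE" error?
bool missing_file_name_is_fatal(bool apply, bool predefined_space,
                                lsn_t lsn, lsn_t mlog_checkpoint_lsn,
                                const std::set<std::uint32_t>& known_spaces,
                                std::uint32_t space_id) {
  return apply
      && !predefined_space
      // The added line: before MLOG_CHECKPOINT has been reached, the
      // MLOG_FILE_NAME for this tablespace may still lie ahead in the log.
      && lsn >= mlog_checkpoint_lsn
      && known_spaces.find(space_id) == known_spaces.end();
}
```

With mlog_checkpoint_lsn = 100 and no known tablespaces, a record at LSN 50 is tolerated, while one at LSN 100 or later is treated as fatal.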

            marko Marko Mäkelä added a comment - After extending the test innodb.log_corruption, I found out that recv_sys->recovered_lsn should be used instead of recv_sys->scanned_lsn. The inconsistency will be reported in recv_init_crash_recovery_spaces().
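The deferred check can be sketched as a simplified model with hypothetical names (`Rec`, `find_unnamed_spaces`). This is not the actual recovery code, and MLOG_FILE_DELETE is left out. Each record carries its own end LSN, standing in for recv_sys->recovered_lsn. One plausible reason why scanned_lsn is the wrong yardstick (an assumption, not stated in the thread) is that it advances a whole block of log at a time, so it can already be past mlog_checkpoint_lsn while records that precede MLOG_CHECKPOINT in that block are still being parsed. In the sketch, a page record before MLOG_CHECKPOINT for an unnamed tablespace only leaves a placeholder, and placeholders still empty at the end are reported, similar to how the comment above defers the report to recv_init_crash_recovery_spaces().

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using lsn_t = std::uint64_t;  // hypothetical alias for this sketch

// Hypothetical, heavily simplified record. end_lsn plays the role of
// recovered_lsn: how far records have actually been parsed.
struct Rec {
  enum Kind { FILE_NAME, PAGE } kind;
  std::uint32_t space_id;
  std::string name;  // only meaningful for FILE_NAME
  lsn_t end_lsn;
};

// Returns ids of tablespaces that were modified but never named.
std::vector<std::uint32_t> find_unnamed_spaces(const std::vector<Rec>& log,
                                               lsn_t mlog_checkpoint_lsn) {
  std::map<std::uint32_t, std::string> spaces;  // space_id -> file name
  std::vector<std::uint32_t> bad;
  for (const Rec& r : log) {
    if (r.kind == Rec::FILE_NAME) {
      spaces[r.space_id] = r.name;  // fills in any earlier placeholder
    } else if (spaces.find(r.space_id) == spaces.end()) {
      if (r.end_lsn < mlog_checkpoint_lsn) {
        spaces[r.space_id] = "";  // placeholder: the name may still follow
      } else {
        bad.push_back(r.space_id);  // after MLOG_CHECKPOINT: corrupt log
      }
    }
  }
  // Deferred check: placeholders that were never filled in.
  for (const auto& entry : spaces) {
    if (entry.second.empty()) bad.push_back(entry.first);
  }
  return bad;
}
```

A page record at LSN 40 followed by a name record at LSN 60 passes with mlog_checkpoint_lsn = 80, whereas a page record whose tablespace is never named is reported whether it comes before or after that LSN.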
            marko Marko Mäkelä added a comment - bb-10.2-marko

            jplindst Jan Lindström (Inactive) added a comment - ok to push.

            People

              Assignee: marko Marko Mäkelä
              Reporter: marko Marko Mäkelä