Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25666

Atomic DDL: InnoDB: Operating system error number 2, Could not find a valid tablespace file

Details

    Description

      bb-10.6-monty-innodb 98ff31770

      2021-05-13  4:00:12 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 100663296 bytes
      2021-05-13  4:00:12 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
      2021-05-13  4:00:12 0 [Note] InnoDB: New log file created, LSN=8337128
      2021-05-13  4:00:12 0 [Note] InnoDB: 2 transaction(s) which must be rolled back or cleaned up in total 17 row operations to undo
      2021-05-13  4:00:12 0 [Note] InnoDB: Trx id counter is 4846
      2021-05-13  4:00:12 0 [Note] InnoDB: 128 rollback segments are active.
      2021-05-13  4:00:12 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
      2021-05-13  4:00:12 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
      2021-05-13  4:00:12 0 [ERROR] InnoDB: Cannot open datafile for read-only: './test/#sql-alter-44b2b-a.ibd' OS error: 71
      2021-05-13  4:00:12 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
      2021-05-13  4:00:12 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
      2021-05-13  4:00:12 0 [ERROR] InnoDB: Could not find a valid tablespace file for ``test`.`#sql-alter-44b2b-a``. Please refer to https://mariadb.com/kb/en/innodb-data-dictionary-troubleshooting/ for how to resolve the issue.
      2021-05-13  4:00:12 0 [Warning] InnoDB: Ignoring tablespace for `test`.`#sql-alter-44b2b-a` because it could not be opened.
      

      The test runs a mix of DML and DDL in a few threads and periodically executes MariaBackup to take a full data backup. Afterwards, the original server is shut down, each backup in turn is prepared and the server is started upon it.
      On one of backups the errors above occur.

      The error appears to be specific to recovery in the atomic DDL branch and reproducible fairly easily on the revision above. If the same prepared backup is given to vanilla 10.6 server, it starts without any errors.

      Logs, data directories and rr profiles are available.

      To reproduce from scratch:

      git clone https://github.com/MariaDB/randgen --branch mdev25666 rqg-mdev25666
      cd rqg-mdev25666
      . ./mdev25666.cmd <basedir>
      

      The command line runs max 5 test attempts, until the first unsuccessful one. It currently fails for me on the first or second attempt.

      Attachments

        Issue Links

          Activity

            Error 2 is ENOENT (No such file or directory), and 71 is OS_FILE_NOT_FOUND.

            marko Marko Mäkelä added a comment - Error 2 is ENOENT (No such file or directory), and 71 is OS_FILE_NOT_FOUND .

            Would the following fix this?

            diff --git a/extra/mariabackup/xtrabackup.cc b/extra/mariabackup/xtrabackup.cc
            index c590886d6e5..60958eb05ae 100644
            --- a/extra/mariabackup/xtrabackup.cc
            +++ b/extra/mariabackup/xtrabackup.cc
            @@ -2621,10 +2621,6 @@ check_if_skip_table(
             		tbname = ptr + 1;
             	}
             
            -	if (strncmp(tbname, tmp_file_prefix, tmp_file_prefix_length) == 0) {
            -		return TRUE;
            -	}
            -
             	if (regex_exclude_list.empty() &&
             		regex_include_list.empty() &&
             		!tables_include_hash.array &&
            

            I do not see why mariabackup should be allowed to skip copying any files whose name start with #sql. At that point we cannot possibly know whether the DDL operation will be committed or rolled back, or what the final name of that file should be. (OK, maybe we could know that the file will be dropped in the end, if backup locks work, but it is better to wear belt and suspenders.)

            The InnoDB recovery logic seems sound to me:

            			trx_rollback_recovered(false);
            …
            			dict_check_tablespaces_and_store_max_id();
            

            That is, we will first roll back any recovered DDL transaction that was in progress when the server was killed (or the backup was taken). Subsequently, dict_check_sys_tables() inside dict_check_tablespaces_and_store_max_id() would complain if a data file for anything listed in SYS_TABLES is inaccessible.

            If we had not rolled back a recovered DDL transaction first, then there could be false alarms from dict_check_tablespaces_and_store_max_id() due to the fact that after MDEV-18518, we would first write the SYS_TABLES and SYS_INDEXES record, and only after that start creating the data file.

            marko Marko Mäkelä added a comment - Would the following fix this? diff --git a/extra/mariabackup/xtrabackup.cc b/extra/mariabackup/xtrabackup.cc index c590886d6e5..60958eb05ae 100644 --- a/extra/mariabackup/xtrabackup.cc +++ b/extra/mariabackup/xtrabackup.cc @@ -2621,10 +2621,6 @@ check_if_skip_table( tbname = ptr + 1; } - if (strncmp(tbname, tmp_file_prefix, tmp_file_prefix_length) == 0) { - return TRUE; - } - if (regex_exclude_list.empty() && regex_include_list.empty() && !tables_include_hash.array && I do not see why mariabackup should be allowed to skip copying any files whose name start with #sql . At that point we cannot possibly know whether the DDL operation will be committed or rolled back, or what the final name of that file should be. (OK, maybe we could know that the file will be dropped in the end, if backup locks work, but it is better to wear belt and suspenders.) The InnoDB recovery logic seems sound to me: trx_rollback_recovered( false ); … dict_check_tablespaces_and_store_max_id(); That is, we will first roll back any recovered DDL transaction that was in progress when the server was killed (or the backup was taken). Subsequently, dict_check_sys_tables() inside dict_check_tablespaces_and_store_max_id() would complain if a data file for anything listed in SYS_TABLES is inaccessible. If we had not rolled back a recovered DDL transaction first, then there could be false alarms from dict_check_tablespaces_and_store_max_id() due to the fact that after MDEV-18518 , we would first write the SYS_TABLES and SYS_INDEXES record, and only after that start creating the data file.

            The test harness is looking for mariabackup in the wrong location. I fixed it by augmenting my build directory with the following:
             

            ln -s ../extra/mariabackup/mariabackup client
            


            With that, it seems to work for me:

            # 2021-05-14T08:31:39 [360200] Trial 3 ended with exit status STATUS_OK (0)
            # 2021-05-14T08:31:39 [360200] Exit status STATUS_OK and the list of desired status codes is empty, result will be ignored
            # 2021-05-14T08:31:39 [360200] ##########################################################
            # 2021-05-14T08:31:39 [360200] Running trial 4/5
            

            After I reverted the fix, I was able to reproduce the error on the first trial:

            # 2021-05-14T08:38:46 [387278] #------------------------------------------------------------#
            # 2021-05-14T08:38:46 [387278] #-- Checking the server error log for errors after restart --#
            # 2021-05-14T08:38:46 [387278] #------------------------------------------------------------#
            # 2021-05-14T08:38:46 [387278] Checking server log for important errors starting from the beginning
            # 2021-05-14T08:38:46 [387278] ------
            # 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
            # 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
            # 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: Cannot open datafile for read-only: './test/#sql-alter-5e939-b.ibd' OS error: 71
            # 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
            # 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
            # 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: Could not find a valid tablespace file for ``test`.`#sql-alter-5e939-b``. Please refer to https://mariadb.com/kb/en/innodb-data-dictionary-troubleshooting/ for how to resolve the issue.
            # 2021-05-14T08:38:46 [387278] ------
            # 2021-05-14T08:38:46 [387278][ERROR] Found errors in the log, restart has apparently failed
            # 2021-05-14T08:38:46 [387278] Killed process 399920
            # 2021-05-14T08:38:46 [387278] [387278] /mariadb/rqg-mdev25666/runall-new.pl will exit with exit status STATUS_BACKUP_FAILURE (106)
            

            marko Marko Mäkelä added a comment - The test harness is looking for mariabackup in the wrong location. I fixed it by augmenting my build directory with the following:   ln -s ../extra/mariabackup/mariabackup client With that, it seems to work for me: # 2021-05-14T08:31:39 [360200] Trial 3 ended with exit status STATUS_OK (0) # 2021-05-14T08:31:39 [360200] Exit status STATUS_OK and the list of desired status codes is empty, result will be ignored # 2021-05-14T08:31:39 [360200] ########################################################## # 2021-05-14T08:31:39 [360200] Running trial 4/5 After I reverted the fix, I was able to reproduce the error on the first trial: # 2021-05-14T08:38:46 [387278] #------------------------------------------------------------# # 2021-05-14T08:38:46 [387278] #-- Checking the server error log for errors after restart --# # 2021-05-14T08:38:46 [387278] #------------------------------------------------------------# # 2021-05-14T08:38:46 [387278] Checking server log for important errors starting from the beginning # 2021-05-14T08:38:46 [387278] ------ # 2021-05-14T08:38:46 [387278] 2021-05-14 8:38:46 0 [ERROR] InnoDB: Operating system error number 2 in a file operation. # 2021-05-14T08:38:46 [387278] 2021-05-14 8:38:46 0 [ERROR] InnoDB: The error means the system cannot find the path specified. # 2021-05-14T08:38:46 [387278] 2021-05-14 8:38:46 0 [ERROR] InnoDB: Cannot open datafile for read-only: './test/#sql-alter-5e939-b.ibd' OS error: 71 # 2021-05-14T08:38:46 [387278] 2021-05-14 8:38:46 0 [ERROR] InnoDB: Operating system error number 2 in a file operation. # 2021-05-14T08:38:46 [387278] 2021-05-14 8:38:46 0 [ERROR] InnoDB: The error means the system cannot find the path specified. # 2021-05-14T08:38:46 [387278] 2021-05-14 8:38:46 0 [ERROR] InnoDB: Could not find a valid tablespace file for ``test`.`#sql-alter-5e939-b``. Please refer to https://mariadb.com/kb/en/innodb-data-dictionary-troubleshooting/ for how to resolve the issue. # 2021-05-14T08:38:46 [387278] ------ # 2021-05-14T08:38:46 [387278][ERROR] Found errors in the log, restart has apparently failed # 2021-05-14T08:38:46 [387278] Killed process 399920 # 2021-05-14T08:38:46 [387278] [387278] /mariadb/rqg-mdev25666/runall-new.pl will exit with exit status STATUS_BACKUP_FAILURE (106)

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.