[MDEV-25666] Atomic DDL: InnoDB: Operating system error number 2, Could not find a valid tablespace file Created: 2021-05-13  Updated: 2023-03-27  Resolved: 2021-05-14

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: N/A
Fix Version/s: 10.6.1

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-17567 Atomic DDL Closed
relates to MDEV-25854 Restoring a backup may result in garb... Closed
relates to MDEV-25899 intermediate files operations are not... Closed

 Description   

bb-10.6-monty-innodb 98ff31770

2021-05-13  4:00:12 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 100663296 bytes
2021-05-13  4:00:12 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2021-05-13  4:00:12 0 [Note] InnoDB: New log file created, LSN=8337128
2021-05-13  4:00:12 0 [Note] InnoDB: 2 transaction(s) which must be rolled back or cleaned up in total 17 row operations to undo
2021-05-13  4:00:12 0 [Note] InnoDB: Trx id counter is 4846
2021-05-13  4:00:12 0 [Note] InnoDB: 128 rollback segments are active.
2021-05-13  4:00:12 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2021-05-13  4:00:12 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
2021-05-13  4:00:12 0 [ERROR] InnoDB: Cannot open datafile for read-only: './test/#sql-alter-44b2b-a.ibd' OS error: 71
2021-05-13  4:00:12 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2021-05-13  4:00:12 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
2021-05-13  4:00:12 0 [ERROR] InnoDB: Could not find a valid tablespace file for ``test`.`#sql-alter-44b2b-a``. Please refer to https://mariadb.com/kb/en/innodb-data-dictionary-troubleshooting/ for how to resolve the issue.
2021-05-13  4:00:12 0 [Warning] InnoDB: Ignoring tablespace for `test`.`#sql-alter-44b2b-a` because it could not be opened.

The test runs a mix of DML and DDL in a few threads and periodically executes MariaBackup to take a full data backup. Afterwards, the original server is shut down, each backup in turn is prepared and the server is started upon it.
On one of backups the errors above occur.

The error appears to be specific to recovery in the atomic DDL branch and reproducible fairly easily on the revision above. If the same prepared backup is given to vanilla 10.6 server, it starts without any errors.

Logs, data directories and rr profiles are available.

To reproduce from scratch:

git clone https://github.com/MariaDB/randgen --branch mdev25666 rqg-mdev25666
cd rqg-mdev25666
. ./mdev25666.cmd <basedir>

The command line runs max 5 test attempts, until the first unsuccessful one. It currently fails for me on the first or second attempt.



 Comments   
Comment by Marko Mäkelä [ 2021-05-13 ]

Error 2 is ENOENT (No such file or directory), and 71 is OS_FILE_NOT_FOUND.

Comment by Marko Mäkelä [ 2021-05-14 ]

Would the following fix this?

diff --git a/extra/mariabackup/xtrabackup.cc b/extra/mariabackup/xtrabackup.cc
index c590886d6e5..60958eb05ae 100644
--- a/extra/mariabackup/xtrabackup.cc
+++ b/extra/mariabackup/xtrabackup.cc
@@ -2621,10 +2621,6 @@ check_if_skip_table(
 		tbname = ptr + 1;
 	}
 
-	if (strncmp(tbname, tmp_file_prefix, tmp_file_prefix_length) == 0) {
-		return TRUE;
-	}
-
 	if (regex_exclude_list.empty() &&
 		regex_include_list.empty() &&
 		!tables_include_hash.array &&

I do not see why mariabackup should be allowed to skip copying any files whose name start with #sql. At that point we cannot possibly know whether the DDL operation will be committed or rolled back, or what the final name of that file should be. (OK, maybe we could know that the file will be dropped in the end, if backup locks work, but it is better to wear belt and suspenders.)

The InnoDB recovery logic seems sound to me:

			trx_rollback_recovered(false);
			dict_check_tablespaces_and_store_max_id();

That is, we will first roll back any recovered DDL transaction that was in progress when the server was killed (or the backup was taken). Subsequently, dict_check_sys_tables() inside dict_check_tablespaces_and_store_max_id() would complain if a data file for anything listed in SYS_TABLES is inaccessible.

If we had not rolled back a recovered DDL transaction first, then there could be false alarms from dict_check_tablespaces_and_store_max_id() due to the fact that after MDEV-18518, we would first write the SYS_TABLES and SYS_INDEXES record, and only after that start creating the data file.

Comment by Marko Mäkelä [ 2021-05-14 ]

The test harness is looking for mariabackup in the wrong location. I fixed it by augmenting my build directory with the following:
 

ln -s ../extra/mariabackup/mariabackup client


With that, it seems to work for me:

# 2021-05-14T08:31:39 [360200] Trial 3 ended with exit status STATUS_OK (0)
# 2021-05-14T08:31:39 [360200] Exit status STATUS_OK and the list of desired status codes is empty, result will be ignored
# 2021-05-14T08:31:39 [360200] ##########################################################
# 2021-05-14T08:31:39 [360200] Running trial 4/5

After I reverted the fix, I was able to reproduce the error on the first trial:

# 2021-05-14T08:38:46 [387278] #------------------------------------------------------------#
# 2021-05-14T08:38:46 [387278] #-- Checking the server error log for errors after restart --#
# 2021-05-14T08:38:46 [387278] #------------------------------------------------------------#
# 2021-05-14T08:38:46 [387278] Checking server log for important errors starting from the beginning
# 2021-05-14T08:38:46 [387278] ------
# 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
# 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
# 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: Cannot open datafile for read-only: './test/#sql-alter-5e939-b.ibd' OS error: 71
# 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
# 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
# 2021-05-14T08:38:46 [387278] 2021-05-14  8:38:46 0 [ERROR] InnoDB: Could not find a valid tablespace file for ``test`.`#sql-alter-5e939-b``. Please refer to https://mariadb.com/kb/en/innodb-data-dictionary-troubleshooting/ for how to resolve the issue.
# 2021-05-14T08:38:46 [387278] ------
# 2021-05-14T08:38:46 [387278][ERROR] Found errors in the log, restart has apparently failed
# 2021-05-14T08:38:46 [387278] Killed process 399920
# 2021-05-14T08:38:46 [387278] [387278] /mariadb/rqg-mdev25666/runall-new.pl will exit with exit status STATUS_BACKUP_FAILURE (106)

Generated at Thu Feb 08 09:39:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.