[MDEV-25666] Atomic DDL: InnoDB: Operating system error number 2, Could not find a valid tablespace file
| Created: | 2021-05-13 | Updated: | 2023-03-27 | Resolved: | 2021-05-14 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | N/A |
| Fix Version/s: | 10.6.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Description |
|
The test runs a mix of DML and DDL in a few threads and periodically executes MariaBackup to take a full data backup. Afterwards, the original server is shut down; each backup in turn is prepared, and a server is started on it. The error appears to be specific to recovery in the atomic DDL branch and is fairly easy to reproduce on the revision above. If the same prepared backup is given to a vanilla 10.6 server, it starts without any errors. Logs, data directories and rr profiles are available. To reproduce from scratch:
The command line runs at most 5 test attempts, stopping at the first unsuccessful one. It currently fails for me on the first or second attempt. |
| Comments |
| Comment by Marko Mäkelä [ 2021-05-13 ] |
|
Error 2 is ENOENT (No such file or directory), and 71 is OS_FILE_NOT_FOUND. |
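The operating-system half of that mapping can be checked outside the server. The following is a minimal sketch in Python, not part of the original report; the file name `missing.ibd` and the helper `probe_missing_file` are hypothetical, and the InnoDB-internal code 71 is not observable this way, so only errno 2 is shown.

```python
import errno
import os
import tempfile


def probe_missing_file():
    """Try to open a file that does not exist and return the errno seen."""
    # Hypothetical file name inside a fresh, empty temporary directory,
    # so the file is guaranteed to be absent.
    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, "missing.ibd")
        try:
            with open(missing, "rb"):
                pass
        except OSError as exc:
            return exc.errno
    return None


code = probe_missing_file()
# Print the numeric code, its symbolic name, and the platform's message.
print(code, errno.errorcode[code], os.strerror(code))
```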
| Comment by Marko Mäkelä [ 2021-05-14 ] |
|
Would the following fix this?
I do not see why mariabackup should be allowed to skip copying any files whose names start with #sql. At that point we cannot possibly know whether the DDL operation will be committed or rolled back, or what the final name of that file should be. (OK, maybe we could know that the file will be dropped in the end, if backup locks work, but it is better to wear belt and suspenders.) The InnoDB recovery logic seems sound to me:
That is, we will first roll back any recovered DDL transaction that was in progress when the server was killed (or the backup was taken). Subsequently, dict_check_sys_tables() inside dict_check_tablespaces_and_store_max_id() would complain if a data file for anything listed in SYS_TABLES is inaccessible. If we had not rolled back a recovered DDL transaction first, then there could be false alarms from dict_check_tablespaces_and_store_max_id() due to the fact that after |
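Why the ordering matters can be illustrated with a toy model. This is only a sketch and not InnoDB code: the function names, table names and the set-based "dictionary" are all hypothetical. It assumes the simplest case, in which a table belonging to an unfinished DDL transaction is still listed in the dictionary while its data file is absent.

```python
def check_sys_tables(sys_tables, files_on_disk):
    """Return the dictionary entries whose data file is missing."""
    return sorted(name for name in sys_tables
                  if name + ".ibd" not in files_on_disk)


def rollback_recovered_ddl(sys_tables, uncommitted):
    """Drop dictionary rows that belong to an incomplete DDL transaction."""
    return {name for name in sys_tables if name not in uncommitted}


# Hypothetical state after restoring a backup: the intermediate table of an
# unfinished DDL is still listed, but its data file is absent.
sys_tables = {"test/t1", "test/#sql-ib42"}
uncommitted = {"test/#sql-ib42"}
files_on_disk = {"test/t1.ibd"}

# Checking before the rollback reports the intermediate table (false alarm).
early = check_sys_tables(sys_tables, files_on_disk)

# Rolling back first removes the row, so the check has nothing to report.
late = check_sys_tables(
    rollback_recovered_ddl(sys_tables, uncommitted), files_on_disk)

print(early, late)
```

In this model the check is only meaningful once the dictionary reflects committed state, which is the order described in the comment above.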
| Comment by Marko Mäkelä [ 2021-05-14 ] |
|
The test harness is looking for mariabackup in the wrong location. I fixed it by augmenting my build directory with the following:
After I reverted the fix, I was able to reproduce the error on the first trial:
|