[MDEV-24184] InnoDB RENAME TABLE recovery failure if names are reused Created: 2020-11-10 Updated: 2021-06-04 Resolved: 2021-03-16 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table, Storage Engine - InnoDB |
| Affects Version/s: | 10.2, 10.3, 10.4, 10.5, 10.6 |
| Fix Version/s: | 10.5.10, 10.6.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Eugene Kosov (Inactive) | Assignee: | Vladislav Lesin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | recovery | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||
| Description |
|
This came up in First of all, at least starting with MariaDB 10.5 (and the
With that fix applied, the recovery would fail on the third rename starting from checkpoint_lsn=77739:
We are renaming tablespace 5 from t2 to t1, then renaming tablespace 6 from t5 to t2 and back. The tablespace 5 already was named t1, and the tablespace 6 was named t2. So, only the third operation results in an attempted rename. But, there already was a t5 at tablespace 8. It had been successfully renamed before the server was killed:
I think that we should fix the redo log apply as follows:
I believe that by design, we will have to perform at most one rename operation. This should be guaranteed by the log_write_up_to() call in fil_name_write_rename(). |
| Comments |
| Comment by Marko Mäkelä [ 2020-11-10 ] |
|
Fixing this could fix the root cause of |
| Comment by Marko Mäkelä [ 2021-01-27 ] |
|
I now have a fix that I tested in the bb-10.6-monty branch. |
| Comment by Marko Mäkelä [ 2021-01-27 ] |
|
I removed the extra log record write. vlad.lesin, I pushed one commit to bb-10.6-monty-MDEV-24184 where this is mostly fixed. It turns out that mariabackup --prepare implements its own RENAME logging, which looks like a work-around of this bug. I think that in 10.6, we should remove the .ren files and rename_table_in_prepare() altogether. In GA versions, we should ensure that rename_table_in_prepare() will update fil_space_t. Possibly my above mentioned commit is fine as is for 10.5, and there was no bug in Mariabackup after all (thanks to the rename_table_in_prepare(), which strongly feels like a work-around of this recovery bug). |
| Comment by Vladislav Lesin [ 2021-01-31 ] |
|
marko, mariabackup processes renames with almost the same way you implemented in this task. I.e. it fills some map (space id)->(space name) during log reading, and then creates special files in backup directory with new names, and does renaming on --prepare basing on those files. So, yes, it contains some kind of workaround for this bug. |
| Comment by Vladislav Lesin [ 2021-02-01 ] |
|
One more reason why mariabackup is not affected is that server can't change backup directory. According to the description the following scenario was possible before Marko's fix: 1) Rename t1->t2, So if Server had already executed (3) before crash, recovery would fail trying to execute (1). In the case of mariabackup, it enumerates data files before starting log copying, i.e. it will copy: t1,t3, and start recovery since (1) or but, anyway, there will not be a situation when backup directory contains t3 renamed to t2 and recovery fails trying to execute (1). |
| Comment by Vladislav Lesin [ 2021-03-11 ] |
|
When I tried to remove only "rename" processing from mariabackup, I found that we can't remove it separately from "create" and "drop" processing. I would start with the claim that ddl recovery logic itself differs from backup ddl restoring logic. For example, if some tablespace was created and the server was crashed, the tablespace will not be created during redo log replaying, it will be loaded instead (see mfile_type_t::FILE_CREATE, recv_sys_t::parse(), fil_name_process(), recv_init_crash_recovery_spaces()). But for backup it must be created. Current ddl fixing logic just copies tablespaces created during backup to backup directory, if we remove it, we need to create tablespace during recovery. The same is true for "drop" operation. Redo log records are just removed for such tablespaces (see recv_validate_tablespace()) during recovery, but the files itself are not removed, but they must be removed on the case of backup restoring. If we return to "rename" ddl's, then at least one issue exists for backup restoring and does not exist for crash recovery - it's circular renames, when there is rename sequence like the following: a->tmp, b->a, tmp->b (see also rename_during_backup.test). The algorithm, proposed here: https://github.com/MariaDB/server/commit/fb219c7cb05e0e486658de05a11cb775e39ecc3d will not manage such renames in the case of backup restoring. So we can't just remove ddl fixing code from mariabackup, we also need to fix recovery code so, that it would also suite for backup restoring. One way to do this is to move backup_fix_ddl() logic from mariabackup to InnoDB recovery code, for example in recv_init_crash_recovery_spaces(). I will describe this idea in details in MDEV-18336 comments. The other idea was proposed by marko - it's replaying DDL records at the sequence of their parsing. And if we return to the initial idea within the current issue, i.e. to remove only "rename" ddl processing from mariabackup, we will have the bugs when "renames" are mixed with other ddl's. Example. Suppose we have "rename t1 to t2; create t1;" sequence in log file, and t1 was copied to backup directory before "rename" was executed by the server, and backup was finished after "create" is executed by the server. Currently backup directory will contain two files for ddl processing during backup prepare: t1.ibd.ren and t1.ibd.new. Before starting recovery t1.ibd.ren will be processed so that t1 will be renamed to t2, then t1.ibd.new will be renamed to t1.ibd. If we just remove rename_table_in_prepare() call from mariabackup, there will not be t1.ibd.ren file, and mariabackup will just overwrites t1.ibd with t1.ibd.new, and then start recovery, what will cause error. So if we need complex approach to get rid of separate ddl's processing in mariabackup, and this task should be separated from the current issue. We already created the task for this earlier: MDEV-18336. As for the current fix, i.e. https://github.com/MariaDB/server/commit/fb219c7cb05e0e486658de05a11cb775e39ecc3d commit, it works well for recovery and does not break backup preparing, so it's good enough for fixing this issue. |
| Comment by Marko Mäkelä [ 2021-03-12 ] |
|
Thank you vlad.lesin. Can you please analyze and fix the failure of the test innodb.log_file_name and push a revised commit to a 10.5-based branch? I think that it suffices to fix this failure in 10.5 and later versions for now. Before |