[MDEV-20677] Renaming files may not be filesystem-crash-safe Created: 2019-09-26  Updated: 2021-06-30

Status: Open
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: recovery

Issue Links:
Relates
relates to MDEV-14717 RENAME TABLE in InnoDB is not crash-safe Closed

 Description   

MDEV-14717 made rename operations inside InnoDB transactional and crash-safe in case the server process is killed.

However, the file-renaming operations might not be carried out in a way that guarantees crash safety when the operating system itself crashes and the file system has to be recovered.

To be safe, we should probably do fsync() of the file and the directory containing it, followed by the rename(), and finally an fsync() of the file and of its containing directory. This should be done both inside storage engines and in code that deals with .frm files and the like.
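The sequence proposed above could look roughly like the following POSIX sketch. It is only an illustration, not MariaDB code: the helper names sync_path() and durable_rename() are hypothetical, and it assumes that both names live in the same directory and that fsync() on a directory file descriptor is supported (as it is on Linux).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open `path` with `flags`, fsync() it, close it.
   Returns 0 on success, -1 on any failure. */
static int sync_path(const char *path, int flags)
{
  int fd, rc;
  assert(path != NULL);
  fd = open(path, flags);
  if (fd < 0)
    return -1;
  rc = fsync(fd);
  if (close(fd) != 0)
    rc = -1;
  return rc;
}

/* Hypothetical helper: rename `from` to `to`, both located in `dir`,
   using the four syncs described in this report.
   Returns 0 on success, -1 on the first failing step. */
static int durable_rename(const char *dir, const char *from, const char *to)
{
  if (sync_path(from, O_RDWR) != 0)   /* 1. file contents before the rename */
    return -1;
  if (sync_path(dir, O_RDONLY) != 0)  /* 2. directory entry of the old name */
    return -1;
  if (rename(from, to) != 0)
    return -1;
  if (sync_path(to, O_RDWR) != 0)     /* 3. file under its new name */
    return -1;
  return sync_path(dir, O_RDONLY);    /* 4. directory entry of the new name */
}
```

A caller would invoke it as durable_rename("/data/db", "/data/db/t1.ibd", "/data/db/t2.ibd"), where the paths are made up for the example. If the two names were in different directories, both directories would need to be synced.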



 Comments   
Comment by Sergei Golubchik [ 2019-09-30 ]

These four fsyncs are what PostgreSQL does: https://linuxplumbersconf.org/event/4/contributions/492/attachments/344/573/errors.pdf#page=4 They claim that even if not every fsync is always necessary, each one of them is needed on at least some file systems.

Comment by Michael Widenius [ 2020-06-07 ]
  • my_rename() provides the option MY_SYNC_DIR that we could use for the directory syncs.

I would, however, argue that some of the above syncs are not necessary for MariaDB during rename, as:

  • All writes to the file are synced (or are in the redo log and synced on close).
  • The table should have been synced on close if there were any changes, and because of that we don't need a sync for either file.

Some things to think about:

  • As we are doing operations on multiple files in the same directory, do we need a sync for each operation, or do we only need to sync the directory after the last one?
  • After all, if we get a crash in the middle of a rename, between syncs, we still have a situation where we can't totally trust the directory
    contents and have to rely on redo. If file names are redo-logged, recovery should be able to fix any issues caused by missing intermediate syncs.
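To make the first question concrete, here is a minimal C sketch of the "sync the directory only once, after the last rename" variant. It is an illustration of the idea under discussion, not existing MariaDB code and not a claim that this variant is sufficient on every file system. The struct rename_pair and the helpers fsync_dir() and rename_batch() are hypothetical names, and all paths are assumed to lie in the one directory passed as `dir`.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical description of one rename; both paths must lie in the
   directory that is passed to rename_batch(). */
struct rename_pair
{
  const char *from;
  const char *to;
};

/* Hypothetical helper: fsync() a directory. Returns 0 or -1. */
static int fsync_dir(const char *dir)
{
  int rc;
  int fd = open(dir, O_RDONLY);
  if (fd < 0)
    return -1;
  rc = fsync(fd);
  if (close(fd) != 0)
    rc = -1;
  return rc;
}

/* Perform the renames in order, stopping at the first failure, and sync
   the directory a single time afterwards. Returns the number of renames
   that were performed, or -1 if the directory sync itself failed. */
static int rename_batch(const char *dir, const struct rename_pair *pairs,
                        size_t n)
{
  size_t i;
  assert(dir != NULL && pairs != NULL);
  for (i = 0; i < n; i++)
    if (rename(pairs[i].from, pairs[i].to) != 0)
      break;
  if (fsync_dir(dir) != 0)
    return -1;
  return (int) i;
}
```

With this approach, a crash before the final sync may leave any subset of the renames visible on disk, which is the situation the second point describes: recovery would then have to rely on logged file names to work out which renames still need to be applied.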

Some other observations:

  • Aria is already doing the necessary syncs in case of rename.

However, as Atomic DDL has to be delayed to 10.6, this is a 10.6 issue, not a 10.5 issue.
