  1. MariaDB Server
  2. MDEV-20677

Renaming files may not be filesystem-crash-safe

Details

    Description

      MDEV-14717 made rename operations inside InnoDB transactional and crash-safe in case the server process is killed.

      However, the file-renaming operations might not be carried out in a way that guarantees crash safety when the file system itself has to be recovered, for example after an operating system crash or power loss.

      To be safe, we should probably do fsync() of the file and the directory containing it, followed by the rename(), and finally an fsync() of the file and of its containing directory. This should be done both inside storage engines and in code that deals with .frm files and the like.
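      The sequence proposed above can be sketched as follows. This is a minimal illustration in Python of the underlying POSIX calls, not MariaDB code: the helper names fsync_path() and durable_rename() are hypothetical, and the sketch assumes a POSIX system (such as Linux) where a directory can be opened read-only and passed to fsync(). The extra sync for cross-directory renames is an assumption that goes beyond the four syncs named in the description.

```python
import os


def fsync_path(path: str) -> None:
    """Flush a file or directory to stable storage via a read-only descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_rename(old_path: str, new_path: str) -> None:
    """Hypothetical helper: sync file and directory, rename, sync both again."""
    old_dir = os.path.dirname(os.path.abspath(old_path))
    new_dir = os.path.dirname(os.path.abspath(new_path))

    fsync_path(old_path)   # 1. file contents are durable under the old name
    fsync_path(old_dir)    # 2. the old directory entry is durable
    os.rename(old_path, new_path)
    fsync_path(new_path)   # 3. the file, now reachable under the new name
    fsync_path(new_dir)    # 4. the new directory entry is durable
    if old_dir != new_dir:
        # Assumption beyond the four syncs above: for a cross-directory
        # rename, also persist the removal of the old entry.
        fsync_path(old_dir)
```

      Calling durable_rename("t1.ibd", "t2.ibd") performs all four syncs unconditionally. Whether each of them is needed on a given file system is the open question in this issue.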

          Activity

            serg Sergei Golubchik added a comment - These four fsyncs are what PostgreSQL does: https://linuxplumbersconf.org/event/4/contributions/492/attachments/344/573/errors.pdf#page=4 They claim that even if not every fsync is always necessary, each one of them is needed on at least some file systems.

            monty Michael Widenius added a comment -

            • my_rename() provides the option MY_SYNC_DIR, which we could use for the directory syncs.

            I would however argue that some of the above syncs are not necessary for MariaDB during rename, because:

            • All writes to the file are synced (or are in the redo log and synced on close).
            • The table should have been synced on close if there were any changes, so we don't need a sync for either file.

            Some things to think about:

            • As we are doing operations on multiple files in the same directory, do we need a sync for each operation, or do we only need to sync the directory after the last one?
            • After all, if we get a crash in the middle of a rename, between syncs, we still have a situation where we can't totally trust the directory contents and have to rely on redo. If there is a redo of file names, it should be able to fix any issues with missing middle syncs.

            Some other observations:

            • Aria is already doing the necessary syncs in case of rename.

            However, as Atomic DDL has to be delayed to 10.6, this is a 10.6 issue, not a 10.5 issue.
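            To make the first question in the comment above concrete, here is a rough sketch of the batched alternative: several renames followed by a single sync per affected directory. It is written in Python as a stand-in for the POSIX calls and does not represent my_rename() or MY_SYNC_DIR. The function name rename_batch() is hypothetical, and the sketch assumes a POSIX system where a directory can be opened read-only and fsync()ed. It does not claim that this is sufficient: after a crash part-way through, any subset of the renames may be visible, so recovery would still have to rely on redo as the comment notes.

```python
import os


def rename_batch(pairs):
    """Rename each (old, new) pair, then fsync every affected directory once.

    Returns the number of distinct directories that were synced.
    """
    touched = set()
    for old_path, new_path in pairs:
        os.rename(old_path, new_path)
        touched.add(os.path.dirname(os.path.abspath(old_path)))
        touched.add(os.path.dirname(os.path.abspath(new_path)))

    # One directory sync per directory, after the last rename,
    # instead of one sync per rename operation.
    for directory in sorted(touched):
        fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    return len(touched)
```

            For two renames inside the same directory this issues one directory sync instead of two or more.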

            People

              monty Michael Widenius
              marko Marko Mäkelä