MariaDB Server / MDEV-18349

InnoDB file size changes are not safe when file system crashes

Details

    Description

      When InnoDB invokes posix_fallocate() to extend data files, it does not follow up with a call to fsync() to persist the updated file system metadata. If file system recovery is needed after a crash, the file size could be incorrect.

      Furthermore, when the setting innodb_flush_method=O_DIRECT_NO_FSYNC that was introduced in MariaDB 10.0.11 (and MySQL 5.6) is enabled, InnoDB would stop calling fsync() after extending files.

      This report is motivated by a MySQL 5.7.25 change Bug#27309336 Backport to 5.7 that restores the fsync() call. We will fix the bug differently; it does not seem to be a good idea to hold the already contentious fil_system->mutex while executing a system call.
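      The pattern the description calls for, extending a file with posix_fallocate() and then calling fsync() so that the new size survives a file system crash, can be sketched as follows. This is a minimal illustration and not InnoDB code: the helper name extend_file_durably() and its error convention are assumptions made for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of InnoDB): grow the file behind fd to
   new_size bytes and make the new size durable.
   Returns 0 on success, or an errno-style error number on failure. */
static int extend_file_durably(int fd, off_t new_size)
{
    int err;

    /* posix_fallocate() returns the error number directly and does not
       set errno. Retry if the call was interrupted by a signal. */
    do {
        err = posix_fallocate(fd, 0, new_size);
    } while (err == EINTR);

    if (err != 0) {
        return err;  /* do not ignore allocation failures */
    }

    /* Without this call, the new file size may exist only in volatile
       file system metadata and can be lost if the system crashes. */
    if (fsync(fd) != 0) {
        return errno;
    }
    return 0;
}
```

      Note that the sketch propagates the return value of posix_fallocate() instead of discarding it, and that no lock is held around either system call.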

          Activity

            marko Marko Mäkelä added a comment - Related to this, there appears to have been a bad merge to MariaDB 10.0.31 and 10.1.24 that caused XtraDB to ignore errors from posix_fallocate().

            kevg Eugene Kosov (Inactive) added a comment -
            > We will fix the bug differently; it does not seem to be a good idea to hold the already contentious fil_system->mutex while executing a system call.

            But os_file_flush() inside fil_flush() is already called outside of a mutex.

            marko Marko Mäkelä added a comment - kevg, you are right, the system call in fil_flush() is not covered by a mutex. My mistake.
            My solution is not optimal, because in the worst case it can invoke fsync() twice in succession: first from os_file_flush() after posix_fallocate() and then possibly from fil_flush(), because the first call does not update the bookkeeping. As always, improvements are welcome.
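            The redundant second fsync() mentioned in the last comment can be illustrated with a toy model. Everything here is hypothetical: struct file_node, the needs_flush flag, and the counter standing in for fsync() are assumptions made for the sketch and do not mirror the real InnoDB data structures. It also assumes that the extension leaves a flush pending in the bookkeeping.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a data file node (not the real InnoDB type). */
struct file_node {
    bool needs_flush;  /* bookkeeping: unflushed changes are pending */
    int  fsync_calls;  /* counts calls that would be fsync() in real code */
};

/* Stand-in for fsync(); only counts invocations. */
static void fake_fsync(struct file_node *node)
{
    node->fsync_calls++;
}

/* Extension path: syncs right after growing the file, but does not
   record in the bookkeeping that the file is now clean. */
static void extend_node(struct file_node *node)
{
    node->needs_flush = true;  /* assumed: extension marks the file dirty */
    fake_fsync(node);          /* sync issued after posix_fallocate() */
    /* needs_flush is intentionally left set */
}

/* Regular flush path: syncs whenever the bookkeeping says it is needed. */
static void flush_node(struct file_node *node)
{
    if (node->needs_flush) {
        fake_fsync(node);
        node->needs_flush = false;
    }
}
```

            In this model, calling extend_node() followed by flush_node() issues two syncs even though nothing was written in between. Clearing needs_flush in the extension path would avoid the second one, at the cost of having to update the bookkeeping safely.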

            People

              marko Marko Mäkelä