  1. MariaDB Server
  2. MDEV-35860

use O_TMPFILE for (implicit) temporary Aria/MyISAM tables

Details

    Description

      The current usage of a temporary table involves the following system calls that explicitly reference the filename.

      843157 openat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAI", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0660) = 49
      843157 openat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAD", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0660) = 50
      843157 readlink("/tmp/#sql-temptable-cdd6a-3-0.MAI", 0x7fd25c0b7400, 1023) = -1 EINVAL (Invalid argument)
      843157 newfstatat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAI", {st_mode=S_IFREG|0660, st_size=8192, ...}, AT_SYMLINK_NOFOLLOW) = 0
      843157 openat(49, "#sql-temptable-cdd6a-3-0.MAI", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = 50
      843157 newfstatat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAD", {st_mode=S_IFREG|0660, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
      843157 openat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAD", O_RDWR|O_CLOEXEC) = 49
      843157 newfstatat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAI", {st_mode=S_IFREG|0660, st_size=8192, ...}, AT_SYMLINK_NOFOLLOW) = 0
      843157 unlink("/tmp/#sql-temptable-cdd6a-3-0.MAI") = 0
      843157 newfstatat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAD", {st_mode=S_IFREG|0660, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
      843157 unlink("/tmp/#sql-temptable-cdd6a-3-0.MAD") = 0
      

      As seen in MDEV-34577, there are inefficiencies in overlayfs2 (used by containers: Docker's moby, Podman, etc.) that result in poor performance.

      Overall, eleven system calls by filename cause overhead and lock contention inside the kernel, on top of the context-switch cost of those system calls.

      With O_TMPFILE, as implemented in MDEV-15584, there is no need to maintain on-disk filenames for temporary tables.

      As seen in MDEV-17420, when errors occur there are a number of temporary files to clean up.

      At the SQL layer, this will result in combining create_internal_tmp_table and open_tmp_table; they always follow each other, so this is easy.

      maria_create (mi_create) would need to be adjusted to incorporate the state set up by ha_maria::open, resulting in a table that is opened only once. Currently maria_create leaves the newly created files closed on storage.

      With O_TMPFILE passed as an argument to open, there should be two open calls instead of the previous eleven system calls, with the additional performance benefit that ha_maria::open won't need to revalidate everything maria_create already did.

      An additional benefit: if something crashes, there will be no temporary files lingering on the filesystem.

          Activity

            wlad Vladislav Vaintroub added a comment - - edited

            I think the logic also needs to be implemented for non-Linux platforms (at least files that are removed automatically when the process ends, which is an open with FILE_FLAG_DELETE_ON_CLOSE on Windows, and probably create file followed by unlink elsewhere).
            If this is implemented Linux-only, sooner or later we'll start leaking temp files, without noticing it fast enough.

            danblack Daniel Black added a comment -

            Happy to see Windows FILE_FLAG_DELETE_ON_CLOSE be part of this implementation with fallback unlinking for non-(Windows,Linux).
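
            The "create file followed by unlink" fallback mentioned above could look roughly like the sketch below on POSIX systems without O_TMPFILE. The name open_unlinked_tmpfile is hypothetical and not part of MariaDB; mkstemp is used here only as one convenient way to create a uniquely named file.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical fallback (not MariaDB code): create a named temporary
   file from `path_template` (must end in "XXXXXX" and is modified in
   place), then unlink it immediately so that only the open descriptor
   keeps the file alive. Returns the descriptor, or -1 on failure. */
static int open_unlinked_tmpfile(char *path_template)
{
  int fd = mkstemp(path_template);
  if (fd < 0)
    return -1;
  if (unlink(path_template) != 0)
  {
    close(fd);
    return -1;
  }
  return fd;
}
```

            Unlike O_TMPFILE, the name exists briefly between the two calls, so a crash in that window can still leave a file behind.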


            People

              Unassigned Unassigned
              danblack Daniel Black