Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
None
Description
The current handling of an on-disk temporary table issues the following system calls, all of which reference the files by name:
843157 openat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAI", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0660) = 49
843157 openat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAD", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0660) = 50
843157 readlink("/tmp/#sql-temptable-cdd6a-3-0.MAI", 0x7fd25c0b7400, 1023) = -1 EINVAL (Invalid argument)
843157 newfstatat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAI", {st_mode=S_IFREG|0660, st_size=8192, ...}, AT_SYMLINK_NOFOLLOW) = 0
843157 openat(49, "#sql-temptable-cdd6a-3-0.MAI", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = 50
843157 newfstatat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAD", {st_mode=S_IFREG|0660, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
843157 openat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAD", O_RDWR|O_CLOEXEC) = 49
843157 newfstatat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAI", {st_mode=S_IFREG|0660, st_size=8192, ...}, AT_SYMLINK_NOFOLLOW) = 0
843157 unlink("/tmp/#sql-temptable-cdd6a-3-0.MAI") = 0
843157 newfstatat(AT_FDCWD, "/tmp/#sql-temptable-cdd6a-3-0.MAD", {st_mode=S_IFREG|0660, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
843157 unlink("/tmp/#sql-temptable-cdd6a-3-0.MAD") = 0
As seen in MDEV-34577, overlayfs (used by container runtimes such as Docker's Moby and Podman) has inefficiencies that result in poor performance.
Overall, eleven system calls that go through the filename cause overhead and lock contention inside the kernel, on top of the cost of the context switch for each of those system calls.
With O_TMPFILE, as implemented in MDEV-15584, there is no need to maintain on-disk filenames for temporary tables.
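A minimal sketch of what O_TMPFILE provides, assuming Linux with glibc and a filesystem that supports the flag (for example tmpfs or ext4): open() is given the directory rather than a file name, and the resulting inode never gets a directory entry. The helpers open_anonymous_tmp and is_unlinked are hypothetical illustrations, not MariaDB functions.

```c
#define _GNU_SOURCE /* glibc exposes O_TMPFILE only with _GNU_SOURCE */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open an unnamed regular file inside directory `dir`.
   Returns a descriptor, or -1 with errno set. */
static int open_anonymous_tmp(const char *dir)
{
#ifdef O_TMPFILE
  /* The path names the directory; O_TMPFILE requires O_RDWR or O_WRONLY. */
  return open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0660);
#else
  (void) dir;
  errno = EOPNOTSUPP; /* not Linux, or headers without O_TMPFILE */
  return -1;
#endif
}

/* Hypothetical helper: 1 if the file behind `fd` has no directory entry,
   0 if it has one, -1 if fstat() fails. */
static int is_unlinked(int fd)
{
  struct stat st;
  if (fstat(fd, &st) != 0)
    return -1;
  return st.st_nlink == 0;
}
```

Because fstat() reports a link count of 0 for such a file, there is no name for a later stat, reopen, unlink or cleanup pass to deal with; the kernel reclaims the space when the last descriptor is closed.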
As seen in MDEV-17420, when errors occur there are temporary files left behind to clean up.
At the SQL layer, this would mean combining create_internal_tmp_table and open_tmp_table; they are always called one right after the other, so this is easy.
maria_create (and likewise mi_create) would need to be adjusted to also build the state that ha_maria::open currently sets up, resulting in a table that is opened only once. Currently maria_create leaves the newly created files closed on storage, so they have to be reopened by name.
With O_TMPFILE passed to open(), there should be two open calls (one for the .MAI file and one for the .MAD file) instead of the previous eleven system calls, with the additional performance benefit that ha_maria::open won't need to revalidate everything maria_create already did.
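Under the same assumptions (Linux, glibc, a filesystem supporting O_TMPFILE), the two-open flow could look roughly like the following. The struct tmp_table_files and its functions are hypothetical stand-ins for the index (.MAI) and data (.MAD) file pair, not Aria's actual API, and the sketch leaves out the header initialisation that maria_create would still have to perform.

```c
#define _GNU_SOURCE /* glibc exposes O_TMPFILE only with _GNU_SOURCE */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical descriptor pair for an Aria-style temporary table:
   index file (.MAI role) and data file (.MAD role). Not Aria's API. */
struct tmp_table_files {
  int index_fd;
  int data_fd;
};

/* Creates both files already open: two open() calls in total, and no
   names to stat, readlink, reopen or unlink later. Returns 0 or -1. */
static int tmp_table_files_create(const char *dir, struct tmp_table_files *f)
{
  f->index_fd = f->data_fd = -1;
#ifdef O_TMPFILE
  f->index_fd = open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0660);
  if (f->index_fd < 0)
    return -1;
  f->data_fd = open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0660);
  if (f->data_fd < 0) {
    int saved = errno;
    close(f->index_fd); /* error path: nothing named is left behind */
    f->index_fd = -1;
    errno = saved;
    return -1;
  }
  return 0;
#else
  (void) dir;
  errno = EOPNOTSUPP;
  return -1;
#endif
}

/* Dropping the table is just two close() calls; the kernel frees the
   unnamed inodes once the last descriptor is gone. */
static void tmp_table_files_close(struct tmp_table_files *f)
{
  if (f->index_fd >= 0)
    close(f->index_fd);
  if (f->data_fd >= 0)
    close(f->data_fd);
  f->index_fd = f->data_fd = -1;
}
```

The design point is that creation hands back open descriptors instead of closing the files, so whatever state creation establishes can be kept rather than rediscovered by a second, name-based open.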
An additional benefit: if something crashes, no temporary files will be left lingering on the filesystem.
Issue Links
relates to:
- MDEV-15584 Linux - use O_TMPFILE for create_temp_file (Closed)
- MDEV-17420 MariaDB slave 10.2 leaks temporary tables (Closed)
- MDEV-34577 Queries with on-disk tmp-tables cause significant additional memory use in Docker (Closed)
I think the logic also needs to be implemented for non-Linux platforms (at least files that are removed automatically when the process ends: opening with FILE_FLAG_DELETE_ON_CLOSE on Windows, and probably creating the file followed by unlink elsewhere).
If this is implemented Linux-only, sooner or later we'll start leaking temp files without noticing it quickly enough.
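A sketch of the create-then-unlink fallback suggested in this comment, assuming a POSIX system: use O_TMPFILE where the headers define it, and otherwise (or when the kernel or filesystem rejects it) create a file with mkstemp() and unlink() it immediately. The function name open_unnamed_tmp and the #sql-tmpXXXXXX template are hypothetical; the Windows FILE_FLAG_DELETE_ON_CLOSE path is not shown.

```c
#define _GNU_SOURCE /* glibc exposes O_TMPFILE only with _GNU_SOURCE */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return a descriptor to an unnamed temporary file
   in `dir`, or -1 with errno set. */
static int open_unnamed_tmp(const char *dir)
{
  int fd;
#ifdef O_TMPFILE
  fd = open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0660);
  if (fd >= 0)
    return fd;
  /* EISDIR: kernel predates O_TMPFILE; EOPNOTSUPP: filesystem lacks it.
     Any other error (e.g. ENOENT) is real, so report it. */
  if (errno != EISDIR && errno != EOPNOTSUPP)
    return -1;
#endif
  char path[4096];
  int n = snprintf(path, sizeof path, "%s/#sql-tmpXXXXXX", dir);
  if (n < 0 || (size_t) n >= sizeof path) {
    errno = ENAMETOOLONG;
    return -1;
  }
  fd = mkstemp(path); /* creates and opens a uniquely named file */
  if (fd < 0)
    return -1;
  /* Drop the name right away; a crash before this line still leaks it. */
  if (unlink(path) != 0) {
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
  }
  (void) fcntl(fd, F_SETFD, FD_CLOEXEC); /* mkstemp() does not set it */
  return fd;
}
```

The fallback still has a short window in which the file is visible by name, so it narrows rather than removes the leak; routing every platform through one helper like this keeps the non-Linux path exercised instead of silently diverging.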