Details
- Bug
- Status: Open
- Minor
- Resolution: Unresolved
- 12.2.1
Description
Hi,
I was transferring a large dataset during an SST and checking whether there are any performance gains to be made. A minor thing I noticed was fadvise64() being called after every read and write:
14:46:30.053265 pread64(308, "l\277\310,\2\260k\0"..., 10485760, 739183165440) = 10485760
14:46:30.080575 fadvise64(308, 739183165440, 10485760, POSIX_FADV_DONTNEED) = 0
14:46:30.082365 write(1, "XBSTCK01"..., 63) = 63
14:46:30.082454 fadvise64(1, 0, 0, POSIX_FADV_DONTNEED) = -1 ESPIPE (Illegal seek)
14:46:30.082512 write(1, "l\277\310,\2\260k\0"..., 10485760) = 10485760
14:46:30.118725 fadvise64(1, 0, 0, POSIX_FADV_DONTNEED) = -1 ESPIPE (Illegal seek)
14:46:30.118807 futex(0x55f717da8a08, FUTEX_WAKE_PRIVATE, 1) = 1
This process is mariadb-backup, which reads files from disk and feeds them to socat (optionally via a compressor):
/usr/sbin/mysqld --defaults-file=/etc/my.instance.cnf --wsrep-new-cluster
\_ sh -c wsrep_sst_mariabackup --role 'donor' ...
\_ bash /usr//bin/wsrep_sst_mariabackup --role donor ...
\_ /usr//bin/mariadb-backup --defaults-file=/etc/my.instance.cnf --backup --no-version-check --databases-exclude=lost+found --parallel=4 ... --use-memory=400G ...
\_ zstd -T8
\_ socat -u stdio openssl-connect:...
In this use case, fd 1 is always a pipe. For 20TB (in 10MB chunks), that means 2 × 2,000,000 calls to fadvise64(), all of which fail with ESPIPE.
At 80 µs per fadvise64() call, that translates to 320 seconds of system time. (Correct me if I'm wrong.) That means we might be able to speed up a 30 hour transfer by 5 minutes.
Could we add a check at the appropriate place after the fadvise64()/posix_fadvise() call that detects ESPIPE, records it in a boolean on the File object, and then skips the fadvise64() call whenever that flag is set?
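To illustrate the idea, here is a minimal sketch in C, not taken from the mariadb-backup source. The struct advised_fd, its fields, and the functions drop_cache_hint() and demo_on_pipe() are hypothetical names made up for this example; the actual File object and call site will look different. It assumes Linux with glibc, where posix_fadvise() returns the error number directly rather than setting errno.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() in strict modes */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the File object mentioned above. */
struct advised_fd {
    int           fd;
    bool          fadvise_unsupported; /* set once ESPIPE has been seen */
    unsigned long fadvise_calls;       /* real calls made, for illustration */
};

/* Returns 0 on success or when the hint is skipped, else the error number. */
static int drop_cache_hint(struct advised_fd *f, off_t offset, off_t len)
{
    if (f->fadvise_unsupported)
        return 0;                      /* cheap userspace check, no syscall */

    f->fadvise_calls++;
    /* posix_fadvise() returns the error number; it does not set errno. */
    int rc = posix_fadvise(f->fd, offset, len, POSIX_FADV_DONTNEED);
    if (rc == ESPIPE) {
        f->fadvise_unsupported = true; /* fd is a pipe: stop asking */
        return 0;
    }
    return rc;
}

/* Usage sketch: on a pipe, only the first hint should reach the kernel. */
static unsigned long demo_on_pipe(void)
{
    int p[2];
    if (pipe(p) != 0)
        return 0;

    struct advised_fd out = { .fd = p[1] };
    for (int i = 0; i < 1000; i++)
        (void)drop_cache_hint(&out, 0, 0);

    close(p[0]);
    close(p[1]);
    return out.fadvise_calls;
}
```

The flag is set lazily on the first failure instead of probing the descriptor type up front, which keeps the common regular-file path unchanged and costs one extra failing call per pipe.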
[edit] I originally calculated this for a 1MB buffer. After re-reading, I see that it's a 10MB buffer, which reduces the win from 3200 seconds (almost an hour) to 320 seconds (about 5 minutes).
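The estimate can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the function name wasted_seconds() is made up, and it assumes decimal units (20TB = 20,000,000,000,000 bytes, 10MB = 10,000,000 bytes) to match the rounded figures above, even though the buffer in the strace output is 10485760 bytes.

```c
#include <stdint.h>

/* Failing fadvise64() calls per chunk on stdout, as in the strace excerpt:
   one after the chunk header write and one after the payload write. */
#define FADVISE_PER_CHUNK 2

/* Estimated seconds spent in failing fadvise64() calls for a transfer. */
static uint64_t wasted_seconds(uint64_t total_bytes, uint64_t chunk_bytes,
                               uint64_t usec_per_call)
{
    uint64_t chunks = total_bytes / chunk_bytes;
    uint64_t calls  = FADVISE_PER_CHUNK * chunks;
    return calls * usec_per_call / 1000000;  /* microseconds to seconds */
}
```

With 20TB, 10MB chunks, and 80 µs per call this gives 4,000,000 calls and 320 seconds; with 1MB chunks it gives 40,000,000 calls and 3200 seconds.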
We could add a cheap userspace check to save us a syscall and win 5 minutes on a 25 hour transfer. And if this happens to end up in code that runs with a smaller buffer, it might help more. But, seeing that it's just 5 minutes, I can also see that you might not care.
Cheers,
Walter Doekes
OSSO B.V.
Issue Links
- relates to MDEV-38362 Develop an efficient alternative to mbstream (In Progress)