Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 10.6.12
- Environment: Linux 6.2.11-gentoo #1 SMP PREEMPT_DYNAMIC x86_64 AMD EPYC 7451 24-Core GNU/Linux
Description
Use case: a cluster connected by a 10 Gbps network, with a large dataset (over 2 TB) residing on fast NVMe drives.
The problem: SST between the nodes takes over 8 hours to complete. This is inconvenient, to say the least.
With the latest versions of MariaDB and Galera (this applies to MariaDB 10.4 through 10.6), and regardless of network settings, we noticed that the data transfer speed is about 700 Mbps with zstd compression and never gets higher. No resource on either the donor or the receiving node appears to be a bottleneck: the CPU is not loaded, no %iowait is visible on either node, and other resources are not saturated either.
The network is not the bottleneck either: when a parallel file transfer is started over the same network, the SST speed does not change, and the NIC reports total network utilization well above 1 Gbps.
Settings we have:
[mariabackup]
parallel=8
use-memory=4G

[SST]
inno-backup-opts="--parallel=8"
compressor='/usr/bin/zstd -2 --threads=8'
decompressor='/usr/bin/zstd -d'
The compression type and thread count were chosen by benchmarking. The fastest configuration we found reaches 450 MB/s when reading the source data from the MySQL data directory, sending it over the network, and discarding it on the recipient side (`nc -l [port] | zstd -d --stdout >> /dev/null`).
Further investigation showed that the bottleneck is the mbstream process on the receiving node.
Problem details.
The SST transfer is performed by the `/usr/bin/wsrep_sst_mariabackup` script. The script parses the [SST] section (so the `parallel=8` option in the `[mariabackup]` section is ignored, and a separate option with the misleading name "inno-backup-opts" is required) and, according to those settings, sets up a transfer over the network:
- on donor side: mariabackup -> [optional tar] -> [archiver or `pv`] -> [netcat or socat]
- on recipient side: [netcat or socat] -> [optional unarchiver] -> [untar or mbstream] -> data directory (/var/lib/mysql)
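The two pipelines above can be sketched as command strings assembled from optional stages. This is only an illustration of the data flow, not the code of `wsrep_sst_mariabackup`: the function names, the host name `joiner`, and port 4444 are assumptions made for the example, and the strings are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: joins the stages of an SST pipeline into one command string.
# All function names are hypothetical; nothing here is taken from the script.

# join_stages STAGE...: print the non-empty stages joined with " | ".
join_stages() {
    out=''
    for stage in "$@"; do
        [ -n "$stage" ] || continue      # optional stage not configured
        if [ -z "$out" ]; then
            out=$stage
        else
            out="$out | $stage"
        fi
    done
    printf '%s\n' "$out"
}

# Donor: backup -> [compressor] -> network sender.
# $1 = compressor command (may be empty), $2 = sender command.
donor_pipeline() {
    join_stages 'mariabackup --backup --stream=xbstream' "$1" "$2"
}

# Joiner: network receiver -> [decompressor] -> mbstream -x.
# $1 = receiver command, $2 = decompressor command (may be empty).
joiner_pipeline() {
    join_stages "$1" "$2" 'mbstream -x'
}

# Example values; host name and port are placeholders.
donor_pipeline 'zstd -2 --threads=8' 'socat - TCP:joiner:4444'
joiner_pipeline 'socat TCP-LISTEN:4444 -' 'zstd -d'
```

Every stage in a pipe runs concurrently, so the whole chain moves data only as fast as its slowest stage.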
The bottleneck turned out to be the recipient side.
Indeed:
- mariabackup is multithreaded
- network has spare capacity
- (un)archiver is multithreaded
But mbstream always runs in a single thread: exactly one CPU core is 100% utilized by the mbstream process.
The documentation states that mbstream is preferred over tar because mbstream supports encryption and multithreading.
Indeed, a discussion on the Percona forum recommends setting:
xbstream-opts="--parallel=16"
However, this option is not available in the `/usr/bin/wsrep_sst_mariabackup` script. It is simply absent; the script contains this block:
if [ "$WSREP_SST_OPT_ROLE" = 'joiner' ]; then
    strmcmd="'$STREAM_BIN' -x"
else
    strmcmd="'$STREAM_BIN' -c '$INFO_FILE'"
fi
where `STREAM_BIN` is the path to the mbstream binary.
According to this block, the mbstream command always starts with just one thread. So, although multithreading is one of the reasons for using mbstream, there is no option to actually use it.
The result
When this block was modified as follows:
648c648
< strmcmd="'$STREAM_BIN' -x"
---
> strmcmd="'$STREAM_BIN' -x --parallel=8"
650c650
< strmcmd="'$STREAM_BIN' -c '$INFO_FILE'"
---
> strmcmd="'$STREAM_BIN' -c '$INFO_FILE' --parallel=8"
so that mbstream uses a hardcoded 8 threads, the network transfer speed rose to about 2 Gbps with the same compression option, and the SST time dropped from 8 hours to 2.5 hours.
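As a rough sanity check of these figures, transfer time can be estimated as volume divided by sustained rate. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second); the volumes passed in are illustrative guesses, since the report does not state how much compressed data actually crossed the wire.

```shell
#!/bin/sh
# transfer_hours TB MBPS: hours needed to move TB terabytes (10^12 bytes)
# at a sustained rate of MBPS megabits per second.
# Hypothetical helper for illustration only.
transfer_hours() {
    awk -v tb="$1" -v mbps="$2" 'BEGIN {
        bits = tb * 1e12 * 8            # volume in bits
        secs = bits / (mbps * 1e6)      # seconds at the given rate
        printf "%.1f\n", secs / 3600    # hours, one decimal place
    }'
}

# Assumed wire volumes, not measured values.
transfer_hours 2.5 700      # single-threaded mbstream, about 700 Mbps
transfer_hours 2.25 2000    # mbstream with --parallel=8, about 2 Gbps
```

With these assumed volumes, the estimates land close to the reported 8 hours and 2.5 hours, which is consistent with the transfer rate being the dominant cost of the SST.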
The request:
To make SST faster, could the MariaDB team add support for an `xbstream-opts` or `mbstream-opts` option, so that multi-threaded writes can be used on the recipient side? This would make efficient use of fast network connections and shorten SST time.
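One way the requested option might plug into the block quoted earlier is sketched here. This is not a patch: `MBSTREAM_OPTS` is a hypothetical variable standing in for the value of the proposed `mbstream-opts` setting, `build_strmcmd` is a hypothetical helper, and the real script would have to read the setting with its own configuration parser.

```shell
#!/bin/sh
# Sketch only. MBSTREAM_OPTS and build_strmcmd are hypothetical names;
# they do not exist in wsrep_sst_mariabackup.

# build_strmcmd ROLE STREAM_BIN INFO_FILE EXTRA_OPTS
# Prints the stream command, with extra options appended when non-empty.
build_strmcmd() {
    role=$1
    bin=$2
    info=$3
    opts=$4
    if [ "$role" = 'joiner' ]; then
        cmd="'$bin' -x"
    else
        cmd="'$bin' -c '$info'"
    fi
    # Append user-supplied options (for example --parallel=8) only if set,
    # so the default behaviour stays unchanged.
    if [ -n "$opts" ]; then
        cmd="$cmd $opts"
    fi
    printf '%s\n' "$cmd"
}

# Placeholder values for illustration.
MBSTREAM_OPTS='--parallel=8'
strmcmd=$(build_strmcmd joiner /usr/bin/mbstream info_file "$MBSTREAM_OPTS")
echo "$strmcmd"
```

Leaving the command untouched when the option is empty keeps existing configurations working exactly as before, while letting operators choose the thread count instead of hardcoding it.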
Issue Links
- relates to
  - MDEV-28555 Making SST faster in Galera (Open)
  - MDEV-33991 Support FIFO Parallel Stream for Mariabackup Galera SST (Open)