An SST has three main stages: taking the backup, decompressing it (if the backup was taken with compression), and preparing the backup.
In my experience, when mariabackup is configured as the SST method with compression, qpress is used; with the parallel and compress-threads options set, it is much faster than other compression methods such as gzip or pigz.
Decompression, however, is very slow, because per the code in /usr/bin/wsrep_sst_mariabackup
each file is decompressed one at a time with a thread count equal to the total number of cores available on the machine.
If we instead use, say, 1/10th of the total cores per file and run 10 such decompressions in parallel, the total decompression time drops drastically.
For example, with 40 cores, I decompress each file with 4 threads and run 10 such decompression processes in parallel.
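A minimal sketch of that split, assuming a 40-core machine and qpress's -d/-T flags. The worker count, the file/directory pairing pattern, and the DATADIR variable are illustrative, not taken from the actual script; the final command is echoed rather than executed since qpress and real .qp files are needed to run it:

```shell
#!/bin/sh
# Divide the machine's cores among several concurrent qpress
# decompressors instead of giving all cores to a single file.
# TOTAL_CORES is hardcoded for the example; in the script it would
# come from nproc. WORKERS=10 reflects the 1/10th ratio above.
TOTAL_CORES=40
WORKERS=10
THREADS=$(( TOTAL_CORES / WORKERS ))
[ "$THREADS" -lt 1 ] && THREADS=1

# find prints each .qp file followed by its directory; xargs feeds
# those pairs to qpress, running WORKERS processes at a time with
# THREADS decompression threads each (shown, not run):
echo "find \$DATADIR -type f -name '*.qp' -printf '%p\n%h\n' | xargs -n 2 -P $WORKERS qpress -dT$THREADS"
```

The key change relative to the current script is the xargs -P flag, which caps the number of simultaneous qpress processes.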
With this configuration, I was able to SST 1 TB of data in just 50 minutes.
The compressed backup stage took 17 minutes, the decompression stage took 20 minutes, and the prepare stage took around 2 minutes. Because the backup stage completed so quickly, there were few changes to apply, so the prepare stage took very little time.
With the original behavior of using all cores to decompress a single file, the decompression stage took more than an hour.
For a physical backup or a manual SST on the command line, mariabackup offers
--decompress --parallel <threads>, but there is no equivalent option in the SST configuration.
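For reference, the manual invocation looks like this (path and thread count are illustrative):

```shell
# Decompress a compressed backup; --parallel controls how many
# .qp files are decompressed concurrently. Requires qpress installed.
mariabackup --decompress --parallel=10 --target-dir=/path/to/backup
```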
I'll leave it to you to decide how best to implement this. One way is to change /usr/bin/wsrep_sst_mariabackup and /usr/bin/wsrep_sst_common to incorporate the above changes; another is to introduce a new parameter for parallel decompression threads that overrides the above logic.
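One possible shape for such a parameter in my.cnf (both names are entirely hypothetical; no such options exist today):

```
[sst]
# threads per qpress process (hypothetical option)
decompress-threads=4
# number of concurrent qpress processes (hypothetical option)
decompress-parallel=10
```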