Analyzing the phases of mariabackup suggests that some of them are executed single threaded. Combined with moderate or slow storage, this leads to excessive backup times.
Profiling the I/O speed of our storage showed that it delivers higher throughput when more I/Os are queued (i.e. at a larger queue depth), as happens for example in multi threaded operations. mariabackup, on the other hand, runs only partially multi threaded.
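The queue depth effect can be checked roughly with a small script like the one below. It is only a sketch, not the profiling method we actually used: the helper names (`make_files`, `read_all`), the file count and the file size are made up for illustration, and the OS page cache will hide most of the difference unless the files are cold or much larger than RAM.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Read one file fully and return the number of bytes read."""
    with open(path, "rb") as f:
        return len(f.read())


def read_all(paths, workers):
    """Read every file using `workers` threads; return (total_bytes, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(read_file, paths))
    return total, time.perf_counter() - start


def make_files(directory, count, size):
    """Create `count` files of `size` bytes each (hypothetical test data)."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"t{i}.frm")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = make_files(d, count=2000, size=4096)
        # More worker threads means more reads in flight at the same time,
        # i.e. a larger effective queue depth on the storage.
        for workers in (1, 4, 16):
            total, secs = read_all(paths, workers)
            print(f"workers={workers:2d} bytes={total} seconds={secs:.3f}")
```

On storage that benefits from a larger queue depth, the elapsed time should drop as the number of workers grows, provided the reads really hit the device.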
In terms of the time each takes to complete, mariabackup seems to have 3 basic stages (InnoDB database with >100K tables):
- open tables - single threaded
- stream ibd - multi threaded
- (lock tables)
- stream frm files - single threaded
- (unlock tables)
Of these phases, open tables and stream frm files are done single threaded, as can be verified by looking at the logs and at monitoring tools (atop) during a mariabackup run.
The command run is:
$ mariabackup --backup --user=root --galera-info --parallel=16 --stream=xbstream --ftwrl-wait-timeout=120 --ftwrl-wait-threshold=30 | gzip > /var/backups/mariabackup/blogs-test.gz
- an annotated excerpt of a slow mariabackup operation.
- some source code annotations for the copy ibd and copy frm stages, the latter being executed in the main thread and the former in a multi threaded manner.
So the issue is that file operations involving a large number of files (as in the case of InnoDB databases with the innodb_file_per_table setting (default) and 100K databases) are done partly multi threaded and partly single threaded.
It would be interesting if the open tables and copy frm files phases could also be made multi threaded. This would benefit storage that delivers higher throughput at larger queue depths. I would expect it to apply to modern SSDs as well, since they share this characteristic.
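To illustrate the idea (not mariabackup's actual code, which is C++), here is a minimal Python sketch of copying many small .frm files with a pool of worker threads. The function names, the directory layout and the default of 16 workers are assumptions for this example. It also copies into a target directory; the --stream=xbstream case would additionally need writes to the output stream to be serialized.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def find_frm_files(datadir):
    """Return the paths of all .frm files below `datadir`."""
    found = []
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            if name.endswith(".frm"):
                found.append(os.path.join(root, name))
    return found


def copy_one(src, datadir, destdir):
    """Copy one file, keeping its path relative to `datadir`."""
    rel = os.path.relpath(src, datadir)
    dst = os.path.join(destdir, rel)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copyfile(src, dst)
    return dst


def copy_frm_parallel(datadir, destdir, workers=16):
    """Copy all .frm files using `workers` threads; return the new paths."""
    files = find_frm_files(datadir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker issues its own open/read/write calls, so several
        # I/Os are in flight at once instead of one at a time.
        return list(pool.map(lambda p: copy_one(p, datadir, destdir), files))
```

A call such as `copy_frm_parallel("/var/lib/mysql", "/var/backups/frm-copy")` (both paths are placeholders) would copy the files with up to 16 concurrent operations. The same pattern could in principle apply to the open tables phase.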