[MDEV-25931] Mariabackup is slow for databases with 100K tables due to single threaded operations Created: 2021-06-15  Updated: 2022-02-23

Status: Open
Project: MariaDB Server
Component/s: Backup, mariabackup
Affects Version/s: 10.3.27
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Vasilis G Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: backup, performance
Environment:

Debian 10.9


Attachments: Text File 20210609-mydb1-example-mariabackup.txt     Text File 20210614-mariabackup-analyze.txt    

 Description   

Hello

Analyzing the phases of a mariabackup run shows that some of them execute single-threaded. Combined with moderate or slow storage, this leads to excessive backup times.

Profiling our storage showed that it delivers higher throughput when more I/Os are queued (i.e. at a larger queue depth), as happens with multi-threaded operations. mariabackup, however, runs only partially multi-threaded.
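
One rough way to observe this effect without a dedicated benchmark tool is to read the same set of files with different numbers of parallel readers and compare wall-clock times. The sketch below is illustrative only: the helper name read_with_jobs and the example path are assumptions, not anything taken from the attachments, and page-cache effects will distort the numbers unless caches are cold between runs.

```shell
# Hypothetical helper (not part of mariabackup): read every regular file
# under DIR with up to NJOBS concurrent readers and report elapsed seconds.
read_with_jobs() (
    dir=$1
    njobs=$2
    start=$(date +%s)
    # xargs -P keeps up to NJOBS cat processes running at once, so the
    # storage sees more outstanding reads as NJOBS grows.
    find "$dir" -type f -print0 | xargs -0 -n 64 -P "$njobs" cat > /dev/null
    end=$(date +%s)
    echo "jobs=$njobs elapsed=$((end - start))s"
)

# Example (the path is an assumption; adjust it to your datadir):
# for j in 1 4 16; do read_with_jobs /var/lib/mysql/mydb1 "$j"; done
```

If the elapsed time drops noticeably as the job count grows, the storage is likely to benefit from a larger queue depth.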

Measured by the time each takes, mariabackup appears to have three main stages (InnoDB database with >100K tables):

  • open tables - single threaded
  • stream ibd - multi threaded
  • (lock tables)
  • stream frm files - single threaded
  • (unlock tables)

Of these phases, open tables and stream frm files run single-threaded, as can be verified from the logs and from monitoring tools (atop) during the mariabackup run.

The command run is:

$ mariabackup --backup --user=root --galera-info --parallel=16 --stream=xbstream --ftwrl-wait-timeout=120 --ftwrl-wait-threshold=30 | gzip > /var/backups/mariabackup/blogs-test.gz

Attached are:

  • an annotated excerpt of a slow mariabackup operation.
  • some source code annotations for the copy ibd and copy frm stages; the latter runs in the main thread, while the former runs multi-threaded.

So the issue is that file operations involving a large number of files (as is the case for InnoDB databases with the default innodb_file_per_table setting and 100K tables) are done partly multi-threaded and partly single-threaded.

It would be useful if the open tables and copy frm files phases could also be made multi-threaded. This would benefit storage that sustains larger queue depths (delivering higher throughput at larger queue depths). I would expect this to apply to modern SSDs as well, since they share this characteristic.
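
To illustrate the kind of gain being asked for, here is a minimal sketch that copies many small .frm files with several concurrent workers. It is not how mariabackup copies files internally; the helper name copy_frm_parallel is hypothetical, and the sketch assumes GNU find, xargs and cp (as shipped on Debian) for the -r, --parents and -t options.

```shell
# Hypothetical helper: copy all *.frm files from SRC to DST with up to
# NJOBS concurrent cp processes, preserving the per-database directories.
copy_frm_parallel() (
    src=$1
    dst=$2
    njobs=$3
    mkdir -p "$dst" || exit 1
    # Resolve DST to an absolute path before changing directory.
    dst=$(cd "$dst" && pwd) || exit 1
    cd "$src" || exit 1
    # -r: do nothing when no files match; -P: run NJOBS copies in parallel.
    # cp --parents recreates ./db/table.frm below DST (GNU coreutils).
    find . -type f -name '*.frm' -print0 |
        xargs -0 -r -n 32 -P "$njobs" cp --parents -t "$dst"
)

# Example (paths are assumptions):
# copy_frm_parallel /var/lib/mysql /mnt/backup/frm 16
```

With a single worker this degenerates to the sequential behaviour described above, which makes it easy to compare the two cases on the same storage.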



 Comments   
Comment by Jochen [ 2022-02-23 ]

I see you are using gzip. I changed that to pigz to have multithreaded zipping, too. This speeds up the process measurably.
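
A minimal sketch of that substitution, assuming pigz is installed on the backup host. The helper pick_compressor is hypothetical and falls back to plain gzip when pigz is missing; the thread count of 8 is an arbitrary example value.

```shell
# Hypothetical helper: print the compressor command to use.
# Prefers pigz (parallel gzip) with the given number of threads via -p,
# and falls back to single-threaded gzip when pigz is not installed.
pick_compressor() {
    if command -v pigz > /dev/null 2>&1; then
        echo "pigz -p $1"
    else
        echo "gzip"
    fi
}

# Usage with the command from the description (left commented out because
# it needs a running server):
# mariabackup --backup --user=root --galera-info --parallel=16 --stream=xbstream --ftwrl-wait-timeout=120 --ftwrl-wait-threshold=30 | $(pick_compressor 8) > /var/backups/mariabackup/blogs-test.gz
```

pigz writes standard gzip format, so the resulting file can still be decompressed with gzip.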
