  1. MariaDB Server
  2. MDEV-25931

Mariabackup is slow for databases with 100K tables due to single threaded operations

Details

    Description

      Hello

      Analyzing the phases of mariabackup shows that some parts are executed single-threaded. Combined with moderate or slow storage, this leads to excessive backup times.

      Profiling the I/O speed of our storage, we found that it delivers higher throughput when more I/Os are queued (i.e. at a larger queue depth), as happens for example with multi-threaded operations. mariabackup, however, runs only partially multi-threaded.
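
      The queue-depth effect described above can be checked roughly with a small script. The following is only a sketch, not how the storage was actually profiled for this report: it reads a set of small files with a varying number of threads and prints the elapsed time. The names read_file and read_all and the generated t*.frm test files are hypothetical. Files written moments earlier are served from the page cache, so to see the storage behaviour the paths would have to point at a real data directory with a cold cache.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Read one file fully and return the number of bytes read."""
    with open(path, "rb") as f:
        return len(f.read())


def read_all(paths, workers):
    """Read every file using `workers` threads; return (total_bytes, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # More workers means more reads in flight, i.e. a deeper I/O queue.
        total = sum(pool.map(read_file, paths))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical test data: many small files standing in for .frm files.
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(2000):
            p = os.path.join(d, f"t{i}.frm")
            with open(p, "wb") as f:
                f.write(b"x" * 4096)
            paths.append(p)
        for workers in (1, 4, 16):
            total, secs = read_all(paths, workers)
            print(f"workers={workers:2d} bytes={total} seconds={secs:.3f}")
```

      On storage that benefits from deeper queues, the elapsed time would be expected to drop as the worker count rises, which is the behaviour the single-threaded phases cannot exploit.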

      In terms of time taken, mariabackup appears to have 3 basic stages (for an InnoDB database with >100K tables):

      • open tables - single threaded
      • stream ibd - multi threaded
      • (lock tables)
      • stream frm files - single threaded
      • (unlock tables)

      Of these phases, open tables and stream frm files are done single-threaded, as can be verified by looking at the logs and at monitoring tools (atop) during a mariabackup run.

      The command run is:

      $ mariabackup --backup --user=root --galera-info --parallel=16 --stream=xbstream --ftwrl-wait-timeout=120 --ftwrl-wait-threshold=30 | gzip > /var/backups/mariabackup/blogs-test.gz

      Attached are:

      • an annotated excerpt of a slow mariabackup operation.
      • some source code annotations for the copy ibd and copy frm stages, the latter being executed in the main thread and the former in a multi-threaded manner.

      So the issue is that file operations involving a large number of files (as is the case for InnoDB databases with the innodb_file_per_table setting (the default) and 100K databases) are done partly multi-threaded and partly single-threaded.

      It would be interesting if the open tables phase and the copy frm files phase could also be made multi-threaded. This would benefit storage that can sustain larger queue depths (delivering higher throughput at larger queue depths). I would expect this to apply to modern SSDs as well, since they share this characteristic.
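
      To illustrate the idea only, here is a minimal sketch of copying many small files with a pool of worker threads. It is not mariabackup code: mariabackup is written in C++ and, in the command above, streams to xbstream rather than copying into a directory. The function names (find_files, copy_one, copy_parallel) and the workers parameter are hypothetical.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def find_files(src_root, suffix=".frm"):
    """Return sorted relative paths of all files under src_root ending in suffix."""
    found = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            if name.endswith(suffix):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, src_root))
    return sorted(found)


def copy_one(src_root, dst_root, rel):
    """Copy a single file, creating its destination directory if needed."""
    dst = os.path.join(dst_root, rel)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copyfile(os.path.join(src_root, rel), dst)
    return rel


def copy_parallel(src_root, dst_root, workers=16, suffix=".frm"):
    """Copy all matching files using `workers` threads; return the copied paths."""
    rels = find_files(src_root, suffix)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker issues its own I/O, so several requests are queued at once.
        return list(pool.map(lambda r: copy_one(src_root, dst_root, r), rels))
```

      Since the frm files are copied between lock tables and unlock tables, a shorter copy phase would presumably also shorten the time the tables stay locked.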

          People

            Unassigned Unassigned
            basos Vasilis G
            Votes: 0
            Watchers: 4
