  1. MariaDB Server
  2. MDEV-29940

mariabackup gets unbearably slow

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Environment: MariaDB 10.5.9 on CentOS 8

    Description

      We are running a primary/replica setup with currently one primary and two replicas. DB size is about 550 GB on sufficiently beefy hardware: DL360 Gen10 with 24 cores, 64 GB RAM on the primary and 128 GB on the replicas; 10 Gbit network; 1.8 TB SAS disks (RAID1) on an HPE Smart Array 408i with cache and additional 480 GB SSDs as SmartCache (relay logs on a separate array); running CentOS 8-stream and MariaDB 10.5.9.

      To repair replication after a hardware problem, I've been using mariabackup streaming the following way to avoid having to store a temporary copy on the primary:

      replica# nc -l $local_ip $port | mbstream -p 8 -x --directory=/data/mysql_backup/
      primary# mariabackup -p 4 --innodb-read-io-threads=8 --backup --stream=mbstream | nc $replica $port
      

      This method hasn't ever been exactly speedy (although still better than dump-copy-restore) but it seems to have gotten worse between 10.3 and 10.5 (yeah, CentOS) and is downright glacial now.
      It always starts out well enough at about 180 MB/s but after just a few GB, the transfer rate nosedives and oscillates between 10 and 25 MB/s for the rest of the backup.
      CPU use is negligible at around 2% of a core and obviously neither disks nor network are maxed out (nc from /dev/zero to /dev/null pushes about 4 Gbps between the two machines; iowait gets to 1.5% on the receiving side but stays consistently <1% on the mariabackup side). The receiving side uses just slightly more CPU, between 5 and 20% of a core. The total number of files in the primary's data dir is about 2500 with just under 1400 IBD files, so it doesn't seem to have to do with mariabackup's problem with large numbers of files.
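
      To put these rates in perspective, here is a back-of-the-envelope estimate of the total transfer time. It assumes that roughly the whole 550 GB is streamed and uses decimal units (1 GB = 1000 MB); est_minutes is just an illustrative helper name, not part of any tool.

```shell
# Rough transfer-time estimate for a streamed backup.
# est_minutes SIZE_GB RATE_MB_PER_S -> whole minutes (integer arithmetic,
# assuming 1 GB = 1000 MB and a constant sustained rate)
est_minutes() {
    size_gb=$1
    rate_mb_s=$2
    echo $(( size_gb * 1000 / rate_mb_s / 60 ))
}

est_minutes 550 180   # initial rate            -> 50
est_minutes 550 25    # degraded rate, best case -> 366
est_minutes 550 10    # degraded rate, worst case -> 916
```

      That is roughly 50 minutes at the initial rate versus about 6 to 15 hours at the degraded rate. For comparison, the 4 Gbps nc baseline corresponds to about 500 MB/s.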

      Running strace on the mariabackup process shows a bunch of pread64 calls with fairly small 64 KiB blocks, but that is about the only thing that stands out, and it shouldn't have this kind of effect either. The strangest thing is that there are periods when iowait is zero on the sending side and load drops to around 0.25, most of which probably comes from other jobs such as a mostly idle mysqld, puppet and monitoring.
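
      One way to quantify the pread64 observation is to capture a timed trace and add up the calls. The following is only a sketch: summarize_pread is a hypothetical helper, the trace file path is arbitrary, and it assumes the trace was taken with strace's -T option so that each line ends in the time spent in the call.

```shell
# Summarize pread64 calls from `strace -T` output read on stdin:
# number of completed calls, total bytes returned, total time inside the calls.
# Lines without a numeric return value (errors, "<unfinished ...>") are skipped;
# "<... pread64 resumed>" lines are counted because they carry the result.
summarize_pread() {
    awk '
        /pread64/ && /\) = [0-9]+ </ {
            # everything after the last ") = " is "<bytes> <seconds-in-brackets>"
            n = split($0, parts, /\) = /)
            split(parts[n], tail, " ")
            bytes += tail[1]
            t = tail[2]
            gsub(/[<>]/, "", t)
            secs += t
            calls++
        }
        END { printf "%d calls, %.0f bytes, %.6f s in pread64\n", calls, bytes, secs }
    '
}

# Possible usage (not run here): trace the running backup for 30 seconds,
# then summarize the captured file.
#   timeout 30 strace -f -T -e trace=pread64 -o /tmp/pread.trace -p "$(pidof mariabackup)"
#   summarize_pread < /tmp/pread.trace
```

      If the summed time in pread64 is only a small fraction of the tracing window, the process is mostly waiting on something other than disk reads, which would match the low iowait described above.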

      The slowdown is severe enough that I think there must be a bug. Any ideas would be highly appreciated, even if it's just "it was fixed a decade ago, upgrade".

          People

            Assignee: Unassigned
            Reporter: Matthias Bethke (mbe)
            Votes: 2
            Watchers: 4
