  1. MariaDB Server
  2. MDEV-31135

Missing mbstream parallel option in SST script

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.6.12
    • 10.6
    • Galera SST
    • Linux 6.2.11-gentoo #1 SMP PREEMPT_DYNAMIC x86_64 AMD EPYC 7451 24-Core GNU/Linux

    Description

      Use case: a cluster connected by a 10 Gbps network, with a large dataset (over 2 TB) residing on fast NVMe drives.

      The problem: SST between the nodes takes over 8 hours to complete, which is inconvenient at the very least.
      With the latest versions of MariaDB and Galera (this applies to MariaDB 10.4 through 10.6), and regardless of network settings, the data transfer speed is about 700 Mbps with compression (zstd) and never gets higher. Meanwhile, no resource on either the donor node or the receiving node appears to be a bottleneck: the CPU is not loaded, no %iowait is visible on either node, and other resources are not heavily utilized either.
      The network is not the bottleneck either: when a parallel file transfer over the network is started, SST speed does not change, and the NIC shows total network utilization well above 1 Gbps.

      Settings we have:

      [mariabackup]
      parallel=8
      use-memory=4G
       
      [SST]
      inno-backup-opts="--parallel=8"
      compressor='/usr/bin/zstd -2 --threads=8'
      decompressor='/usr/bin/zstd -d'
      

      The compression type and number of threads were chosen by benchmarking. The fastest configuration we found reaches 450 MB/s when reading source data from the mysql directory, transferring it over the network, and discarding it on the recipient side (`nc -l [port] | zstd -d --stdout >> /dev/null`).

      On further investigation, the bottleneck turned out to be the mbstream process on the receiving node.

      Problem details.
      SST transfer is performed by the `/usr/bin/wsrep_sst_mariabackup` script. This script parses the [SST] section (so the `parallel=8` option in the `[mariabackup]` section is ignored, and a separate option with the misleading name "inno-backup-opts" is required) and, according to those settings, establishes a transfer over the network:

      • on donor side: mariabackup -> [optional tar] -> [archiver or `pv`] -> [netcat or socat]
      • on recipient side: [netcat or socat] -> [optional unarchiver] -> [untar or mbstream] -> data directory (/var/lib/mysql)

      The bottleneck is on the recipient side.
      Indeed:

      • mariabackup is multithreaded
      • network has spare capacity
      • (un)archiver is multithreaded

      But mbstream always runs on a single thread: exactly one CPU core shows 100% utilization by the mbstream process.
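      One way to confirm the single-thread observation is to read the thread count of the running process from /proc. The following is only a minimal sketch for Linux; the helper name `thread_count` is made up for illustration and is not part of any MariaDB tooling.

```shell
#!/bin/sh
# Print the number of threads of a process, as reported by the Linux kernel
# in /proc/<pid>/status (the "Threads:" line).
# $1 = process ID
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Example: inspect the shell that runs this snippet.
thread_count "$$"
```

      On the joiner during an SST, something like `thread_count "$(pidof mbstream)"` could be used, and `top -H -p <pid>` shows per-thread CPU usage for the same process.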
      The documentation states that mbstream is preferred over tar because mbstream supports encryption and multithreading.
      Indeed, a discussion on the Percona forum recommends setting:

      xbstream-opts="--parallel=16"
      

      However, this option is not available in the `/usr/bin/wsrep_sst_mariabackup` script; it is simply absent. The script contains this block:

              if [ "$WSREP_SST_OPT_ROLE" = 'joiner' ]; then
                  strmcmd="'$STREAM_BIN' -x"
              else
                  strmcmd="'$STREAM_BIN' -c '$INFO_FILE'"
              fi
      

      Here `STREAM_BIN` is the path to the mbstream binary.
      According to this block, the mbstream command always starts with just one thread. So, while one of the reasons for using mbstream is multithreading, there is no option to actually use it.

      The result
      When this block was modified as follows:

      648c648
      <             strmcmd="'$STREAM_BIN' -x"
      ---
      >             strmcmd="'$STREAM_BIN' -x --parallel=8"
      650c650
      <             strmcmd="'$STREAM_BIN' -c '$INFO_FILE'"
      ---
      >             strmcmd="'$STREAM_BIN' -c '$INFO_FILE' --parallel=8"
      

      so that mbstream uses a hardcoded 8 threads, network throughput rose to about 2 Gbps with the same compression settings, and SST time dropped from 8 hours to 2.5.

      The request:
      To make SST faster, can the MariaDB team add support for an `xbstream-opts` or `mbstream-opts` option, so that multi-threaded writes can be used on the recipient side, making efficient use of fast network connections and shortening SST time?
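      As an illustration of the request, the stream command could take its extra options from configuration instead of a hardcoded value. This is only a sketch under stated assumptions: the function `build_strmcmd` and its parameters are hypothetical, the option value is passed in directly rather than read from the [SST] section, and this is not how the actual script is structured.

```shell
#!/bin/sh
# Hypothetical helper: build the mbstream command line for a given SST role.
# $1 = role ('joiner'; anything else is treated as the donor)
# $2 = path to the mbstream binary
# $3 = info file passed to mbstream on the donor side
# $4 = extra options, e.g. the value of a hypothetical `mbstream-opts` setting
build_strmcmd() {
    role=$1
    stream_bin=$2
    info_file=$3
    extra_opts=$4
    if [ "$role" = 'joiner' ]; then
        cmd="'$stream_bin' -x"
    else
        cmd="'$stream_bin' -c '$info_file'"
    fi
    # Append the extra options only when the setting is non-empty,
    # so the command is unchanged when nothing is configured.
    if [ -n "$extra_opts" ]; then
        cmd="$cmd $extra_opts"
    fi
    printf '%s\n' "$cmd"
}

# Example: joiner side with the value used in the manual patch above.
build_strmcmd joiner /usr/bin/mbstream '' '--parallel=8'
```

      With an empty fourth argument the function produces the same command strings as the current block, so existing setups would keep their behaviour.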

            People

              sysprg Julius Goryavsky
              euglorg Eugene