MariaDB Server / MDEV-34062

mariadb-backup --backup is extremely slow at copying ib_logfile0


Details

    Description

      While preparing to test MDEV-33817 with a recovery-type workload (MDEV-29911), I noticed that in MariaDB Server 10.11 (but not 10.6) copying the InnoDB write-ahead log (ib_logfile0) takes an extremely long time, proceeding at no more than 3 MiB/s even from a RAM disk. Here are some perf record samples from my local system:

      10.11 9dfef3fb4178e44b3f3da029995ef11df8360235 on Debian linux-image-6.7.12-amd64 (2024-04-24)

        44,19%  mariadb-backup  mariadb-backup     [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, false>(recv_buf&, bool)
        12,35%  mariadb-backup  [kernel.kallsyms]  [k] rep_movs_alternative
         8,14%  mariadb-backup  mariadb-backup     [.] crc32c_3way
         7,12%  mariadb-backup  [kernel.kallsyms]  [k] clear_page_erms
         1,81%  mariadb-backup  [kernel.kallsyms]  [k] filemap_get_entry
         1,78%  mariadb-backup  [kernel.kallsyms]  [k] shmem_file_read_iter
         1,60%  mariadb-backup  [kernel.kallsyms]  [k] mem_cgroup_commit_charge
         1,10%  mariadb-backup  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm
         1,10%  mariadb-backup  [kernel.kallsyms]  [k] shmem_get_folio_gfp
         0,96%  mariadb-backup  [kernel.kallsyms]  [k] xas_descend
         0,92%  mariadb-backup  [kernel.kallsyms]  [k] get_page_from_freelist
         0,92%  mariadb-backup  mariadb-backup     [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mtr<false>(bool)
         0,86%  mariadb-backup  [kernel.kallsyms]  [k] try_charge_memcg
         0,80%  mariadb-backup  mariadb-backup     [.] my_crc32c
         0,74%  mariadb-backup  [kernel.kallsyms]  [k] __list_del_entry_valid_or_report
         0,61%  mariadb-backup  [kernel.kallsyms]  [k] __rmqueue_pcplist
         0,55%  mariadb-backup  [kernel.kallsyms]  [k] lru_gen_add_folio
         0,55%  mariadb-backup  [kernel.kallsyms]  [k] _copy_to_iter
         0,50%  mariadb-backup  [kernel.kallsyms]  [k] __rcu_read_unlock
         0,49%  mariadb-backup  [kernel.kallsyms]  [k] alloc_pages_mpol
         0,49%  mariadb-backup  mariadb-backup     [.] xtrabackup_copy_logfile()
      

      The problem appears to be the default value of innodb_log_buffer_size (log_sys.buf_size), which is only 2 MiB. If I specify --innodb-log-buffer-size=1g, xtrabackup_copy_logfile() processes much more data per system call, the backup finishes in a reasonable time during my Sysbench workload, and the top of the perf report looks more reasonable as well:

      10.11 9dfef3fb4178e44b3f3da029995ef11df8360235

        31,06%  mariadb-backup  mariadb-backup        [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, false>(recv_buf&, bool)
        20,14%  mariadb-backup  [kernel.kallsyms]     [k] rep_movs_alternative
        10,01%  mariadb-backup  [kernel.kallsyms]     [k] clear_page_erms
         5,72%  mariadb-backup  mariadb-backup        [.] crc32c_3way
         3,86%  mariadb-backup  libc.so.6             [.] __memmove_avx_unaligned_erms_rtm
         1,58%  mariadb-backup  [kernel.kallsyms]     [k] filemap_get_entry
         1,57%  mariadb-backup  libc.so.6             [.] __memcmp_avx2_movbe_rtm
         1,50%  mariadb-backup  [kernel.kallsyms]     [k] shmem_get_folio_gfp
         1,40%  mariadb-backup  [kernel.kallsyms]     [k] xas_descend
         1,04%  mariadb-backup  [kernel.kallsyms]     [k] get_mem_cgroup_from_mm
         1,02%  mariadb-backup  [kernel.kallsyms]     [k] __mem_cgroup_charge
         0,98%  mariadb-backup  [kernel.kallsyms]     [k] __rmqueue_pcplist
         0,96%  mariadb-backup  [kernel.kallsyms]     [k] shmem_file_read_iter
         0,93%  mariadb-backup  [kernel.kallsyms]     [k] fault_in_readable
         0,92%  mariadb-backup  [kernel.kallsyms]     [k] get_page_from_freelist
         0,78%  mariadb-backup  mariadb-backup        [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mtr<false>(bool)
         0,68%  mariadb-backup  [kernel.kallsyms]     [k] xas_load
         0,59%  mariadb-backup  [kernel.kallsyms]     [k] xas_start
         0,58%  mariadb-backup  [kernel.kallsyms]     [k] try_charge_memcg
         0,57%  mariadb-backup  [kernel.kallsyms]     [k] _copy_to_iter
         0,52%  mariadb-backup  [kernel.kallsyms]     [k] copy_page_from_iter_atomic
         0,52%  mariadb-backup  [kernel.kallsyms]     [k] lru_gen_add_folio
         0,50%  mariadb-backup  [kernel.kallsyms]     [k] __alloc_pages
         0,44%  mariadb-backup  [kernel.kallsyms]     [k] shmem_add_to_page_cache
         0,40%  mariadb-backup  [kernel.kallsyms]     [k] generic_perform_write
         0,39%  mariadb-backup  [kernel.kallsyms]     [k] __mod_memcg_lruvec_state
         0,38%  mariadb-backup  [kernel.kallsyms]     [k] copy_page_to_iter
         0,35%  mariadb-backup  [kernel.kallsyms]     [k] lru_add_fn
         0,34%  mariadb-backup  [kernel.kallsyms]     [k] __rcu_read_lock
         0,34%  mariadb-backup  mariadb-backup        [.] xtrabackup_copy_logfile()
      

      The large share of kernel functions should be due to file I/O; this test was on /dev/shm. The function xtrabackup_copy_logfile() always invokes pread system calls to read ib_logfile0.

      Before MDEV-14425, the default value of innodb_log_buffer_size in mariadb-backup was 1 MiB, but as far as I understand, the parameter was being ignored and the log was apparently being read RECV_SCAN_SIZE bytes at a time, that is, 4*innodb_page_size (by default, 64 KiB). This must have been acceptable while the log file was organized in blocks of 512 bytes; with the variable-size logical block format that was introduced in MDEV-14425, it is apparently better to read larger chunks at a time.
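      To put the read granularity into perspective, here is a back-of-the-envelope sketch (purely illustrative, not MariaDB source code): assuming one pread() per chunk of the given size and ignoring the polling and re-reading that the log copy thread performs on a live server, the number of system calls needed to copy the roughly 2.6 GB of redo log from the 256m run quoted further below scales inversely with the chunk size.

        // Hypothetical estimate only; this is not how mariadb-backup is implemented,
        // it merely counts how many fixed-size reads a given log volume would need.
        #include <cstdint>
        #include <cstdio>

        int main()
        {
          const uint64_t log_bytes = 2645660131ULL - 46894ULL; // LSN range of the run below
          const uint64_t chunk_sizes[] = {
            4 * 16384,   // RECV_SCAN_SIZE = 4 * innodb_page_size = 64 KiB (pre-MDEV-14425 granularity)
            2ULL << 20,  // 2 MiB: the current innodb_log_buffer_size default
            1ULL << 30   // 1 GiB: --innodb-log-buffer-size=1g as used above
          };
          for (uint64_t chunk : chunk_sizes)
            std::printf("chunk %10llu bytes -> about %6llu reads\n",
                        (unsigned long long) chunk,
                        (unsigned long long) ((log_bytes + chunk - 1) / chunk));
          return 0;
        }

      With the 64 KiB granularity this comes to roughly 40,000 reads and with the 2 MiB default to roughly 1,300, versus a handful of reads with a 1 GiB buffer.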

      I retested this on the 72-thread test system where I originally hit the problem yesterday. At innodb_log_buffer_size=128m the backup would manage to copy the log, but the operation would take longer than 40 seconds, at which point the server had been shut down by the benchmark script.

      [00] FATAL ERROR: 2024-05-02 23:10:26 failed to execute query BACKUP STAGE BLOCK_COMMIT: Server has gone away
      

      At innodb_log_buffer_size=256m it is better:

      [00] 2024-05-02 23:12:55 Finished backing up non-InnoDB tables and files
      [00] 2024-05-02 23:12:55 Waiting for log copy thread to read lsn 2039243060
      …
      [00] 2024-05-02 23:13:03 Redo log (from LSN 46894 to 2645660131) was copied.
      [00] 2024-05-02 23:13:03 completed OK!
      

      On my local system (dual-socket Xeon E5-2630 v4, 40 concurrent Sysbench connections to match the 2×2×10 hardware threads), the backup can barely keep up with innodb_log_buffer_size=32m, finishing in 42 seconds; with innodb_log_buffer_size=128m it finishes in 11 seconds. Note that this is an inaccurate test setup: I was manually starting the backup roughly 10 seconds into the write workload, and the InnoDB data files are growing during the workload.

      Possibly we should increase the default value of innodb_log_buffer_size, based on some more testing in various environments (NVMe, SSD, HDD of different speeds). My test setting is completely artificial; the /dev/shm/*/ib_logfile0 will be written by the server via the memory-mapped log_sys.buf while the backup is using regular file system calls. The original motivation of this test was to measure the performance of a subsequent test (mariadb-backup --prepare).

      This observation is related to MDEV-30000 and MDEV-31410 too. It could be a good idea to SET GLOBAL innodb_log_file_buffering=ON in the server during the backup, and specify a larger innodb_log_buffer_size.

      Ultimately, we should implement the log copying in the server process (MDEV-14992).

            People

              wlad Vladislav Vaintroub
              marko Marko Mäkelä
              Votes: 1
              Watchers: 6
