Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.11
Description
While preparing to test MDEV-33817 with a recovery-type workload (MDEV-29911), I noticed that in MariaDB Server 10.11 (but not 10.6), copying the InnoDB write-ahead log (innodb_logfile0) is taking an extremely long time, copying at most 3 MiB/s from a RAM disk. Here are some perf record samples from my local system:
10.11 9dfef3fb4178e44b3f3da029995ef11df8360235 on Debian linux-image-6.7.12-amd64 (2024-04-24) |
44,19% mariadb-backup mariadb-backup [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, false>(recv_buf&, bool)
|
12,35% mariadb-backup [kernel.kallsyms] [k] rep_movs_alternative
|
8,14% mariadb-backup mariadb-backup [.] crc32c_3way
|
7,12% mariadb-backup [kernel.kallsyms] [k] clear_page_erms
|
1,81% mariadb-backup [kernel.kallsyms] [k] filemap_get_entry
|
1,78% mariadb-backup [kernel.kallsyms] [k] shmem_file_read_iter
|
1,60% mariadb-backup [kernel.kallsyms] [k] mem_cgroup_commit_charge
|
1,10% mariadb-backup [kernel.kallsyms] [k] get_mem_cgroup_from_mm
|
1,10% mariadb-backup [kernel.kallsyms] [k] shmem_get_folio_gfp
|
0,96% mariadb-backup [kernel.kallsyms] [k] xas_descend
|
0,92% mariadb-backup [kernel.kallsyms] [k] get_page_from_freelist
|
0,92% mariadb-backup mariadb-backup [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mtr<false>(bool)
|
0,86% mariadb-backup [kernel.kallsyms] [k] try_charge_memcg
|
0,80% mariadb-backup mariadb-backup [.] my_crc32c
|
0,74% mariadb-backup [kernel.kallsyms] [k] __list_del_entry_valid_or_report
|
0,61% mariadb-backup [kernel.kallsyms] [k] __rmqueue_pcplist
|
0,55% mariadb-backup [kernel.kallsyms] [k] lru_gen_add_folio
|
0,55% mariadb-backup [kernel.kallsyms] [k] _copy_to_iter
|
0,50% mariadb-backup [kernel.kallsyms] [k] __rcu_read_unlock
|
0,49% mariadb-backup [kernel.kallsyms] [k] alloc_pages_mpol
|
0,49% mariadb-backup mariadb-backup [.] xtrabackup_copy_logfile()
|
The problem appears to be the default value of innodb_log_buffer_size (log_sys.buf_size), which is only 2 MiB. If I specify --innodb-log-buffer-size=1g, then xtrabackup_copy_logfile() will process much more data per a system call, the backup would finish in reasonable time during my Sysbench workload, and the top of perf report would look more reasonable as well:
10.11 9dfef3fb4178e44b3f3da029995ef11df8360235 |
31,06% mariadb-backup mariadb-backup [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, false>(recv_buf&, bool)
|
20,14% mariadb-backup [kernel.kallsyms] [k] rep_movs_alternative
|
10,01% mariadb-backup [kernel.kallsyms] [k] clear_page_erms
|
5,72% mariadb-backup mariadb-backup [.] crc32c_3way
|
3,86% mariadb-backup libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
|
1,58% mariadb-backup [kernel.kallsyms] [k] filemap_get_entry
|
1,57% mariadb-backup libc.so.6 [.] __memcmp_avx2_movbe_rtm
|
1,50% mariadb-backup [kernel.kallsyms] [k] shmem_get_folio_gfp
|
1,40% mariadb-backup [kernel.kallsyms] [k] xas_descend
|
1,04% mariadb-backup [kernel.kallsyms] [k] get_mem_cgroup_from_mm
|
1,02% mariadb-backup [kernel.kallsyms] [k] __mem_cgroup_charge
|
0,98% mariadb-backup [kernel.kallsyms] [k] __rmqueue_pcplist
|
0,96% mariadb-backup [kernel.kallsyms] [k] shmem_file_read_iter
|
0,93% mariadb-backup [kernel.kallsyms] [k] fault_in_readable
|
0,92% mariadb-backup [kernel.kallsyms] [k] get_page_from_freelist
|
0,78% mariadb-backup mariadb-backup [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mtr<false>(bool)
|
0,68% mariadb-backup [kernel.kallsyms] [k] xas_load
|
0,59% mariadb-backup [kernel.kallsyms] [k] xas_start
|
0,58% mariadb-backup [kernel.kallsyms] [k] try_charge_memcg
|
0,57% mariadb-backup [kernel.kallsyms] [k] _copy_to_iter
|
0,52% mariadb-backup [kernel.kallsyms] [k] copy_page_from_iter_atomic
|
0,52% mariadb-backup [kernel.kallsyms] [k] lru_gen_add_folio
|
0,50% mariadb-backup [kernel.kallsyms] [k] __alloc_pages
|
0,44% mariadb-backup [kernel.kallsyms] [k] shmem_add_to_page_cache
|
0,40% mariadb-backup [kernel.kallsyms] [k] generic_perform_write
|
0,39% mariadb-backup [kernel.kallsyms] [k] __mod_memcg_lruvec_state
|
0,38% mariadb-backup [kernel.kallsyms] [k] copy_page_to_iter
|
0,35% mariadb-backup [kernel.kallsyms] [k] lru_add_fn
|
0,34% mariadb-backup [kernel.kallsyms] [k] __rcu_read_lock
|
0,34% mariadb-backup mariadb-backup [.] xtrabackup_copy_logfile()
|
The large share of kernel functions should be due to file I/O. This was on /dev/shm. The function xtrabackup_copy_logfile() is always invoking pread system calls to read the ib_logfile0.
Before MDEV-14425, the default value of innodb_log_buffer_size in mariadb-backup was 1 MiB, but as far as I understand, the parameter was being ignored and the log was apparently being read RECV_SCAN_SIZE bytes at a time, that is, 4*innodb_page_size (by default, 64 KiB). This must have been acceptable when the log file was organized in blocks of 512 bytes. With the variable-size logical block format that was introduced in MDEV-14425, we’d apparently better read larger chunks at a time.
I retested this on the 72-thread test system where I originally hit the problem yesterday. At innodb_log_buffer_size=128m the backup would manage to copy the log, but the operation would take longer than 40 seconds, at which point the server had been shut down by the benchmark script.
[00] FATAL ERROR: 2024-05-02 23:10:26 failed to execute query BACKUP STAGE BLOCK_COMMIT: Server has gone away
|
At innodb_log_buffer_size=256m it is better:
[00] 2024-05-02 23:12:55 Finished backing up non-InnoDB tables and files
|
[00] 2024-05-02 23:12:55 Waiting for log copy thread to read lsn 2039243060
|
…
|
[00] 2024-05-02 23:13:03 Redo log (from LSN 46894 to 2645660131) was copied.
|
[00] 2024-05-02 23:13:03 completed OK!
|
On my local system (dual socket Xeon E5-2630 v4, 40 concurrent Sysbench connections to match the 2Ă—2Ă—10 hardware threads), the backup can barely keep up with innodb_log_buffer_size=32m; the backup would finish in 42 seconds. With innodb_log_buffer_size=128m it would finish in 11 seconds. Note that this is an inaccurate test setup; I was manually starting the backup roughly 10 seconds into the write workload, and the InnoDB data files are growing during this workload.
Possibly we should increase the default value of innodb_log_file_size, based on some more testing in various environments (NVMe, SSD, HDD of different speeds). My test setting is completely artificial; the /dev/shm/*/ib_logfile0 will be written by the server via the memory-mapped log_sys.buf while the backup is using regular file system calls. The original motivation of this test was to measure the performance of a subsequent test (mariadb-backup --prepare).
This observation is related to MDEV-30000 and MDEV-31410 too. It could be a good idea to SET GLOBAL innodb_log_file_buffering=ON in the server during the backup, and specify a larger innodb_log_buffer_size.
Ultimately, we should implement the log copying in the server process (MDEV-14992).
Attachments
Issue Links
- blocks
-
MDEV-34850 Unable to create backup with mariabackup: try increasing innodb_log_file_size
- Closed
- causes
-
MDEV-35608 Fake PMEM on /dev/shm no longer works
- Closed
- is blocked by
-
MDEV-34750 SET GLOBAL innodb_log_file_size is not crash safe
- Closed
- relates to
-
MDEV-31410 mariadb-backup prepare crash with InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT
- Needs Feedback
-
MDEV-33894 MariaDB does unexpected storage read IO for the redo log
- Closed
-
MDEV-34070 mariadb-backup --prepare with larger --use-memory may be slower
- Confirmed
-
MDEV-34307 Recovery or mariadb-backup --prepare debug failure [FATAL] InnoDB: Page ... still fixed or dirty
- Closed
-
MDEV-14425 Change the InnoDB redo log format to reduce write amplification
- Closed
-
MDEV-14992 BACKUP: in-server backup
- Open
-
MDEV-19492 Mariabackup hangs if table populated with INSERT... SELECT while it runs
- Stalled
-
MDEV-29911 InnoDB recovery and mariadb-backup --prepare fail to report detailed progress
- Closed
-
MDEV-30000 make mariadb-backup to force an innodb checkpoint
- Open
-
MDEV-31410 mariadb-backup prepare crash with InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT
- Needs Feedback
-
MDEV-33817 Implement AVX512BW and VPCLMULQDQ based CRC-32 algorithms
- Closed