[MDEV-34850] Unable to create backup with mariabackup: try increasing innodb_log_file_size - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.11.9
Fix Version/s: 10.11.10, 11.2.6, 11.4.4, 11.6.2
Component/s: mariabackup, Storage Engine - InnoDB
Labels:
None
Environment:
Rocky Linux 9.4
10.11.9-1.el9.x86_64

Description

I am running a huge database of about 3TB of data with MariaDB 10.11.9 on a mirrored ZFS vdev with SSDs.

Whenever I try to use Mariabackup with this database, it fails near the end with a message like:

mariabackup --target-dir=/mnt/ --slave-info --backup
...
[00] FATAL ERROR: [timestamp] Was only able to copy log from 148582842890306 to 148638022288286, not 148676138598341; try increasing innodb_log_file_size

The innodb_log_file_size is 32GB; I even tried 128GB, but it does not fix the problem.

Some time after Mariabackup starts, it repeatedly complains until the end with the same message:

Retrying read of log at LSN=148638022288286

The backuptarget has plenty of space free.

I increased innodb_log_file_size and tried this on three different slave-servers (all handle about 1TB to 3TB data) with the very same result. After a few hours of mariabackup-runtime, the backup fails with this error message.

The expectation is, that the backup should finishes successfully.

Attachments

Issue Links

causes

MDEV-35699 Multi-batch recovery occasionally fails

Closed

is blocked by

MDEV-34062 mariadb-backup --backup is extremely slow at copying ib_logfile0

Closed

relates to

MDEV-23947 reflink support for mariabackup

Open

MDEV-35830 innodb_undo_log_truncate=ON during mariadb-backup --backup does not work as intended

Closed

Activity

Ascending order - Click to sort in descending order

View 4 older comments

Marko Mäkelä added a comment - 2024-09-12 09:08

To by understanding, ZFS would support cp --reflink=always but mariadb-backup is yet to implement that (MDEV-23947).

Marko Mäkelä added a comment - 2024-09-12 09:08 To by understanding, ZFS would support cp --reflink=always but mariadb-backup is yet to implement that ( MDEV-23947 ).

Marko Mäkelä added a comment - 2024-09-27 06:43

Now that ~~MDEV-34062~~ has been merged and the Boolean parameter innodb_log_file_mmap introduced, mariadb-backup --backup on 64-bit Linux and FreeBSD will by default parse the ib_logfile0 via a memory mapping. That is roughly equivalent to an unlimited innodb_log_buffer_size (during recovery and backup only). I conducted a small test:

sysbench oltp_update_index --mysql-socket="$MYSQL_SOCK" --mysql-user="$MYSQL_USER" --mysql-db=test --tables=64 --table_size=100000 prepare &

sleep 60

mariadb-backup -u"$MYSQL_USER" --socket="$MYSQL_SOCK" --backup --target-dir=/dev/shm/backup

This was with a server using 10 GiB log file and buffer pool on, the data directory on /dev/shm. The backup would start while the 48th table was being created. Here is an interesting excerpt of the output (from both sysbench and mariadb-backup:

10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5
2024-09-26 23:14:15 0 [Note] InnoDB: Memory-mapped log (block size=512 bytes)
[00] 2024-09-26 23:14:15 mariabackup: Generating a list of tablespaces
Creating a secondary index on 'sbtest48'...
Creating table 'sbtest49'...
Inserting 100000 records into 'sbtest49'
…
[00] 2024-09-26 23:11:18 Finished backing up non-InnoDB tables and files
[00] 2024-09-26 23:11:18 Waiting for log copy thread to read lsn 1445003635
…
[00] 2024-09-26 23:14:25 Redo log (from LSN 47165 to 1419510721) was copied.
Creating table 'sbtest51'...
Inserting 100000 records into 'sbtest51'
[00] 2024-09-26 23:14:25 completed OK!

We can see that the backup completed in 10 seconds, and it had copied 1,419,463,556 bytes of log records and about 2.8 GiB in total. Not that impressive. I conducted another test with a further optimization to the log file parsing:

10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328
Creating table 'sbtest53'...
Inserting 100000 records into 'sbtest53'
[00] 2024-09-26 23:31:10 Connecting to MariaDB server host: localhost, user: root, password: not set, port: not set, socket: /dev/shm/10.11/sock
…
2024-09-26 23:31:10 0 [Note] InnoDB: Number of transaction pools: 1
2024-09-26 23:31:10 0 [Note] InnoDB: Memory-mapped log (block size=512 bytes)
[00] 2024-09-26 23:31:10 mariabackup: Generating a list of tablespaces
…
[00] 2024-09-26 23:31:13 Finished backing up non-InnoDB tables and files
[00] 2024-09-26 23:31:13 Waiting for log copy thread to read lsn 1548591292
Creating a secondary index on 'sbtest55'...
…
[00] 2024-09-26 23:31:20 Redo log (from LSN 47101 to 1559097509) was copied.
Creating table 'sbtest56'...
Inserting 100000 records into 'sbtest56'
[00] 2024-09-26 23:31:21 completed OK!

Even though this run started a little longer into the workload (while the 55th table was being created instead of the 48th one) and therefore had to parse and copy more log, it completed in similar time (11 seconds). The size of the backup was 3.1 GiB instead of 2.8 GiB.

Based on this, I think that we can already conclude that innodb_log_file_mmap=ON should fix the most pressing issue. I tested a similar workload in ~~MDEV-34062~~ with innodb_log_file_mmap=OFF.

Next, I will conduct some tests with more deterministic input: killing a server that uses a large buffer pool, and starting a crash recovery using a tiny buffer pool, so that it is forced to use multiple batches. I had tested this type of scenario in ~~MDEV-29911~~. I believe that the simplified log record parser could benefit this case as well.

Marko Mäkelä added a comment - 2024-09-27 06:43 Now that MDEV-34062 has been merged and the Boolean parameter innodb_log_file_mmap introduced, mariadb-backup --backup on 64-bit Linux and FreeBSD will by default parse the ib_logfile0 via a memory mapping. That is roughly equivalent to an unlimited innodb_log_buffer_size (during recovery and backup only). I conducted a small test: sysbench oltp_update_index --mysql-socket="$MYSQL_SOCK" --mysql-user="$MYSQL_USER" --mysql-db=test --tables=64 --table_size=100000 prepare & sleep 60 mariadb-backup -u"$MYSQL_USER" --socket="$MYSQL_SOCK" --backup --target-dir=/dev/shm/backup This was with a server using 10 GiB log file and buffer pool on, the data directory on /dev/shm . The backup would start while the 48th table was being created. Here is an interesting excerpt of the output (from both sysbench and mariadb-backup : 10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5 2024-09-26 23:14:15 0 [Note] InnoDB: Memory-mapped log (block size=512 bytes) [00] 2024-09-26 23:14:15 mariabackup: Generating a list of tablespaces Creating a secondary index on 'sbtest48'... Creating table 'sbtest49'... Inserting 100000 records into 'sbtest49' … [00] 2024-09-26 23:11:18 Finished backing up non-InnoDB tables and files [00] 2024-09-26 23:11:18 Waiting for log copy thread to read lsn 1445003635 … [00] 2024-09-26 23:14:25 Redo log (from LSN 47165 to 1419510721) was copied. Creating table 'sbtest51'... Inserting 100000 records into 'sbtest51' [00] 2024-09-26 23:14:25 completed OK! We can see that the backup completed in 10 seconds, and it had copied 1,419,463,556 bytes of log records and about 2.8 GiB in total. Not that impressive. I conducted another test with a further optimization to the log file parsing: 10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328 Creating table 'sbtest53'... Inserting 100000 records into 'sbtest53' [00] 2024-09-26 23:31:10 Connecting to MariaDB server host: localhost, user: root, password: not set, port: not set, socket: /dev/shm/10.11/sock … 2024-09-26 23:31:10 0 [Note] InnoDB: Number of transaction pools: 1 2024-09-26 23:31:10 0 [Note] InnoDB: Memory-mapped log (block size=512 bytes) [00] 2024-09-26 23:31:10 mariabackup: Generating a list of tablespaces … [00] 2024-09-26 23:31:13 Finished backing up non-InnoDB tables and files [00] 2024-09-26 23:31:13 Waiting for log copy thread to read lsn 1548591292 Creating a secondary index on 'sbtest55'... … [00] 2024-09-26 23:31:20 Redo log (from LSN 47101 to 1559097509) was copied. Creating table 'sbtest56'... Inserting 100000 records into 'sbtest56' [00] 2024-09-26 23:31:21 completed OK! Even though this run started a little longer into the workload (while the 55th table was being created instead of the 48th one) and therefore had to parse and copy more log, it completed in similar time (11 seconds). The size of the backup was 3.1 GiB instead of 2.8 GiB. Based on this, I think that we can already conclude that innodb_log_file_mmap=ON should fix the most pressing issue. I tested a similar workload in MDEV-34062 with innodb_log_file_mmap=OFF . Next, I will conduct some tests with more deterministic input: killing a server that uses a large buffer pool, and starting a crash recovery using a tiny buffer pool, so that it is forced to use multiple batches. I had tested this type of scenario in MDEV-29911 . I believe that the simplified log record parser could benefit this case as well.

Marko Mäkelä added a comment - 2024-09-27 07:31

One more time, I initialized a server from the scratch, using 10GiB buffer pool and log (although 2 GiB of log would have been sufficient), and ran the above sysbench prepare. While it was almost completed (creating and populating the 62th table), I sent SIGKILL to the mariadbd process. I then copied this data directory and did rm -fr test so that the recovery will fail soon after the interesting step:

10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5
2024-09-27 0:01:54 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47149
2024-09-27 0:01:54 0 [Note] InnoDB: Multi-batch recovery needed at LSN 1305342
2024-09-27 0:01:56 0 [Note] InnoDB: End of log at LSN=1756766550
2024-09-27 0:01:56 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/sbtest1.ibd.

Between "Multi-batch recovery needed" and "End of log", parsing the log records consumed 2 seconds. With the optimized parsing, it seems to be faster. A difference of 1 second in the execution time could be attributed to a rounding error.

10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328
2024-09-27 0:06:45 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47149
2024-09-27 0:06:45 0 [Note] InnoDB: Multi-batch recovery needed at LSN 1305342
2024-09-27 0:06:46 0 [Note] InnoDB: End of log at LSN=1756766550
2024-09-27 0:06:46 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/sbtest1.ibd.

I ran the same once more with some instrumentation:

perf record sql/mariadbd --datadir=/dev/shm/recover --innodb-buffer-pool-size=5m --innodb-log-file-mmap=ON

Here is the perf report output for anything that used more than 0.5% of cycles, with a total number of 4,315,397,899 events.

10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5
61.09% mariadbd-base mariadbd-base [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, false>(recv_buf&, bool)
14.89% mariadbd-base [kernel.kallsyms] [k] _copy_to_iter
9.62% mariadbd-base mariadbd-base [.] crc32_avx512(unsigned int, char const*, unsigned long, crc32_tab const&)
3.70% mariadbd-base mariadbd-base [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mmap<false>(bool)
1.58% mariadbd-base [kernel.kallsyms] [k] shmem_file_read_iter
1.29% mariadbd-base [kernel.kallsyms] [k] filemap_get_entry
0.79% mariadbd-base mariadbd-base [.] recv_scan_log(bool)
0.73% mariadbd-base mariadbd-base [.] mlog_init_t::add(page_id_t, unsigned long) [clone .isra.0]

and for the simplification (3,459,690,538 events in total):

10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328
50.68% mariadbd mariadbd [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, (recv_sys_t::store)0>(recv_buf&, bool)
19.53% mariadbd [kernel.kallsyms] [k] _copy_to_iter
14.84% mariadbd mariadbd [.] crc32_avx512(unsigned int, char const*, unsigned long, crc32_tab const&)
3.83% mariadbd mariadbd [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mmap<(recv_sys_t::store)0>(bool)
2.23% mariadbd [kernel.kallsyms] [k] filemap_get_entry
1.58% mariadbd [kernel.kallsyms] [k] shmem_file_read_iter
0.74% mariadbd mariadbd [.] recv_scan_log(bool)
0.57% mariadbd [kernel.kallsyms] [k] shmem_get_folio_gfp

We seem to be saving 19.8% of CPU cycles here, computed from the total number of events (3,459,690,538/4,315,397,899).

Marko Mäkelä added a comment - 2024-09-27 07:31 One more time, I initialized a server from the scratch, using 10GiB buffer pool and log (although 2 GiB of log would have been sufficient), and ran the above sysbench prepare . While it was almost completed (creating and populating the 62th table), I sent SIGKILL to the mariadbd process. I then copied this data directory and did rm -fr test so that the recovery will fail soon after the interesting step: 10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5 2024-09-27 0:01:54 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47149 2024-09-27 0:01:54 0 [Note] InnoDB: Multi-batch recovery needed at LSN 1305342 2024-09-27 0:01:56 0 [Note] InnoDB: End of log at LSN=1756766550 2024-09-27 0:01:56 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/sbtest1.ibd. Between "Multi-batch recovery needed" and "End of log", parsing the log records consumed 2 seconds. With the optimized parsing, it seems to be faster. A difference of 1 second in the execution time could be attributed to a rounding error. 10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328 2024-09-27 0:06:45 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47149 2024-09-27 0:06:45 0 [Note] InnoDB: Multi-batch recovery needed at LSN 1305342 2024-09-27 0:06:46 0 [Note] InnoDB: End of log at LSN=1756766550 2024-09-27 0:06:46 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/sbtest1.ibd. I ran the same once more with some instrumentation: perf record sql/mariadbd --datadir=/dev/shm/recover --innodb-buffer-pool-size=5m --innodb-log-file-mmap=ON Here is the perf report output for anything that used more than 0.5% of cycles , with a total number of 4,315,397,899 events. 10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5 61.09% mariadbd-base mariadbd-base [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, false>(recv_buf&, bool) 14.89% mariadbd-base [kernel.kallsyms] [k] _copy_to_iter 9.62% mariadbd-base mariadbd-base [.] crc32_avx512(unsigned int, char const*, unsigned long, crc32_tab const&) 3.70% mariadbd-base mariadbd-base [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mmap<false>(bool) 1.58% mariadbd-base [kernel.kallsyms] [k] shmem_file_read_iter 1.29% mariadbd-base [kernel.kallsyms] [k] filemap_get_entry 0.79% mariadbd-base mariadbd-base [.] recv_scan_log(bool) 0.73% mariadbd-base mariadbd-base [.] mlog_init_t::add(page_id_t, unsigned long) [clone .isra.0] and for the simplification (3,459,690,538 events in total): 10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328 50.68% mariadbd mariadbd [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, (recv_sys_t::store)0>(recv_buf&, bool) 19.53% mariadbd [kernel.kallsyms] [k] _copy_to_iter 14.84% mariadbd mariadbd [.] crc32_avx512(unsigned int, char const*, unsigned long, crc32_tab const&) 3.83% mariadbd mariadbd [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mmap<(recv_sys_t::store)0>(bool) 2.23% mariadbd [kernel.kallsyms] [k] filemap_get_entry 1.58% mariadbd [kernel.kallsyms] [k] shmem_file_read_iter 0.74% mariadbd mariadbd [.] recv_scan_log(bool) 0.57% mariadbd [kernel.kallsyms] [k] shmem_get_folio_gfp We seem to be saving 19.8% of CPU cycles here, computed from the total number of events (3,459,690,538/4,315,397,899).

Debarun Banerjee added a comment - 2024-10-03 09:52

Thanks for the patch. Looks good to me.

Debarun Banerjee added a comment - 2024-10-03 09:52 Thanks for the patch. Looks good to me.

Marko Mäkelä added a comment - 2024-10-04 11:48

It turned out that we do need to process the INIT_PAGE and FREE_PAGE records in the skip_the_rest: loop in the same way as before this change. Otherwise, tests like innodb.recovery_memory, innodb_gis.types and innodb.undo_truncate_recover could occasionally fail. During mariadb-backup --backup, we only have to deal with FILE_ records and validate the mini-transaction checksums.

Marko Mäkelä added a comment - 2024-10-04 11:48 It turned out that we do need to process the INIT_PAGE and FREE_PAGE records in the skip_the_rest: loop in the same way as before this change. Otherwise, tests like innodb.recovery_memory , innodb_gis.types and innodb.undo_truncate_recover could occasionally fail. During mariadb-backup --backup , we only have to deal with FILE_ records and validate the mini-transaction checksums.

MariaDB Server

Unable to create backup with mariabackup: try increasing innodb_log_file_size

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration