Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34850

Unable to create backup with mariabackup: try increasing innodb_log_file_size

Details

    Description

      I am running a huge database of about 3TB of data with MariaDB 10.11.9 on a mirrored ZFS vdev with SSDs.

      Whenever I try to use Mariabackup with this database, it fails near the end with a message like:

      mariabackup --target-dir=/mnt/ --slave-info --backup
      ...
      [00] FATAL ERROR: [timestamp] Was only able to copy log from 148582842890306 to 148638022288286, not 148676138598341; try increasing innodb_log_file_size

      The innodb_log_file_size is 32GB; I even tried 128GB, but it does not fix the problem.

      Some time after Mariabackup starts, it repeatedly complains until the end with the same message:

      Retrying read of log at LSN=148638022288286

      The backuptarget has plenty of space free.

      I increased innodb_log_file_size and tried this on three different slave-servers (all handle about 1TB to 3TB data) with the very same result. After a few hours of mariabackup-runtime, the backup fails with this error message.

      The expectation is, that the backup should finishes successfully.

      Attachments

        Issue Links

          Activity

            To by understanding, ZFS would support cp --reflink=always but mariadb-backup is yet to implement that (MDEV-23947).

            marko Marko Mäkelä added a comment - To by understanding, ZFS would support cp --reflink=always but mariadb-backup is yet to implement that ( MDEV-23947 ).

            Now that MDEV-34062 has been merged and the Boolean parameter innodb_log_file_mmap introduced, mariadb-backup --backup on 64-bit Linux and FreeBSD will by default parse the ib_logfile0 via a memory mapping. That is roughly equivalent to an unlimited innodb_log_buffer_size (during recovery and backup only). I conducted a small test:

            sysbench oltp_update_index --mysql-socket="$MYSQL_SOCK" --mysql-user="$MYSQL_USER" --mysql-db=test --tables=64 --table_size=100000 prepare &
            sleep 60
            mariadb-backup -u"$MYSQL_USER" --socket="$MYSQL_SOCK" --backup --target-dir=/dev/shm/backup
            

            This was with a server using 10 GiB log file and buffer pool on, the data directory on /dev/shm. The backup would start while the 48th table was being created. Here is an interesting excerpt of the output (from both sysbench and mariadb-backup:

            10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5

            2024-09-26 23:14:15 0 [Note] InnoDB: Memory-mapped log (block size=512 bytes)
            [00] 2024-09-26 23:14:15 mariabackup: Generating a list of tablespaces
            Creating a secondary index on 'sbtest48'...
            Creating table 'sbtest49'...
            Inserting 100000 records into 'sbtest49'
            …
            [00] 2024-09-26 23:11:18 Finished backing up non-InnoDB tables and files
            [00] 2024-09-26 23:11:18 Waiting for log copy thread to read lsn 1445003635
            …
            [00] 2024-09-26 23:14:25 Redo log (from LSN 47165 to 1419510721) was copied.
            Creating table 'sbtest51'...
            Inserting 100000 records into 'sbtest51'
            [00] 2024-09-26 23:14:25 completed OK!
            

            We can see that the backup completed in 10 seconds, and it had copied 1,419,463,556 bytes of log records and about 2.8 GiB in total. Not that impressive. I conducted another test with a further optimization to the log file parsing:

            10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328

            Creating table 'sbtest53'...
            Inserting 100000 records into 'sbtest53'
            [00] 2024-09-26 23:31:10 Connecting to MariaDB server host: localhost, user: root, password: not set, port: not set, socket: /dev/shm/10.11/sock
            …
            2024-09-26 23:31:10 0 [Note] InnoDB: Number of transaction pools: 1
            2024-09-26 23:31:10 0 [Note] InnoDB: Memory-mapped log (block size=512 bytes)
            [00] 2024-09-26 23:31:10 mariabackup: Generating a list of tablespaces
            …
            [00] 2024-09-26 23:31:13 Finished backing up non-InnoDB tables and files
            [00] 2024-09-26 23:31:13 Waiting for log copy thread to read lsn 1548591292
            Creating a secondary index on 'sbtest55'...
            …
            [00] 2024-09-26 23:31:20 Redo log (from LSN 47101 to 1559097509) was copied.
            Creating table 'sbtest56'...
            Inserting 100000 records into 'sbtest56'
            [00] 2024-09-26 23:31:21 completed OK!
            

            Even though this run started a little longer into the workload (while the 55th table was being created instead of the 48th one) and therefore had to parse and copy more log, it completed in similar time (11 seconds). The size of the backup was 3.1 GiB instead of 2.8 GiB.

            Based on this, I think that we can already conclude that innodb_log_file_mmap=ON should fix the most pressing issue. I tested a similar workload in MDEV-34062 with innodb_log_file_mmap=OFF.

            Next, I will conduct some tests with more deterministic input: killing a server that uses a large buffer pool, and starting a crash recovery using a tiny buffer pool, so that it is forced to use multiple batches. I had tested this type of scenario in MDEV-29911. I believe that the simplified log record parser could benefit this case as well.

            marko Marko Mäkelä added a comment - Now that MDEV-34062 has been merged and the Boolean parameter innodb_log_file_mmap introduced, mariadb-backup --backup on 64-bit Linux and FreeBSD will by default parse the ib_logfile0 via a memory mapping. That is roughly equivalent to an unlimited innodb_log_buffer_size (during recovery and backup only). I conducted a small test: sysbench oltp_update_index --mysql-socket="$MYSQL_SOCK" --mysql-user="$MYSQL_USER" --mysql-db=test --tables=64 --table_size=100000 prepare & sleep 60 mariadb-backup -u"$MYSQL_USER" --socket="$MYSQL_SOCK" --backup --target-dir=/dev/shm/backup This was with a server using 10 GiB log file and buffer pool on, the data directory on /dev/shm . The backup would start while the 48th table was being created. Here is an interesting excerpt of the output (from both sysbench and mariadb-backup : 10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5 2024-09-26 23:14:15 0 [Note] InnoDB: Memory-mapped log (block size=512 bytes) [00] 2024-09-26 23:14:15 mariabackup: Generating a list of tablespaces Creating a secondary index on 'sbtest48'... Creating table 'sbtest49'... Inserting 100000 records into 'sbtest49' … [00] 2024-09-26 23:11:18 Finished backing up non-InnoDB tables and files [00] 2024-09-26 23:11:18 Waiting for log copy thread to read lsn 1445003635 … [00] 2024-09-26 23:14:25 Redo log (from LSN 47165 to 1419510721) was copied. Creating table 'sbtest51'... Inserting 100000 records into 'sbtest51' [00] 2024-09-26 23:14:25 completed OK! We can see that the backup completed in 10 seconds, and it had copied 1,419,463,556 bytes of log records and about 2.8 GiB in total. Not that impressive. I conducted another test with a further optimization to the log file parsing: 10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328 Creating table 'sbtest53'... Inserting 100000 records into 'sbtest53' [00] 2024-09-26 23:31:10 Connecting to MariaDB server host: localhost, user: root, password: not set, port: not set, socket: /dev/shm/10.11/sock … 2024-09-26 23:31:10 0 [Note] InnoDB: Number of transaction pools: 1 2024-09-26 23:31:10 0 [Note] InnoDB: Memory-mapped log (block size=512 bytes) [00] 2024-09-26 23:31:10 mariabackup: Generating a list of tablespaces … [00] 2024-09-26 23:31:13 Finished backing up non-InnoDB tables and files [00] 2024-09-26 23:31:13 Waiting for log copy thread to read lsn 1548591292 Creating a secondary index on 'sbtest55'... … [00] 2024-09-26 23:31:20 Redo log (from LSN 47101 to 1559097509) was copied. Creating table 'sbtest56'... Inserting 100000 records into 'sbtest56' [00] 2024-09-26 23:31:21 completed OK! Even though this run started a little longer into the workload (while the 55th table was being created instead of the 48th one) and therefore had to parse and copy more log, it completed in similar time (11 seconds). The size of the backup was 3.1 GiB instead of 2.8 GiB. Based on this, I think that we can already conclude that innodb_log_file_mmap=ON should fix the most pressing issue. I tested a similar workload in MDEV-34062 with innodb_log_file_mmap=OFF . Next, I will conduct some tests with more deterministic input: killing a server that uses a large buffer pool, and starting a crash recovery using a tiny buffer pool, so that it is forced to use multiple batches. I had tested this type of scenario in MDEV-29911 . I believe that the simplified log record parser could benefit this case as well.

            One more time, I initialized a server from the scratch, using 10GiB buffer pool and log (although 2 GiB of log would have been sufficient), and ran the above sysbench prepare. While it was almost completed (creating and populating the 62th table), I sent SIGKILL to the mariadbd process. I then copied this data directory and did rm -fr test so that the recovery will fail soon after the interesting step:

            10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5

            2024-09-27  0:01:54 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47149
            2024-09-27  0:01:54 0 [Note] InnoDB: Multi-batch recovery needed at LSN 1305342
            2024-09-27  0:01:56 0 [Note] InnoDB: End of log at LSN=1756766550
            2024-09-27  0:01:56 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/sbtest1.ibd.
            

            Between "Multi-batch recovery needed" and "End of log", parsing the log records consumed 2 seconds. With the optimized parsing, it seems to be faster. A difference of 1 second in the execution time could be attributed to a rounding error.

            10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328

            2024-09-27  0:06:45 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47149
            2024-09-27  0:06:45 0 [Note] InnoDB: Multi-batch recovery needed at LSN 1305342
            2024-09-27  0:06:46 0 [Note] InnoDB: End of log at LSN=1756766550
            2024-09-27  0:06:46 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/sbtest1.ibd.
            

            I ran the same once more with some instrumentation:

            perf record sql/mariadbd --datadir=/dev/shm/recover --innodb-buffer-pool-size=5m --innodb-log-file-mmap=ON
            

            Here is the perf report output for anything that used more than 0.5% of cycles, with a total number of 4,315,397,899 events.

            10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5

              61.09%  mariadbd-base  mariadbd-base         [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, false>(recv_buf&, bool)
              14.89%  mariadbd-base  [kernel.kallsyms]     [k] _copy_to_iter
               9.62%  mariadbd-base  mariadbd-base         [.] crc32_avx512(unsigned int, char const*, unsigned long, crc32_tab const&)
               3.70%  mariadbd-base  mariadbd-base         [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mmap<false>(bool)
               1.58%  mariadbd-base  [kernel.kallsyms]     [k] shmem_file_read_iter
               1.29%  mariadbd-base  [kernel.kallsyms]     [k] filemap_get_entry
               0.79%  mariadbd-base  mariadbd-base         [.] recv_scan_log(bool)
               0.73%  mariadbd-base  mariadbd-base         [.] mlog_init_t::add(page_id_t, unsigned long) [clone .isra.0]
            

            and for the simplification (3,459,690,538 events in total):

            10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328

              50.68%  mariadbd  mariadbd              [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, (recv_sys_t::store)0>(recv_buf&, bool)
              19.53%  mariadbd  [kernel.kallsyms]     [k] _copy_to_iter
              14.84%  mariadbd  mariadbd              [.] crc32_avx512(unsigned int, char const*, unsigned long, crc32_tab const&)
               3.83%  mariadbd  mariadbd              [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mmap<(recv_sys_t::store)0>(bool)
               2.23%  mariadbd  [kernel.kallsyms]     [k] filemap_get_entry
               1.58%  mariadbd  [kernel.kallsyms]     [k] shmem_file_read_iter
               0.74%  mariadbd  mariadbd              [.] recv_scan_log(bool)
               0.57%  mariadbd  [kernel.kallsyms]     [k] shmem_get_folio_gfp
            

            We seem to be saving 19.8% of CPU cycles here, computed from the total number of events (3,459,690,538/4,315,397,899).

            marko Marko Mäkelä added a comment - One more time, I initialized a server from the scratch, using 10GiB buffer pool and log (although 2 GiB of log would have been sufficient), and ran the above sysbench prepare . While it was almost completed (creating and populating the 62th table), I sent SIGKILL to the mariadbd process. I then copied this data directory and did rm -fr test so that the recovery will fail soon after the interesting step: 10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5 2024-09-27 0:01:54 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47149 2024-09-27 0:01:54 0 [Note] InnoDB: Multi-batch recovery needed at LSN 1305342 2024-09-27 0:01:56 0 [Note] InnoDB: End of log at LSN=1756766550 2024-09-27 0:01:56 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/sbtest1.ibd. Between "Multi-batch recovery needed" and "End of log", parsing the log records consumed 2 seconds. With the optimized parsing, it seems to be faster. A difference of 1 second in the execution time could be attributed to a rounding error. 10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328 2024-09-27 0:06:45 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47149 2024-09-27 0:06:45 0 [Note] InnoDB: Multi-batch recovery needed at LSN 1305342 2024-09-27 0:06:46 0 [Note] InnoDB: End of log at LSN=1756766550 2024-09-27 0:06:46 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/sbtest1.ibd. I ran the same once more with some instrumentation: perf record sql/mariadbd --datadir=/dev/shm/recover --innodb-buffer-pool-size=5m --innodb-log-file-mmap=ON Here is the perf report output for anything that used more than 0.5% of cycles , with a total number of 4,315,397,899 events. 10.11 6acada713a95a605b3aa65bd519fbd4532ad23e5 61.09% mariadbd-base mariadbd-base [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, false>(recv_buf&, bool) 14.89% mariadbd-base [kernel.kallsyms] [k] _copy_to_iter 9.62% mariadbd-base mariadbd-base [.] crc32_avx512(unsigned int, char const*, unsigned long, crc32_tab const&) 3.70% mariadbd-base mariadbd-base [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mmap<false>(bool) 1.58% mariadbd-base [kernel.kallsyms] [k] shmem_file_read_iter 1.29% mariadbd-base [kernel.kallsyms] [k] filemap_get_entry 0.79% mariadbd-base mariadbd-base [.] recv_scan_log(bool) 0.73% mariadbd-base mariadbd-base [.] mlog_init_t::add(page_id_t, unsigned long) [clone .isra.0] and for the simplification (3,459,690,538 events in total): 10.11-MDEV-34907 62523f03fd9b9bb669bd35103c2b82da8211b328 50.68% mariadbd mariadbd [.] recv_sys_t::parse_mtr_result recv_sys_t::parse<recv_buf, (recv_sys_t::store)0>(recv_buf&, bool) 19.53% mariadbd [kernel.kallsyms] [k] _copy_to_iter 14.84% mariadbd mariadbd [.] crc32_avx512(unsigned int, char const*, unsigned long, crc32_tab const&) 3.83% mariadbd mariadbd [.] recv_sys_t::parse_mtr_result recv_sys_t::parse_mmap<(recv_sys_t::store)0>(bool) 2.23% mariadbd [kernel.kallsyms] [k] filemap_get_entry 1.58% mariadbd [kernel.kallsyms] [k] shmem_file_read_iter 0.74% mariadbd mariadbd [.] recv_scan_log(bool) 0.57% mariadbd [kernel.kallsyms] [k] shmem_get_folio_gfp We seem to be saving 19.8% of CPU cycles here, computed from the total number of events (3,459,690,538/4,315,397,899).

            Thanks for the patch. Looks good to me.

            debarun Debarun Banerjee added a comment - Thanks for the patch. Looks good to me.

            It turned out that we do need to process the INIT_PAGE and FREE_PAGE records in the skip_the_rest: loop in the same way as before this change. Otherwise, tests like innodb.recovery_memory, innodb_gis.types and innodb.undo_truncate_recover could occasionally fail. During mariadb-backup --backup, we only have to deal with FILE_ records and validate the mini-transaction checksums.

            marko Marko Mäkelä added a comment - It turned out that we do need to process the INIT_PAGE and FREE_PAGE records in the skip_the_rest: loop in the same way as before this change. Otherwise, tests like innodb.recovery_memory , innodb_gis.types and innodb.undo_truncate_recover could occasionally fail. During mariadb-backup --backup , we only have to deal with FILE_ records and validate the mini-transaction checksums.

            People

              marko Marko Mäkelä
              Marhesh Marhesh
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.