  1. MariaDB Server
  2. MDEV-36159

mariabackup failed after upgrade 10.11.10

Details

    • Bug
    • Status: In Progress
    • Critical
    • Resolution: Unresolved
    • 10.11.10
    • 10.11
    • Backup

    Description

      Hi Team,

      We upgraded from 10.11.8 to 10.11.10 two weeks ago and since then mariabackup keeps failing with the following logs.

      [00] 2025-02-24 00:41:42 Waiting for log copy thread to read lsn 160568393706001
      [00] 2025-02-24 00:41:43 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:44 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:45 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:46 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:47 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:47 Was only able to copy log from 160487092534834 to 160515356496404, not 160568393706001; try increasing innodb_log_file_size
      mariabackup: Stopping log copying thread.
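
To put the final log line in perspective, the following sketch extracts the three LSNs from it and computes how much log was copied and how much was still missing. It assumes that an LSN difference corresponds to bytes of redo log; the function name `lsn_gap` and the regular expression are illustrative and not part of mariabackup.

```python
import re

# The failure line from the log above, copied verbatim.
LOG_LINE = ("[00] 2025-02-24 00:41:47 Was only able to copy log from "
            "160487092534834 to 160515356496404, not 160568393706001; "
            "try increasing innodb_log_file_size")

# Illustrative pattern: start LSN, LSN actually reached, LSN that was required.
PATTERN = re.compile(r"copy log from (\d+) to (\d+), not (\d+)")


def lsn_gap(line):
    """Return the copied and missing LSN ranges (assumed to be bytes)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a log-copy failure line")
    start, reached, target = (int(group) for group in match.groups())
    return {"copied": reached - start, "missing": target - reached}


gap = lsn_gap(LOG_LINE)
GIB = 1024 ** 3
print(f"copied : {gap['copied']} bytes ({gap['copied'] / GIB:.1f} GiB)")
print(f"missing: {gap['missing']} bytes ({gap['missing'] / GIB:.1f} GiB)")
```

Under that assumption, both the copied range and the missing range are far larger than any of the innodb_log_file_size values tested below.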
      

      I assumed this was caused by an innodb_log_file_size that was too small, so I tested several larger values and got the following results.

      • MariaDB engine ver : 10.11.10 / mariabackup engine ver : 10.11.10
        innodb_log_file_size = 1G - failed
        innodb_log_file_size = 4G - failed
        innodb_log_file_size = 8G - failed
      • MariaDB engine ver : 10.11.8 / mariabackup engine ver : 10.11.8
        innodb_log_file_size = 1G - success
        innodb_log_file_size = 4G - success
      • MariaDB engine ver : 10.11.10 / mariabackup engine ver : 10.11.8
        innodb_log_file_size = 1G - success

      Is this a new bug different from MDEV-34062?
      I would also like to know whether it is acceptable to back up a 10.11.10 server with mariabackup 10.11.8 in a production environment.

      Thanks and regards.

          Activity

            hydrapolic Tomáš Mózes added a comment - By the way, the same problem occurs on 10.11 and 11.4. It does NOT happen on 10.6.
            supbaek baek seung ho added a comment - edited

            Yesterday I successfully backed up my staging database with mariabackup 10.11.10.
            There are some options that are not mentioned on the mariabackup options page of the documentation: innodb-log-file-buffering and innodb-log-file-mmap.
            I think they may belong to the enterprise edition of mariabackup rather than the community edition.

            I have the following questions:
            1. When I run mariabackup with the --skip-innodb-log-file-buffering option, the backup completes normally. Does running a backup with innodb-log-file-buffering turned off have any side effects on the server or on the backup?

            2. Is there a reason why the backup succeeds when it runs with innodb-log-file-buffering disabled? In our tests, ib_logfile0 in the backup directory did not grow when the backup failed, but it kept growing when the backup ran with innodb-log-file-buffering disabled.

            3. What exactly does the parameter innodb_log_file_buffering mean? Its description says that it controls whether the file system cache is enabled for ib_logfile0, but I am not sure what that means in practice.

            4. Can you tell me whether this problem will be fixed in the next release, 10.11.12, and when that release is planned?

            I will upload mariabackup logs of both a failed and a successful run, taken with the --verbose option.
            backup_failed.log
            backup_success.log
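
For anyone who wants to reproduce the comparison from question 1, here is a minimal sketch of a wrapper that builds the two mariabackup command lines. The function name `backup_command`, the target directory and the user name are hypothetical; `--skip-innodb-log-file-buffering` is the option named in the comment above, and the sketch only prints the command rather than running it.

```python
import shlex


def backup_command(target_dir, user, log_file_buffering=True):
    """Build a mariabackup argument list (hypothetical helper).

    target_dir and user are placeholders; adjust them to your setup.
    """
    argv = [
        "mariabackup",
        "--backup",
        f"--target-dir={target_dir}",
        f"--user={user}",
    ]
    if not log_file_buffering:
        # Option reported above as making the 10.11.10 backup succeed.
        argv.append("--skip-innodb-log-file-buffering")
    return argv


# Print the command instead of executing it, so the sketch has no side effects.
cmd = backup_command("/backup/full", "backup_user", log_file_buffering=False)
print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute the backup; that step is left out here on purpose.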


            marko Marko Mäkelä added a comment -

            The fundamental difference between 10.6 and 10.11 is that until MDEV-14425 was implemented, the write-ahead log ib_logfile0 was divided into 512-byte blocks. Backup would copy these log blocks and validate the CRC-32C. It would not try to parse individual log records. This format was slow to write, because InnoDB would hold log_sys.mutex while copying data into log blocks, optionally encrypting the blocks (innodb_encrypt_log=ON) and computing the CRC-32C. The new format makes each individual mini-transaction a ‘block’ on its own. This allows any threads that modify persistent data to perform the encryption and CRC-32C concurrently. The actual memcpy() into the log buffer log_sys.buf is also concurrent. Concurrency will be improved even further after the bottleneck MDEV-21923 has been removed.

            While the server has become faster at writing the log, backup has become slower, because it copies and parses ib_logfile0 in a single thread, and it now has to parse individual log records in order to find the mini-transaction boundaries and to validate the CRC-32C of each mini-transaction. This creates a producer-consumer buffer overflow problem: the server can produce log faster than backup consumes it. The fix of MDEV-30000 could alleviate this a little, by forcing a checkpoint at the start of the backup, so that less log would have to be copied. Another possible help is to configure a larger innodb_log_file_size.

            A better fix would be to integrate the backup in the server in some way (MDEV-14992) or to make the server responsible for producing a log for backups (something like log archiving). If the server were writing the log for backup in sync with the recovery log, it would naturally slow down. This is a large change that will take time to implement, and it would only appear in a new major release of MariaDB Server, and possibly in the MariaDB Enterprise Server 11.4 release.

            The options in mariadb-backup are somewhat of a mess. The only part where innodb_log_file_buffering could make a difference is when reading the server’s ib_logfile0. innodb_log_file_buffering=OFF means that an attempt is made to open the log with O_DIRECT. Reading or writing the backed-up ib_logfile0 will not use O_DIRECT. The parameter was introduced in MDEV-30136 when innodb_flush_method was deprecated. I made some tests in May 2024 in MDEV-34062. The column "server innodb_log_file_mmap" in the tables there refers to a prototype that would allow the server to write log via mmap(). In the final version, this parameter only has an effect during crash recovery or in backup, when the server’s log is being read. Those tests suggested that disabling O_DIRECT on the server for the log file, or enabling memory-mapped access for parsing the file, would enable the Linux kernel block cache. Of course, the results could vary between file systems and kernel versions. I tested it on only one system.
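
The producer-consumer problem described in the comment above can be illustrated with a toy model. This is a simplified sketch, not how mariadb-backup or InnoDB is implemented: it assumes a circular log of `capacity` bytes, a server that writes `write_rate` bytes per tick, a single backup thread that reads `read_rate` bytes per tick, and a server that is free to overwrite anything more than `capacity` bytes behind its write position. All names and numbers are made up for illustration.

```python
def ticks_until_overrun(capacity, write_rate, read_rate, max_ticks=10_000):
    """Return the tick at which the reader is overrun, or None if it keeps up.

    Toy model: LSNs are plain byte counters and the writer never waits
    for the reader (assumption made for illustration only).
    """
    write_lsn = 0
    read_lsn = 0
    for tick in range(1, max_ticks + 1):
        write_lsn += write_rate
        # The reader cannot read past what has been written.
        read_lsn = min(write_lsn, read_lsn + read_rate)
        if write_lsn - read_lsn > capacity:
            # Unread log has been overwritten; the copy can no longer continue.
            return tick
    return None


# A slower reader is eventually overrun; a larger log only delays it.
print(ticks_until_overrun(capacity=1_000, write_rate=120, read_rate=100))
print(ticks_until_overrun(capacity=8_000, write_rate=120, read_rate=100))
# A reader that is at least as fast as the writer keeps up.
print(ticks_until_overrun(capacity=1_000, write_rate=100, read_rate=120))
```

In this model a larger log buys time in proportion to its size, which matches the remark that a larger innodb_log_file_size can help, while a shorter range to copy (as with a checkpoint at backup start) reduces how long the reader has to keep up.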

            marko Marko Mäkelä added a comment - axel, can you please verify the claim that backup got more failure-prone between 10.11.8 and 10.11.10?
            supbaek baek seung ho added a comment -

            I would like to know how the mariabackup tests are going, and whether there is a way to keep taking backups while staying on the current DB version.

            As an additional workaround, I would like to run backups on a replica (slave). If I back up using the --safe-slave-backup and --slave-info options, would there be any issues with backing up on the current version?

            Also, I would like to know the release schedule for MariaDB 10.11.12.


            People

              axel Axel Schwenke
              supbaek baek seung ho
              Votes: 3
              Watchers: 11
