Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35791

mariadb-backup 10.11.10 failed to create backup

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Incomplete
    • 10.11.10
    • N/A
    • Backup
    • None
    • Centos 9 Stream

    Description

      The mariadb-backup failed to create a backup with the message:
      00] 2025-01-07 11:44:25 Was only able to copy log from 14354290020357 to 14357475827057, not 14357476072889; try increasing innodb_log_file_size

      I tried tens of combinations but it did not work,
      --innodb-log-buffer-size=(from 2m to 20g) --innodb-log-file-mmap=on and off.
      ANd with SET GLOBAL innodb_log_file_buffering=ON and set global innodb_log_file_mmap=ON and all other combinations on/off.

      Restarting the backup script to create a backup again sometimes helps.
      Restarting the MariaDB server sometimes helps.

      Attachments

        Issue Links

          Activity

            shexphobos Xesh created issue -
            serg Sergei Golubchik made changes -
            Field Original Value New Value
            Fix Version/s 10.11 [ 27614 ]
            serg Sergei Golubchik made changes -
            Assignee Marko Mäkelä [ marko ]
            shexphobos Xesh added a comment -

            Weird thing is that I have two separate database and everytime when backup fails on one it fails on another one. Every time they fail the same day and if one backup passes another one was OK too.

            shexphobos Xesh added a comment - Weird thing is that I have two separate database and everytime when backup fails on one it fails on another one. Every time they fail the same day and if one backup passes another one was OK too.

            My math says 14357476072889-14354290020357=3186052532, that is, backup would have to copy 2.97 GiB of log. How big is innodb_log_file_size of the server? Have you tried to make it any larger? For the report in the Description, 245,832 bytes of the log could not be copied, presumably because the server had advanced its checkpoint and overwritten this part of its circular ib_logfile0 before mariadb-backup --backup had had a chance to copy it.

            Note that the when you use the MDEV-34062 introduced parameter innodb_log_file_mmap=ON, which should be enabled by default in mariadb-backup on Linux and FreeBSD, the parameter innodb_log_buffer_size should not have any effect on the backup.

            If you are unwilling to enlarge the innodb_log_file_size, then this should be fixed by implementing MDEV-30000. Please refer to that ticket how to force a log checkpoint in the server before starting a backup. Would this work for you?

            marko Marko Mäkelä added a comment - My math says 14357476072889-14354290020357=3186052532, that is, backup would have to copy 2.97 GiB of log. How big is innodb_log_file_size of the server? Have you tried to make it any larger? For the report in the Description, 245,832 bytes of the log could not be copied, presumably because the server had advanced its checkpoint and overwritten this part of its circular ib_logfile0 before mariadb-backup --backup had had a chance to copy it. Note that the when you use the MDEV-34062 introduced parameter innodb_log_file_mmap=ON , which should be enabled by default in mariadb-backup on Linux and FreeBSD, the parameter innodb_log_buffer_size should not have any effect on the backup. If you are unwilling to enlarge the innodb_log_file_size , then this should be fixed by implementing MDEV-30000 . Please refer to that ticket how to force a log checkpoint in the server before starting a backup. Would this work for you?
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Needs Feedback [ 10501 ]
            shexphobos Xesh added a comment -

            It is 4GB in config, but that variable is dynamic in 10.11 and it is set
            dynamically to 2G.

            shexphobos Xesh added a comment - It is 4GB in config, but that variable is dynamic in 10.11 and it is set dynamically to 2G.
            shexphobos Xesh added a comment -

            Just now it failed

            [00] 2025-01-30 09:52:38 Was only able to copy log from 22585940364668 to 22587158431173, not 22587158432590; try increasing innodb_log_file_size
            mariabackup: Stopping log copying thread.[00] 2025-01-30 09:52:38 Retrying read of log at LSN=22587158431173

            22587158432590-22585940364668 =1,218,067,922 (1.3G)

            The innodb_log_file_size is set to 4G

            --------------------------------+

            Variable_name Value

            --------------------------------+

            innodb_log_file_size 4294967296

            --------------------------------+

            shexphobos Xesh added a comment - Just now it failed [00] 2025-01-30 09:52:38 Was only able to copy log from 22585940364668 to 22587158431173, not 22587158432590; try increasing innodb_log_file_size mariabackup: Stopping log copying thread. [00] 2025-01-30 09:52:38 Retrying read of log at LSN=22587158431173 22587158432590-22585940364668 =1,218,067,922 (1.3G) The innodb_log_file_size is set to 4G --------------------- -----------+ Variable_name Value --------------------- -----------+ innodb_log_file_size 4294967296 --------------------- -----------+

            It looks like the log file must be corrupted. This would mean that if the server were killed, it would fail to recover.

            We recently diagnosed a case like that for a support customer who was using Veritas VxFS. We monitored some system calls of the server with strace and based on that output were able to prove that InnoDB was writing the log correctly, but something was replacing the contents of log blocks with NUL bytes. As soon as that customer switched to ext4fs, the problem went away.

            Such spontaneous corruption of the file could also occur because the storage device is failing. I would recommend checking any kernel or system logs for any hints of corruption, as well as checking if smartctl or similar tools would report any signs of failure.

            marko Marko Mäkelä added a comment - It looks like the log file must be corrupted. This would mean that if the server were killed, it would fail to recover. We recently diagnosed a case like that for a support customer who was using Veritas VxFS. We monitored some system calls of the server with strace and based on that output were able to prove that InnoDB was writing the log correctly, but something was replacing the contents of log blocks with NUL bytes. As soon as that customer switched to ext4fs, the problem went away. Such spontaneous corruption of the file could also occur because the storage device is failing. I would recommend checking any kernel or system logs for any hints of corruption, as well as checking if smartctl or similar tools would report any signs of failure.
            shexphobos Xesh added a comment -

            I'm using AWS EBS as a storage and XFS file system. However, the backups on
            two different servers are failing. There are no issues with the storage or
            any errors in the logs.

            On Thu, Jan 30, 2025 at 12:36 PM Marko Mäkelä (Jira) <jira@mariadb.org>

            shexphobos Xesh added a comment - I'm using AWS EBS as a storage and XFS file system. However, the backups on two different servers are failing. There are no issues with the storage or any errors in the logs. On Thu, Jan 30, 2025 at 12:36 PM Marko Mäkelä (Jira) <jira@mariadb.org>

            Can you reproduce this in a test environment and trace the execution of both mariadbd and mariadb-backup --backup with strace (and innodb_log_file_mmap=OFF)?

            marko Marko Mäkelä added a comment - Can you reproduce this in a test environment and trace the execution of both mariadbd and mariadb-backup --backup with strace (and innodb_log_file_mmap=OFF )?
            andy.lecoq Andy made changes -
            Attachment screenshot-1.png [ 74636 ]
            andy.lecoq Andy added a comment - - edited

            Hi,

            I have the same issue with this image : registry.redhat.io/rhel8/mariadb-1011@sha256:5bf69f78c5891c5e0089084a76c04317b49fb0005908c6b7ec18e57cb74756dc (updated 10 days ago)

            [00] 2025-02-08 22:27:02 Skipping db: ./performance_schema
            [00] 2025-02-08 22:27:02 Skipping db: ./sys
            [00] 2025-02-08 22:27:02 Finished backing up non-InnoDB tables and files
            [00] 2025-02-08 22:27:02 Waiting for log copy thread to read lsn 710823310346
            [00] 2025-02-08 22:27:02 Retrying read of log at LSN=710823295690
            [00] 2025-02-08 22:27:02 >> log scanned up to (710823295690)
            [00] 2025-02-08 22:27:03 Retrying read of log at LSN=710823295690
            [00] 2025-02-08 22:27:03 >> log scanned up to (710823295690)
            [00] 2025-02-08 22:27:04 Retrying read of log at LSN=710823295690
            [00] 2025-02-08 22:27:04 >> log scanned up to (710823295690)
            [00] 2025-02-08 22:27:05 Retrying read of log at LSN=710823295690
            [00] 2025-02-08 22:27:05 >> log scanned up to (710823295690)
            [00] 2025-02-08 22:27:06 Retrying read of log at LSN=710823295690
            [00] 2025-02-08 22:27:06 >> log scanned up to (710823295690)
            [00] 2025-02-08 22:27:07 Was only able to copy log from 710823295630 to 710823295690, not 710823310346; try increasing innodb_log_file_size
            mariabackup: Stopping log copying thread[00] 2025-02-08 22:27:07 Retrying read of log at LSN=710823295690
            

            I am trying the previous image registry.redhat.io/rhel8/mariadb-1011@sha256:ba576cdf749df1f62980936a489ed4cd0c599d89e0d124aacae714bda0aaad0e (updated 13 days ago)
            And finally works

            [00] 2025-02-08 23:16:51 All tables unlocked
            [00] 2025-02-08 23:16:51 Backup created in directory '/nfs-backup/mariadb/daily/incr_25-02-08_23h15/'
            [00] 2025-02-08 23:16:51 Writing backup-my.cnf
            [00] 2025-02-08 23:16:51 ...done
            [00] 2025-02-08 23:16:51 Writing xtrabackup_info
            [00] 2025-02-08 23:16:51 ...done
            [00] 2025-02-08 23:16:51 Redo log (from LSN 712070635863 to 712070635920) was copied.
            [00] 2025-02-08 23:16:51 completed OK!
            

            I will try it for fews days

            What has changed between both image

            Before upgrading the version of the image, we tried changing innodb_log_file_size from 5G to 15G, but we observed the same behavior. I tryied to have more logs with --verbose but nothing appears according to this issue

            andy.lecoq Andy added a comment - - edited Hi, I have the same issue with this image : registry.redhat.io/rhel8/mariadb-1011@sha256:5bf69f78c5891c5e0089084a76c04317b49fb0005908c6b7ec18e57cb74756dc (updated 10 days ago) [00] 2025-02-08 22:27:02 Skipping db: . /performance_schema [00] 2025-02-08 22:27:02 Skipping db: . /sys [00] 2025-02-08 22:27:02 Finished backing up non-InnoDB tables and files [00] 2025-02-08 22:27:02 Waiting for log copy thread to read lsn 710823310346 [00] 2025-02-08 22:27:02 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:02 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:03 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:03 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:04 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:04 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:05 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:05 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:06 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:06 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:07 Was only able to copy log from 710823295630 to 710823295690, not 710823310346; try increasing innodb_log_file_size mariabackup: Stopping log copying thread[00] 2025-02-08 22:27:07 Retrying read of log at LSN=710823295690 I am trying the previous image registry.redhat.io/rhel8/mariadb-1011@sha256:ba576cdf749df1f62980936a489ed4cd0c599d89e0d124aacae714bda0aaad0e (updated 13 days ago) And finally works [00] 2025-02-08 23:16:51 All tables unlocked [00] 2025-02-08 23:16:51 Backup created in directory '/nfs-backup/mariadb/daily/incr_25-02-08_23h15/' [00] 2025-02-08 23:16:51 Writing backup-my.cnf [00] 2025-02-08 23:16:51 ... done [00] 2025-02-08 23:16:51 Writing xtrabackup_info [00] 2025-02-08 23:16:51 ... done [00] 2025-02-08 23:16:51 Redo log (from LSN 712070635863 to 712070635920) was copied. [00] 2025-02-08 23:16:51 completed OK! I will try it for fews days What has changed between both image Before upgrading the version of the image, we tried changing innodb_log_file_size from 5G to 15G, but we observed the same behavior. I tryied to have more logs with --verbose but nothing appears according to this issue
            shexphobos Xesh added a comment -

            Marko, I'll try this weekend.

            shexphobos Xesh added a comment - Marko, I'll try this weekend.
            mschorm Michal Schorm added a comment -

            Hello,
            I am Michal Schorm, Red Hat engineer taking care of MariaDB. Thank you for the report.

            I base my research on your report:

            working:   registry.redhat.io/rhel8/mariadb-1011@sha256:ba576cdf749df1f62980936a489ed4cd0c599d89e0d124aacae714bda0aaad0e
            broken:     registry.redhat.io/rhel8/mariadb-1011@sha256:5bf69f78c5891c5e0089084a76c04317b49fb0005908c6b7ec18e57cb74756dc
            

            I tried to reproduce your issue with a trivial database on my laptop, but I was unsuccessful.
            (In both cases the mariadb-backup succeeded).

            I reviewed changes in the MariaDB software between the two images, and there are none.
            (Last change Red Hat did to the MariaDB software in the container images was cca 9 months ago)

            I have some questions that might help me to find the cause:

            • Are these containers running as standalone or in OpenShift?
            • In case OpenShift, I assume we are talking about version 4 ?
            • Are you able to reproduce this issue at will, or you haven't yet find times / triggers for it?
            • Is it possible to get more logs, like ‘debug’ from the upgrade between the images ?
            mschorm Michal Schorm added a comment - Hello, I am Michal Schorm, Red Hat engineer taking care of MariaDB. Thank you for the report. I base my research on your report: working: registry.redhat.io /rhel8/mariadb-1011 @sha256:ba576cdf749df1f62980936a489ed4cd0c599d89e0d124aacae714bda0aaad0e broken: registry.redhat.io /rhel8/mariadb-1011 @sha256:5bf69f78c5891c5e0089084a76c04317b49fb0005908c6b7ec18e57cb74756dc I tried to reproduce your issue with a trivial database on my laptop, but I was unsuccessful. (In both cases the mariadb-backup succeeded). I reviewed changes in the MariaDB software between the two images, and there are none. (Last change Red Hat did to the MariaDB software in the container images was cca 9 months ago) I have some questions that might help me to find the cause: Are these containers running as standalone or in OpenShift? In case OpenShift, I assume we are talking about version 4 ? Are you able to reproduce this issue at will, or you haven't yet find times / triggers for it? Is it possible to get more logs, like ‘debug’ from the upgrade between the images ?
            andy.lecoq Andy made changes -
            Attachment anonymized_mariadb_backup.log [ 74661 ]
            andy.lecoq Andy made changes -
            Attachment fully_anonymized_toanonymize.txt [ 74662 ]
            danblack Daniel Black added a comment -

            Note there is a mariadb-10.11(.11)-ubi based on rhel9 ubi on Docker Official Images from upsteam MariaDB RHEL9 packages if that's a useful comparision image.

            danblack Daniel Black added a comment - Note there is a mariadb-10.11(.11)-ubi based on rhel9 ubi on Docker Official Images from upsteam MariaDB RHEL9 packages if that's a useful comparision image.
            shexphobos Xesh added a comment -

            Mariadb-backup 10.11.11 crashed this morning on one server.
            But Mariadb-backup 10.11.10 with innodb_adaptive_flushing_lwm=10 crashed only once in 15 days.

            shexphobos Xesh added a comment - Mariadb-backup 10.11.11 crashed this morning on one server. But Mariadb-backup 10.11.10 with innodb_adaptive_flushing_lwm=10 crashed only once in 15 days.
            marko Marko Mäkelä made changes -
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Incomplete [ 4 ]
            Status Needs Feedback [ 10501 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Resolution Incomplete [ 4 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s N/A [ 14700 ]
            serg Sergei Golubchik made changes -
            Status Stalled [ 10000 ] Open [ 1 ]

            By "crashed", do you mean that it failed with an error message similar to the one in the Description, or do you mean an actual crash? I believe that fixing MDEV-30000 could reduce the reproducibility of this failure, because then backup would typically have to copy less log.

            marko Marko Mäkelä added a comment - By "crashed", do you mean that it failed with an error message similar to the one in the Description, or do you mean an actual crash? I believe that fixing MDEV-30000 could reduce the reproducibility of this failure, because then backup would typically have to copy less log.
            serg Sergei Golubchik made changes -
            Status Open [ 1 ] Needs Feedback [ 10501 ]
            shexphobos Xesh added a comment -

            I updated to 10.11.11 with innodb_adaptive_flushing_lwm=5, and it crashes once a week, then I tried with innodb_adaptive_flushing_lwm=0.01 and it crashes also this morning.

            Then, I tried innodb_log_checkpoint_now=on with and without lwm=0.01, and it also crashed

            mysql -e 'SET GLOBAL innodb_log_checkpoint_now=ON;'
            mysql -e 'SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.01;'
            sleep 1m
            /bin/mariadb-backup --innodb-log-buffer-size=20g --backup --target-dir=

            I try tomorrow with --skip-innodb-log-checkpoint-on

            shexphobos Xesh added a comment - I updated to 10.11.11 with innodb_adaptive_flushing_lwm=5, and it crashes once a week, then I tried with innodb_adaptive_flushing_lwm=0.01 and it crashes also this morning. Then, I tried innodb_log_checkpoint_now=on with and without lwm=0.01, and it also crashed mysql -e 'SET GLOBAL innodb_log_checkpoint_now=ON;' mysql -e 'SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.01;' sleep 1m /bin/mariadb-backup --innodb-log-buffer-size=20g --backup --target-dir= I try tomorrow with --skip-innodb-log-checkpoint-on
            shexphobos Xesh added a comment -

            Marko, yes, crash means log like this: 00] 2025-01-07 11:44:25 Was only able to copy log from 14354290020357 to 14357475827057, not 14357476072889; try increasing innodb_log_file_size

            shexphobos Xesh added a comment - Marko, yes, crash means log like this: 00] 2025-01-07 11:44:25 Was only able to copy log from 14354290020357 to 14357475827057, not 14357476072889; try increasing innodb_log_file_size
            shexphobos Xesh added a comment -

            this command is unknown in 10.11.11: SET GLOBAL innodb_log_checkpoint_now=ON;
            and /bin/mariadb-backup with --skip-innodb-log-checkpoint-on is not working, it does not recognize this command.

            shexphobos Xesh added a comment - this command is unknown in 10.11.11: SET GLOBAL innodb_log_checkpoint_now=ON; and /bin/mariadb-backup with --skip-innodb-log-checkpoint-on is not working, it does not recognize this command.

            The fix of MDEV-30000 should be included in the upcoming 10.11.12 release, not in the 10.11.11 that was released before the fix was implemented.

            marko Marko Mäkelä added a comment - The fix of MDEV-30000 should be included in the upcoming 10.11.12 release, not in the 10.11.11 that was released before the fix was implemented.
            julien.fritsch Julien Fritsch made changes -
            Due Date 2025-04-21

            In MDEV-36159 you can find some more analysis of this problem.

            MDEV-30000 was finally fixed in the upcoming 10.11.12 release, by making mariadb-backup --backup execute a log checkpoint in the server before starting. In that way, it will have less log to parse and copy, which hopefully reduces the chances of failure. Would that work for you?

            marko Marko Mäkelä added a comment - In MDEV-36159 you can find some more analysis of this problem. MDEV-30000 was finally fixed in the upcoming 10.11.12 release, by making mariadb-backup --backup execute a log checkpoint in the server before starting. In that way, it will have less log to parse and copy, which hopefully reduces the chances of failure. Would that work for you?
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Debarun Banerjee [ JIRAUSER54513 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Incomplete [ 4 ]
            Status Needs Feedback [ 10501 ] Closed [ 6 ]

            People

              debarun Debarun Banerjee
              shexphobos Xesh
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.