[MDEV-35791] mariadb-backup 10.11.10 failed to create backup - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.11.10
Fix Version/s: N/A
Component/s: Backup
Labels:
None
Environment:
Centos 9 Stream

Description

The mariadb-backup failed to create a backup with the message:
00] 2025-01-07 11:44:25 Was only able to copy log from 14354290020357 to 14357475827057, not 14357476072889; try increasing innodb_log_file_size

I tried tens of combinations but it did not work,
--innodb-log-buffer-size=(from 2m to 20g) --innodb-log-file-mmap=on and off.
ANd with SET GLOBAL innodb_log_file_buffering=ON and set global innodb_log_file_mmap=ON and all other combinations on/off.

Restarting the backup script to create a backup again sometimes helps.
Restarting the MariaDB server sometimes helps.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

anonymized_mariadb_backup.log
134 kB
2025-02-13 19:42
fully_anonymized_toanonymize.txt
42 kB
2025-02-13 19:52
screenshot-1.png
73 kB
2025-02-08 22:37

Issue Links

is duplicated by

MDEV-36159 mariabackup failed after upgrade 10.11.10

Stalled

relates to

MDEV-30000 make mariadb-backup to force an innodb checkpoint

Closed

MDEV-14992 BACKUP: in-server backup

Open

Activity

Ascending order - Click to sort in descending order

Xesh created issue - 2025-01-07 12:06

Sergei Golubchik made changes - 2025-01-12 20:26

Field	Original Value	New Value
Fix Version/s		10.11 [ 27614 ]

Sergei Golubchik made changes - 2025-01-12 20:26

Assignee

Marko Mäkelä [ marko ]

Xesh added a comment - 2025-01-30 09:23

Weird thing is that I have two separate database and everytime when backup fails on one it fails on another one. Every time they fail the same day and if one backup passes another one was OK too.

Xesh added a comment - 2025-01-30 09:23 Weird thing is that I have two separate database and everytime when backup fails on one it fails on another one. Every time they fail the same day and if one backup passes another one was OK too.

Marko Mäkelä added a comment - 2025-01-30 09:38

My math says 14357476072889-14354290020357=3186052532, that is, backup would have to copy 2.97 GiB of log. How big is innodb_log_file_size of the server? Have you tried to make it any larger? For the report in the Description, 245,832 bytes of the log could not be copied, presumably because the server had advanced its checkpoint and overwritten this part of its circular ib_logfile0 before mariadb-backup --backup had had a chance to copy it.

Note that the when you use the ~~MDEV-34062~~ introduced parameter innodb_log_file_mmap=ON, which should be enabled by default in mariadb-backup on Linux and FreeBSD, the parameter innodb_log_buffer_size should not have any effect on the backup.

If you are unwilling to enlarge the innodb_log_file_size, then this should be fixed by implementing ~~MDEV-30000~~. Please refer to that ticket how to force a log checkpoint in the server before starting a backup. Would this work for you?

Marko Mäkelä added a comment - 2025-01-30 09:38 My math says 14357476072889-14354290020357=3186052532, that is, backup would have to copy 2.97 GiB of log. How big is innodb_log_file_size of the server? Have you tried to make it any larger? For the report in the Description, 245,832 bytes of the log could not be copied, presumably because the server had advanced its checkpoint and overwritten this part of its circular ib_logfile0 before mariadb-backup --backup had had a chance to copy it. Note that the when you use the MDEV-34062 introduced parameter innodb_log_file_mmap=ON , which should be enabled by default in mariadb-backup on Linux and FreeBSD, the parameter innodb_log_buffer_size should not have any effect on the backup. If you are unwilling to enlarge the innodb_log_file_size , then this should be fixed by implementing MDEV-30000 . Please refer to that ticket how to force a log checkpoint in the server before starting a backup. Would this work for you?

Marko Mäkelä made changes - 2025-01-30 09:38

Status

Open [ 1 ]

Needs Feedback [ 10501 ]

Xesh added a comment - 2025-01-30 09:55

It is 4GB in config, but that variable is dynamic in 10.11 and it is set
dynamically to 2G.

Xesh added a comment - 2025-01-30 09:55 It is 4GB in config, but that variable is dynamic in 10.11 and it is set dynamically to 2G.

Xesh added a comment - 2025-01-30 10:01

Just now it failed

[00] 2025-01-30 09:52:38 Was only able to copy log from 22585940364668 to 22587158431173, not 22587158432590; try increasing innodb_log_file_size
mariabackup: Stopping log copying thread.[00] 2025-01-30 09:52:38 Retrying read of log at LSN=22587158431173

22587158432590-22585940364668 =1,218,067,922 (1.3G)

The innodb_log_file_size is set to 4G

--------------------------------+

Variable_name

Value

--------------------------------+

innodb_log_file_size

4294967296

--------------------------------+

Xesh added a comment - 2025-01-30 10:01 Just now it failed [00] 2025-01-30 09:52:38 Was only able to copy log from 22585940364668 to 22587158431173, not 22587158432590; try increasing innodb_log_file_size mariabackup: Stopping log copying thread. [00] 2025-01-30 09:52:38 Retrying read of log at LSN=22587158431173 22587158432590-22585940364668 =1,218,067,922 (1.3G) The innodb_log_file_size is set to 4G --------------------- -----------+ Variable_name Value --------------------- -----------+ innodb_log_file_size 4294967296 --------------------- -----------+

Marko Mäkelä added a comment - 2025-01-30 11:35

It looks like the log file must be corrupted. This would mean that if the server were killed, it would fail to recover.

We recently diagnosed a case like that for a support customer who was using Veritas VxFS. We monitored some system calls of the server with strace and based on that output were able to prove that InnoDB was writing the log correctly, but something was replacing the contents of log blocks with NUL bytes. As soon as that customer switched to ext4fs, the problem went away.

Such spontaneous corruption of the file could also occur because the storage device is failing. I would recommend checking any kernel or system logs for any hints of corruption, as well as checking if smartctl or similar tools would report any signs of failure.

Marko Mäkelä added a comment - 2025-01-30 11:35 It looks like the log file must be corrupted. This would mean that if the server were killed, it would fail to recover. We recently diagnosed a case like that for a support customer who was using Veritas VxFS. We monitored some system calls of the server with strace and based on that output were able to prove that InnoDB was writing the log correctly, but something was replacing the contents of log blocks with NUL bytes. As soon as that customer switched to ext4fs, the problem went away. Such spontaneous corruption of the file could also occur because the storage device is failing. I would recommend checking any kernel or system logs for any hints of corruption, as well as checking if smartctl or similar tools would report any signs of failure.

Xesh added a comment - 2025-01-30 11:55

I'm using AWS EBS as a storage and XFS file system. However, the backups on
two different servers are failing. There are no issues with the storage or
any errors in the logs.

On Thu, Jan 30, 2025 at 12:36 PM Marko Mäkelä (Jira) <jira@mariadb.org>

Xesh added a comment - 2025-01-30 11:55 I'm using AWS EBS as a storage and XFS file system. However, the backups on two different servers are failing. There are no issues with the storage or any errors in the logs. On Thu, Jan 30, 2025 at 12:36 PM Marko Mäkelä (Jira) <jira@mariadb.org>

Marko Mäkelä added a comment - 2025-01-30 12:22

Can you reproduce this in a test environment and trace the execution of both mariadbd and mariadb-backup --backup with strace (and innodb_log_file_mmap=OFF)?

Marko Mäkelä added a comment - 2025-01-30 12:22 Can you reproduce this in a test environment and trace the execution of both mariadbd and mariadb-backup --backup with strace (and innodb_log_file_mmap=OFF )?

Andy made changes - 2025-02-08 22:37

Attachment

screenshot-1.png [ 74636 ]

Andy added a comment - 2025-02-08 22:37 - edited

Hi,

I have the same issue with this image : registry.redhat.io/rhel8/mariadb-1011@sha256:5bf69f78c5891c5e0089084a76c04317b49fb0005908c6b7ec18e57cb74756dc (updated 10 days ago)

[00] 2025-02-08 22:27:02 Skipping db: ./performance_schema

[00] 2025-02-08 22:27:02 Skipping db: ./sys

[00] 2025-02-08 22:27:02 Finished backing up non-InnoDB tables and files

[00] 2025-02-08 22:27:02 Waiting for log copy thread to read lsn 710823310346

[00] 2025-02-08 22:27:02 Retrying read of log at LSN=710823295690

[00] 2025-02-08 22:27:02 >> log scanned up to (710823295690)

[00] 2025-02-08 22:27:03 Retrying read of log at LSN=710823295690

[00] 2025-02-08 22:27:03 >> log scanned up to (710823295690)

[00] 2025-02-08 22:27:04 Retrying read of log at LSN=710823295690

[00] 2025-02-08 22:27:04 >> log scanned up to (710823295690)

[00] 2025-02-08 22:27:05 Retrying read of log at LSN=710823295690

[00] 2025-02-08 22:27:05 >> log scanned up to (710823295690)

[00] 2025-02-08 22:27:06 Retrying read of log at LSN=710823295690

[00] 2025-02-08 22:27:06 >> log scanned up to (710823295690)

[00] 2025-02-08 22:27:07 Was only able to copy log from 710823295630 to 710823295690, not 710823310346; try increasing innodb_log_file_size

mariabackup: Stopping log copying thread[00] 2025-02-08 22:27:07 Retrying read of log at LSN=710823295690

I am trying the previous image registry.redhat.io/rhel8/mariadb-1011@sha256:ba576cdf749df1f62980936a489ed4cd0c599d89e0d124aacae714bda0aaad0e (updated 13 days ago)
And finally works

[00] 2025-02-08 23:16:51 All tables unlocked

[00] 2025-02-08 23:16:51 Backup created in directory '/nfs-backup/mariadb/daily/incr_25-02-08_23h15/'

[00] 2025-02-08 23:16:51 Writing backup-my.cnf

[00] 2025-02-08 23:16:51 ...done

[00] 2025-02-08 23:16:51 Writing xtrabackup_info

[00] 2025-02-08 23:16:51 ...done

[00] 2025-02-08 23:16:51 Redo log (from LSN 712070635863 to 712070635920) was copied.

[00] 2025-02-08 23:16:51 completed OK!

I will try it for fews days

What has changed between both image

Before upgrading the version of the image, we tried changing innodb_log_file_size from 5G to 15G, but we observed the same behavior. I tryied to have more logs with --verbose but nothing appears according to this issue

Andy added a comment - 2025-02-08 22:37 - edited Hi, I have the same issue with this image : registry.redhat.io/rhel8/mariadb-1011@sha256:5bf69f78c5891c5e0089084a76c04317b49fb0005908c6b7ec18e57cb74756dc (updated 10 days ago) [00] 2025-02-08 22:27:02 Skipping db: . /performance_schema [00] 2025-02-08 22:27:02 Skipping db: . /sys [00] 2025-02-08 22:27:02 Finished backing up non-InnoDB tables and files [00] 2025-02-08 22:27:02 Waiting for log copy thread to read lsn 710823310346 [00] 2025-02-08 22:27:02 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:02 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:03 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:03 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:04 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:04 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:05 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:05 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:06 Retrying read of log at LSN=710823295690 [00] 2025-02-08 22:27:06 >> log scanned up to (710823295690) [00] 2025-02-08 22:27:07 Was only able to copy log from 710823295630 to 710823295690, not 710823310346; try increasing innodb_log_file_size mariabackup: Stopping log copying thread[00] 2025-02-08 22:27:07 Retrying read of log at LSN=710823295690 I am trying the previous image registry.redhat.io/rhel8/mariadb-1011@sha256:ba576cdf749df1f62980936a489ed4cd0c599d89e0d124aacae714bda0aaad0e (updated 13 days ago) And finally works [00] 2025-02-08 23:16:51 All tables unlocked [00] 2025-02-08 23:16:51 Backup created in directory '/nfs-backup/mariadb/daily/incr_25-02-08_23h15/' [00] 2025-02-08 23:16:51 Writing backup-my.cnf [00] 2025-02-08 23:16:51 ... done [00] 2025-02-08 23:16:51 Writing xtrabackup_info [00] 2025-02-08 23:16:51 ... done [00] 2025-02-08 23:16:51 Redo log (from LSN 712070635863 to 712070635920) was copied. [00] 2025-02-08 23:16:51 completed OK! I will try it for fews days What has changed between both image Before upgrading the version of the image, we tried changing innodb_log_file_size from 5G to 15G, but we observed the same behavior. I tryied to have more logs with --verbose but nothing appears according to this issue

Xesh added a comment - 2025-02-10 07:59

Marko, I'll try this weekend.

Xesh added a comment - 2025-02-10 07:59 Marko, I'll try this weekend.

Michal Schorm added a comment - 2025-02-13 13:28

Hello,
I am Michal Schorm, Red Hat engineer taking care of MariaDB. Thank you for the report.

I base my research on your report:

working:   registry.redhat.io/rhel8/mariadb-1011@sha256:ba576cdf749df1f62980936a489ed4cd0c599d89e0d124aacae714bda0aaad0e

broken:     registry.redhat.io/rhel8/mariadb-1011@sha256:5bf69f78c5891c5e0089084a76c04317b49fb0005908c6b7ec18e57cb74756dc

I tried to reproduce your issue with a trivial database on my laptop, but I was unsuccessful.
(In both cases the mariadb-backup succeeded).

I reviewed changes in the MariaDB software between the two images, and there are none.
(Last change Red Hat did to the MariaDB software in the container images was cca 9 months ago)

I have some questions that might help me to find the cause:

Are these containers running as standalone or in OpenShift?
In case OpenShift, I assume we are talking about version 4 ?
Are you able to reproduce this issue at will, or you haven't yet find times / triggers for it?
Is it possible to get more logs, like ‘debug’ from the upgrade between the images ?

Michal Schorm added a comment - 2025-02-13 13:28 Hello, I am Michal Schorm, Red Hat engineer taking care of MariaDB. Thank you for the report. I base my research on your report: working: registry.redhat.io /rhel8/mariadb-1011 @sha256:ba576cdf749df1f62980936a489ed4cd0c599d89e0d124aacae714bda0aaad0e broken: registry.redhat.io /rhel8/mariadb-1011 @sha256:5bf69f78c5891c5e0089084a76c04317b49fb0005908c6b7ec18e57cb74756dc I tried to reproduce your issue with a trivial database on my laptop, but I was unsuccessful. (In both cases the mariadb-backup succeeded). I reviewed changes in the MariaDB software between the two images, and there are none. (Last change Red Hat did to the MariaDB software in the container images was cca 9 months ago) I have some questions that might help me to find the cause: Are these containers running as standalone or in OpenShift? In case OpenShift, I assume we are talking about version 4 ? Are you able to reproduce this issue at will, or you haven't yet find times / triggers for it? Is it possible to get more logs, like ‘debug’ from the upgrade between the images ?

Andy made changes - 2025-02-13 19:42

Attachment

anonymized_mariadb_backup.log [ 74661 ]

Andy made changes - 2025-02-13 19:52

Attachment

fully_anonymized_toanonymize.txt [ 74662 ]

Daniel Black added a comment - 2025-02-14 04:16

Note there is a mariadb-10.11(.11)-ubi based on rhel9 ubi on Docker Official Images from upsteam MariaDB RHEL9 packages if that's a useful comparision image.

Daniel Black added a comment - 2025-02-14 04:16 Note there is a mariadb-10.11(.11)-ubi based on rhel9 ubi on Docker Official Images from upsteam MariaDB RHEL9 packages if that's a useful comparision image.

Xesh added a comment - 2025-02-16 11:36

Mariadb-backup 10.11.11 crashed this morning on one server.
But Mariadb-backup 10.11.10 with innodb_adaptive_flushing_lwm=10 crashed only once in 15 days.

Xesh added a comment - 2025-02-16 11:36 Mariadb-backup 10.11.11 crashed this morning on one server. But Mariadb-backup 10.11.10 with innodb_adaptive_flushing_lwm=10 crashed only once in 15 days.

Marko Mäkelä made changes - 2025-02-25 09:30

Link

This issue relates to MDEV-36159 [ MDEV-36159 ]

Julien Fritsch made changes - 2025-03-03 08:26

Fix Version/s		N/A [ 14700 ]
Fix Version/s	10.11 [ 27614 ]
Resolution		Incomplete [ 4 ]
Status	Needs Feedback [ 10501 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2025-03-07 12:13

Resolution	Incomplete [ 4 ]
Status	Closed [ 6 ]	Stalled [ 10000 ]

Sergei Golubchik made changes - 2025-03-07 12:13

Fix Version/s		10.11 [ 27614 ]
Fix Version/s	N/A [ 14700 ]

Sergei Golubchik made changes - 2025-03-07 12:13

Status

Stalled [ 10000 ]

Open [ 1 ]

Marko Mäkelä added a comment - 2025-03-07 12:22

By "crashed", do you mean that it failed with an error message similar to the one in the Description, or do you mean an actual crash? I believe that fixing ~~MDEV-30000~~ could reduce the reproducibility of this failure, because then backup would typically have to copy less log.

Marko Mäkelä added a comment - 2025-03-07 12:22 By "crashed", do you mean that it failed with an error message similar to the one in the Description, or do you mean an actual crash? I believe that fixing MDEV-30000 could reduce the reproducibility of this failure, because then backup would typically have to copy less log.

Sergei Golubchik made changes - 2025-03-07 13:15

Status

Open [ 1 ]

Needs Feedback [ 10501 ]

Xesh added a comment - 2025-03-21 11:39

I updated to 10.11.11 with innodb_adaptive_flushing_lwm=5, and it crashes once a week, then I tried with innodb_adaptive_flushing_lwm=0.01 and it crashes also this morning.

Then, I tried innodb_log_checkpoint_now=on with and without lwm=0.01, and it also crashed

mysql -e 'SET GLOBAL innodb_log_checkpoint_now=ON;'
mysql -e 'SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.01;'
sleep 1m
/bin/mariadb-backup --innodb-log-buffer-size=20g --backup --target-dir=

I try tomorrow with --skip-innodb-log-checkpoint-on

Xesh added a comment - 2025-03-21 11:39 I updated to 10.11.11 with innodb_adaptive_flushing_lwm=5, and it crashes once a week, then I tried with innodb_adaptive_flushing_lwm=0.01 and it crashes also this morning. Then, I tried innodb_log_checkpoint_now=on with and without lwm=0.01, and it also crashed mysql -e 'SET GLOBAL innodb_log_checkpoint_now=ON;' mysql -e 'SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.01;' sleep 1m /bin/mariadb-backup --innodb-log-buffer-size=20g --backup --target-dir= I try tomorrow with --skip-innodb-log-checkpoint-on

Xesh added a comment - 2025-03-24 15:23

Marko, yes, crash means log like this: 00] 2025-01-07 11:44:25 Was only able to copy log from 14354290020357 to 14357475827057, not 14357476072889; try increasing innodb_log_file_size

Xesh added a comment - 2025-03-24 15:23 Marko, yes, crash means log like this: 00] 2025-01-07 11:44:25 Was only able to copy log from 14354290020357 to 14357475827057, not 14357476072889; try increasing innodb_log_file_size

Xesh added a comment - 2025-03-24 15:26

this command is unknown in 10.11.11: SET GLOBAL innodb_log_checkpoint_now=ON;
and /bin/mariadb-backup with --skip-innodb-log-checkpoint-on is not working, it does not recognize this command.

Xesh added a comment - 2025-03-24 15:26 this command is unknown in 10.11.11: SET GLOBAL innodb_log_checkpoint_now=ON; and /bin/mariadb-backup with --skip-innodb-log-checkpoint-on is not working, it does not recognize this command.

Marko Mäkelä added a comment - 2025-03-25 06:17

The fix of ~~MDEV-30000~~ should be included in the upcoming 10.11.12 release, not in the 10.11.11 that was released before the fix was implemented.

Marko Mäkelä added a comment - 2025-03-25 06:17 The fix of MDEV-30000 should be included in the upcoming 10.11.12 release, not in the 10.11.11 that was released before the fix was implemented.

Julien Fritsch made changes - 2025-04-07 06:13

Due Date

2025-04-21

Marko Mäkelä added a comment - 2025-04-07 06:34

In MDEV-36159 you can find some more analysis of this problem.

~~MDEV-30000~~ was finally fixed in the upcoming 10.11.12 release, by making mariadb-backup --backup execute a log checkpoint in the server before starting. In that way, it will have less log to parse and copy, which hopefully reduces the chances of failure. Would that work for you?

Marko Mäkelä added a comment - 2025-04-07 06:34 In MDEV-36159 you can find some more analysis of this problem. MDEV-30000 was finally fixed in the upcoming 10.11.12 release, by making mariadb-backup --backup execute a log checkpoint in the server before starting. In that way, it will have less log to parse and copy, which hopefully reduces the chances of failure. Would that work for you?

Marko Mäkelä made changes - 2025-04-07 06:34

Link

This issue relates to ~~MDEV-30000~~ [ ~~MDEV-30000~~ ]

Marko Mäkelä made changes - 2025-04-07 06:35

Link

This issue is duplicated by MDEV-36159 [ MDEV-36159 ]

Marko Mäkelä made changes - 2025-04-07 06:35

Link

This issue relates to MDEV-36159 [ MDEV-36159 ]

Marko Mäkelä made changes - 2025-04-07 06:36

Link

This issue relates to MDEV-14992 [ MDEV-14992 ]

Marko Mäkelä made changes - 2025-04-07 06:36

Assignee

Marko Mäkelä [ marko ]

Debarun Banerjee [ JIRAUSER54513 ]

Julien Fritsch made changes - 5 days ago

Fix Version/s		N/A [ 14700 ]
Fix Version/s	10.11 [ 27614 ]
Resolution		Incomplete [ 4 ]
Status	Needs Feedback [ 10501 ]	Closed [ 6 ]

People

Assignee:: Debarun Banerjee

Reporter:: Xesh

Votes:: 1 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Due:: 2025-04-21

Created:: 2025-01-07 12:06

Updated:: 5 days ago 06:38

Resolved:: 5 days ago 06:38

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration