MariaDB Server / MDEV-36159

mariabackup failed after upgrade to 10.11.10

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.11.10
    • Fix Version/s: 10.11
    • Component/s: Backup

    Description

      Hi Team,

      We upgraded from 10.11.8 to 10.11.10 two weeks ago, and since then mariabackup keeps failing with the following log output.

      [00] 2025-02-24 00:41:42 Waiting for log copy thread to read lsn 160568393706001
      [00] 2025-02-24 00:41:43 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:44 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:45 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:46 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:47 Retrying read of log at LSN=160515356496404
      [00] 2025-02-24 00:41:47 Was only able to copy log from 160487092534834 to 160515356496404, not 160568393706001; try increasing innodb_log_file_size
      mariabackup: Stopping log copying thread.
      

      I judged that this was caused by a small innodb_log_file_size value, so I tested changing it to larger values and got the following results.

      • MariaDB engine ver : 10.11.10 / mariabackup engine ver : 10.11.10
        innodb_log_file_size = 1G - failed
        innodb_log_file_size = 4G - failed
        innodb_log_file_size = 8G - failed
      • MariaDB engine ver : 10.11.8 / mariabackup engine ver : 10.11.8
        innodb_log_file_size = 1G - success
        innodb_log_file_size = 4G - success
      • MariaDB engine ver : 10.11.10 / mariabackup engine ver : 10.11.8
        innodb_log_file_size = 1G - success

      Is this a new bug, different from MDEV-34062?
      I would also like to know whether it is acceptable in production environments to back up a 10.11.10 server with mariabackup 10.11.8.

      Thanks and regards.

      Attachments

        Issue Links

          Activity

            supbaek baek seung ho created issue -
            supbaek baek seung ho made changes -
            Summary changed from "mariaback failed after upgrade 10.11.10" to "mariabackup failed after upgrade 10.11.10"

            marko Marko Mäkelä added a comment -

            According to the numbers in the message, the backup would need to copy 81,301,171,167 bytes (81 GB, 75.7 GiB) of log since the latest checkpoint at the time the backup was started. It only managed to copy 28,263,961,570 bytes, a bit less than a third of that. That is actually a rather good achievement, because the circular ib_logfile0 that you configured must have been overwritten over 9 times (over 28 times if you used innodb_log_file_size=1g) while the backup was in progress.
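
            For reference, the arithmetic behind those figures, using the LSNs from the log excerpt in the description:

            160568393706001 - 160487092534834 = 81,301,171,167 bytes of log to copy
            160515356496404 - 160487092534834 = 28,263,961,570 bytes actually copied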

            If you had configured a large enough log file size, then this failure should occur only when the log is corrupted, possibly due to a file system error. MDEV-35791 might be such a case.

            Given that the amount of log that needs to be copied is much larger than the configured log file size, I do not think that an attempt to force more frequent checkpoints (as discussed in MDEV-30000) would help. What would definitely help would be to have some form of server-assisted log copying (MDEV-14992) or log archiving. In that way, the server would automatically throttle its write activity to ensure that the log for the backup is not missing anything.


            marko Marko Mäkelä added a comment -

            Can you test if forcing an InnoDB log checkpoint immediately before starting the backup, as discussed in MDEV-30000, would improve the chances of completing the backup?

            marko Marko Mäkelä made changes -
            Status changed from Open to Needs Feedback
            supbaek baek seung ho added a comment -

            We ran a backup on 10.11.10 after forcing a checkpoint as follows, but the backup failed in the same way.

            -- temporarily lower the dirty-page low-water mark to force page flushing
            -- (so the checkpoint can advance), then restore the default of 0 (disabled)
            SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.01;
            SET GLOBAL innodb_max_dirty_pages_pct_lwm=0;
            

            Is there any other way besides this?

            If we need to increase innodb_log_file_size, how much should we increase it?

            elenst Elena Stepanova made changes -
            Status changed from Needs Feedback to Open
            elenst Elena Stepanova made changes -
            Assignee Marko Mäkelä [ marko ]
            supbaek baek seung ho added a comment -

            First, we will change the operating environment as follows and check whether the backup succeeds.

            • innodb_log_file_size : 16GB
            • innodb_log_buffer_size : 64MB
            Sagbo Agbo Steven added a comment -

            Hi,

            Same behavior here with the version: mariadb-backup 1:10.11.11+maria~deb12

            With these specs:

            • innodb_log_file_size = 8024M
            • innodb_log_buffer_size = 32M

            ==> Failed

            • innodb_log_file_size = 16048M
            • innodb_log_buffer_size = 64M

            ==> Failed


            hydrapolic Tomáš Mózes added a comment -

            Same here with MariaDB 10.11.11.

            # mariadb-backup --user=root --backup --stream=xbstream
            ...
            [00] 2025-02-27 21:19:52 Finished backing up non-InnoDB tables and files
            [00] 2025-02-27 21:19:52 Waiting for log copy thread to read lsn 14369606880777
            [00] 2025-02-27 21:19:53 Retrying read of log at LSN=14369590462614
            [00] 2025-02-27 21:19:54 Retrying read of log at LSN=14369590462614
            [00] 2025-02-27 21:19:55 Retrying read of log at LSN=14369590462614
            [00] 2025-02-27 21:19:56 Retrying read of log at LSN=14369590462614
            [00] 2025-02-27 21:19:57 Was only able to copy log from 14369570090711 to 14369590462614, not 14369606880777; try increasing innodb_log_file_size
            mariabackup: Stopping log copying thread.[00] 2025-02-27 21:19:57 Retrying read of log at LSN=14369590462614
            

            # mariadb-backup --prepare --target-dir=.
            [00] 2025-02-28 12:18:00 cd to /var/lib/mysql/
            [00] 2025-02-28 12:18:00 open files limit requested 0, set to 1024
            [00] FATAL ERROR: 2025-02-28 12:18:00 Can't open backup-my.cnf for reading
            

            supbaek baek seung ho added a comment -

            @Tomas Mozes
            Can you tell me what your innodb_log_file_size setting is?


            hydrapolic Tomáš Mózes added a comment -

            MariaDB [(none)]> show variables like 'innodb_log_file_size';
            +----------------------+-----------+
            | Variable_name        | Value     |
            +----------------------+-----------+
            | innodb_log_file_size | 100663296 |
            +----------------------+-----------+
            1 row in set (0.001 sec)
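
            (For reference, 100663296 bytes is 96 MiB, the default innodb_log_file_size in these versions.)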
            

            quulah Miika Kankare added a comment -

            Our backups have also been failing intermittently for a while, roughly every two weeks.

            This started happening some time last year, I think after the upgrade from 10.11.7 to 10.11.9. Since the problem didn't start immediately, I can't rule out other differences, but perhaps that gives some indication of when a potential bug may have appeared. We are now running 10.11.11 and the problem still persists.

            The database is fairly small with a limited amount of traffic, and if I've learned anything from this thread and the linked ones, our innodb_log_file_size of 1G is plenty for this (probably too much):

            [00] 2025-02-20 06:16:04 Was only able to copy log from 297374420802 to 297377118180, not 297377133957; try increasing innodb_log_file_size
            

            According to the logs mmap is used and since the amount of data is rather small, I think this isn't a performance problem.

            To debug, I've already added a new disk to the server (it's a cloud VM) for the destination of the backups. The source database is still on the same disk it has been since the beginning. I could perhaps take the server offline next and run a check on the filesystem, but somehow I feel like there's something else going on here.


            marko Marko Mäkelä added a comment -

            For those failures where the amount of data to be copied is small compared to the server’s innodb_log_file_size, MDEV-36201 might share a root cause with this. Unfortunately, to analyze this, I would need a copy of all data and logs, at the very least including an affected server’s ib_logfile0 file and the output of mariadb-backup --backup. If someone can reproduce this with some dummy data that can be shared, that would be great.

            supbaek baek seung ho added a comment -

            After changing innodb_log_file_size and innodb_log_buffer_size, the full backup on March 2 was successful, but the incremental backup on March 3 failed.
            We believe this is most likely a bug in mariabackup, and would like to verify whether backing up MariaDB 10.11.10 with mariabackup 10.11.8 causes any problems.

            supbaek baek seung ho added a comment -

            The mariabackup test was conducted with sysbench.
            The table size was about 40GB, and the backup was tested while running updates with 100 threads using sysbench.
            During the test, after MariaDB was restarted to change innodb_log_file_size, the first backup was almost successful, but the second and subsequent backups failed.


            marko Marko Mäkelä added a comment -

            supbaek, thank you for the updates. Thanks to MDEV-27812, you can actually invoke SET GLOBAL innodb_log_file_size while the server is running. However, if you concurrently run mariadb-backup --backup, it will hang as soon as the to-be-new log file ib_logfile101 replaces the original ib_logfile0.
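
            A minimal sketch of the sequencing this implies (the credentials, the 16G size and the target directory are placeholders, not values from this issue): resize the log first, then start the backup only once the resize has finished, so the log copy thread never sees ib_logfile101 replace ib_logfile0 mid-backup.

            # resize the redo log online (MDEV-27812), verify, then back up
            mariadb -u root -e "SET GLOBAL innodb_log_file_size = 16 * 1024 * 1024 * 1024;"
            mariadb -u root -e "SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';"
            mariadb-backup --user=root --backup --target-dir=/backups/full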

            Can you share your scripts that you use for reproducing this?

            supbaek baek seung ho added a comment -

            I've attached the log files from when mariabackup succeeded and failed, and I'll share the commands I used during the test.

            sysbench /usr/share/sysbench/oltp_update_index.lua \
            --threads=50 \
            --mysql-host=192.168.6.28 \
            --mysql-port=3306 \
            --mysql-db=test \
            --mysql-user=growin \
            --mysql-password=growin \
            --db-driver=mysql \
            --tables=5 \
            --table-size=10000000 \
            --time=7200 \
            $CMD
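            # $CMD is the sysbench subcommand supplied by the wrapper script
            # sysbench_for_mysql.sh (here "run", as in the invocations below)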
             
            sh sysbench_for_mysql.sh run &
            sh sysbench_for_mysql.sh run &
            

            And I have a few questions.
            We upgraded from 10.4.15 -> 10.11.3 -> 10.11.8 -> 10.11.10.
            1. After the upgrade, the number of InnoDB log files (innodb_log_files_in_group) was reduced from 3 to 1. Are there any side effects?
            2. I changed innodb_log_file_size (1.5G -> 16G) and innodb_log_buffer_size (32MB -> 64MB). Is there a monitoring indicator that can compare DB performance before and after the change?
            3. Is there any problem if I back up MariaDB 10.11.10 with mariabackup 10.11.8?


            marko Marko Mäkelä added a comment -

            Before innodb_log_files_in_group was hard-wired to 1 in MariaDB Server 10.5, it was possible to treat multiple log files as a single one. To retain the same total log capacity, you need to ensure that innodb_log_files_in_group multiplied by innodb_log_file_size remains unchanged.
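
            As an illustration, using the figures mentioned earlier in this thread (and assuming the 1.5G value was the per-file size under 10.4): 3 files × 1.5G is about 4.5G of total log capacity before the upgrade, so a single ib_logfile0 would need innodb_log_file_size of roughly 4.5G, not 1.5G, to preserve that capacity; the 16G configured now is well above it.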

            The log format is not supposed to change within a major release. mariadb-backup 10.11.8 will lack some performance fixes, such as MDEV-34062. Some recovery or backup bugs that we find and fix based on our internal testing are on the "write" side (the server instance that was being backed up or that was killed), some on the "recovery" or mariadb-backup --prepare side.


            marko Marko Mäkelä added a comment -

            Thank you for the test script. I will need to return to this. The script might also be useful for testing MDEV-34070, which I did not get back to after fixing MDEV-34062.

            I checked the change history since 10.11.10, and I don’t think that anything could possibly have fixed this reported problem (which I have not reproduced yet).

            marko Marko Mäkelä made changes -
            Labels performance
            marko Marko Mäkelä made changes -
            Priority changed from Major to Critical
            supbaek baek seung ho added a comment -

            If we need to back up the database safely, should we downgrade to MariaDB 10.11.8, or can we just use mariadb-backup 10.11.8 against a MariaDB 10.11.10 database?
            Even though some performance fixes are missing in mariadb-backup 10.11.8, we are wondering whether there is any problem using it to back up a later minor version within the same major version.

            hydrapolic Tomáš Mózes added a comment - - edited

            After some experimenting, I've decided to switch back my replicas from 10.11.11 to 10.6.21 (primary on 10.6.20).

            I've tried the backups on 3 different databases, ranging from 100GB to 1TB. On the smallest database, setting innodb_log_file_size = 10G helped and a mariadb-backup with 10.11.11 was successful. However, on the bigger data sets even 10G wasn't enough; I had to raise it to 50G for the backup to work. On the biggest database (1TB), however, even setting it to 300G didn't help.

            I've also tried using mariadb-backup from 10.11.8, but it didn't help either.

            Today, after downgrading one of the replicas from 10.11.11 -> 10.6.21, the backup works fine (tested 5 times) even with the default innodb_log_file_size value.

            JIraAutomate JiraAutomate made changes -
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]

            hydrapolic Tomáš Mózes added a comment -

            By the way, the same problem occurs on 10.11 and 11.4. It does NOT happen on 10.6.

            supbaek baek seung ho made changes -
            Attachment backup_success.log [ 74810 ]
            Attachment backup_failed.log [ 74811 ]
            supbaek baek seung ho added a comment - - edited

            Yesterday I successfully backed up my staging database with mariabackup 10.11.10.
            There are some options that are not mentioned on the mariabackup options page of the documentation; I think they are from the enterprise edition of mariabackup, not community, namely innodb-log-file-buffering and innodb-log-file-mmap.

            I have the following questions:
            1. When running mariabackup with the --skip-innodb-log-file-buffering option, the backup completes normally (see the sketch at the end of this comment). Is there any side effect on the server or the backup when running a backup with innodb-log-file-buffering turned off?

            2. If a backup succeeds when run with innodb-log-file-buffering disabled, what is the reason for this? In our tests we noticed that ib_logfile0 in the backup did not grow when the backup failed, but it continued to grow when the backup was performed with innodb-log-file-buffering disabled.

            3. What is the exact meaning of the innodb_log_file_buffering parameter? Its description says it controls whether the file system cache is enabled for ib_logfile0, but I'm wondering what that means exactly.

            4. Can you tell me whether this problem will be fixed in the next release, 10.11.12, and when it will be released?

            I will upload mariabackup logs from both a failed and a successful run with the --verbose option.
            backup_failed.log
            backup_success.log
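
            A minimal sketch of the invocation described in question 1 above (the credentials and target directory are placeholders, not values from this issue):

            # --skip-innodb-log-file-buffering sets innodb_log_file_buffering=OFF for
            # mariadb-backup, i.e. it attempts to open the server's ib_logfile0 with O_DIRECT
            mariadb-backup --user=root --backup \
              --skip-innodb-log-file-buffering \
              --target-dir=/backups/full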


            marko Marko Mäkelä added a comment -

            The fundamental difference between 10.6 and 10.11 is that until MDEV-14425 was implemented, the write-ahead log ib_logfile0 was divided into 512-byte blocks. Backup would copy these log blocks and validate the CRC-32C. It would not try to parse individual log records. This format was slow to write, because InnoDB would hold log_sys.mutex while copying data into log blocks, optionally encrypting the blocks (innodb_encrypt_log=ON) and computing the CRC-32C. The new format makes each individual mini-transaction a ‘block’ on its own. This allows any threads that modify persistent data to perform the encryption and CRC-32C concurrently. Also the actual memcpy() into the log buffer log_sys.buf is concurrent. Concurrency will be improved even further after the bottleneck MDEV-21923 has been removed.

            While the server has gotten faster at writing the log, backup has gotten slower, because it copies and parses ib_logfile0 in only one thread, and it now has to parse individual log records in order to find the mini-transaction boundaries and to be able to validate the CRC-32C for each mini-transaction. This creates a producer-consumer buffer overflow problem. The fix of MDEV-30000 could alleviate this a little, by forcing a checkpoint at the start of the backup, so that less log would have to be copied. Another possible help is to configure a larger innodb_log_file_size.
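
            A rough sketch for sizing that, assuming the LOG section of SHOW ENGINE INNODB STATUS reports a "Log sequence number" line on your build: sample the LSN twice to estimate how many bytes of redo log the server generates per minute, then pick an innodb_log_file_size comfortably above what a whole backup run would accumulate.

            # estimate redo log generation rate (bytes per minute)
            lsn() {
              mariadb -u root -Nse "SHOW ENGINE INNODB STATUS\G" |
                awk '/^Log sequence number/ { print $4 }'
            }
            a=$(lsn); sleep 60; b=$(lsn)
            echo "redo bytes per minute: $((b - a))"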

            A better fix would be to integrate the backup in the server in some way (MDEV-14992) or to make the server responsible for producing a log for backups (something like log archiving). If the server were writing the log for backup in sync with the recovery log, it would naturally slow down. This is a large change that will take time to implement, and it would only appear in a new major release of MariaDB Server, and possibly in the MariaDB Enterprise Server 11.4 release.

            The options in mariadb-backup are somewhat of a mess. The only part where innodb_log_file_buffering could make a difference is when reading the server’s ib_logfile0. innodb_log_file_buffering=OFF means that an attempt is made to open the log with O_DIRECT. Reading or writing the backed-up ib_logfile0 will not use O_DIRECT. The parameter was introduced in MDEV-30136 when innodb_flush_method was deprecated. I made some tests in May 2024 in MDEV-34062. The column "server innodb_log_file_mmap" in the tables there refers to a prototype that would allow the server to write the log via mmap(). In the final version, this parameter only has an effect during crash recovery or in backup, when the server’s log is being read. Those tests suggested that disabling O_DIRECT on the server for the log file, or enabling memory-mapped access for parsing the file, would enable the Linux kernel block cache. Of course, the results could vary between file systems and kernel versions. I tested it only on one system.
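
            A hedged sketch of the server-side experiment implied above (this assumes innodb_log_file_buffering is dynamic on your build; verify before relying on it). Setting it to ON corresponds to the "disabling O_DIRECT on the server for the log file" case from those tests, i.e. the kernel is allowed to cache ib_logfile0.

            # let the file system cache the server's ib_logfile0 (no O_DIRECT)
            mariadb -u root -e "SET GLOBAL innodb_log_file_buffering=ON;"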


            marko Marko Mäkelä added a comment -

            axel, can you please verify the claim that backup got more failure-prone between 10.11.8 and 10.11.10?

            marko Marko Mäkelä made changes -
            Assignee changed from Marko Mäkelä to Axel Schwenke
            axel Axel Schwenke made changes -
            Status changed from Open to In Progress
            supbaek baek seung ho added a comment -

            I would like to know how the mariabackup testing is going, and whether there is a way to keep backups working while staying on the current DB version.

            As an additional workaround, I would like to run backups on a replica; if I back up using the --safe-slave-backup and --slave-info options, will there be any issues backing up the current version?

            Also, I would like to know the release schedule for MariaDB 10.11.12.


            People

              Assignee: axel Axel Schwenke
              Reporter: supbaek baek seung ho
              Votes: 3
              Watchers: 11

