Details
-
Bug
-
Status: Stalled (View Workflow)
-
Critical
-
Resolution: Unresolved
-
10.11.10
Description
Hi Team,
We upgraded from 10.11.8 to 10.11.10 two weeks ago and since then mariabackup keeps failing with the following logs.
[00] 2025-02-24 00:41:42 Waiting for log copy thread to read lsn 160568393706001 |
[00] 2025-02-24 00:41:43 Retrying read of log at LSN=160515356496404 |
[00] 2025-02-24 00:41:44 Retrying read of log at LSN=160515356496404 |
[00] 2025-02-24 00:41:45 Retrying read of log at LSN=160515356496404 |
[00] 2025-02-24 00:41:46 Retrying read of log at LSN=160515356496404 |
[00] 2025-02-24 00:41:47 Retrying read of log at LSN=160515356496404 |
[00] 2025-02-24 00:41:47 Was only able to copy log from 160487092534834 to 160515356496404, not 160568393706001; try increasing innodb_log_file_size |
mariabackup: Stopping log copying thread.
|
I judged that this was caused by a small innodb_log_file_size value, so I tested to change it to an appropriate value and got the following results.
- MariaDB engine ver : 10.11.10 / mariabackup engine ver : 10.11.10
innodb_log_file_size = 1G - failed
innodb_log_file_size = 4G - failed
innodb_log_file_size = 8G - failed
- MariaDB engine ver : 10.11.8 / mariabackup engine ver : 10.11.8
innodb_log_file_size = 1G - success
innodb_log_file_size = 4G - success
- MariaDB engine ver : 10.11.10 / mariabackup engine ver : 10.11.8
innodb_log_file_size = 1G - success
Is this a new bug different from MDEV-34062?
I would like to know if there is any impact on the acceptability of backups from engine 10.11.10 to mariabackup 10.11.8 in production environments.
Thanks and Regard.
Attachments
Issue Links
- duplicates
-
MDEV-35791 mariadb-backup 10.11.10 failed to create backup
-
- Closed
-
- relates to
-
MDEV-14992 BACKUP: in-server backup
-
- Open
-
-
MDEV-30000 make mariadb-backup to force an innodb checkpoint
-
- Closed
-
-
MDEV-34062 mariadb-backup --backup is extremely slow at copying ib_logfile0
-
- Closed
-
-
MDEV-34070 mariadb-backup --prepare with larger --use-memory may be slower
-
- Confirmed
-
-
MDEV-14992 BACKUP: in-server backup
-
- Open
-
-
MDEV-21105 Port clone plugin API (MYSQL_CLONE_PLUGIN) from MySQL
-
- Closed
-
-
MDEV-21923 LSN allocation is a bottleneck
-
- Closed
-
I had a closer look at the backup logs for
10.11.8:
[00] 2025-04-03 16:29:37 Connecting to MariaDB server host: localhost, user: root,
password: not set, port: not set, socket: /tmp/mysqld.sock.sysbench
[00] 2025-04-03 16:29:37 BACKUP STAGE START
[00] 2025-04-03 16:29:37 Acquiring BACKUP LOCKS...
(copying table spaces)
[00] 2025-04-03 16:30:33 BACKUP STAGE FLUSH
(copying more table spaces)
[00] 2025-04-03 16:30:33 Finished backing up non-InnoDB tables and files
[00] 2025-04-03 16:30:33 Waiting for log copy thread to read lsn 14309104946
-- what happens in those 11 min?
[00] 2025-04-03 16:41:28 Start copying aria log file tail: /home/galera/var/regression//aria_log.00000001
[00] 2025-04-03 16:41:28 Stop copying aria log file tail: /home/galera/var/regression//aria_log.00000001, copied 0 bytes
[00] 2025-04-03 16:41:28 BACKUP STAGE BLOCK_COMMIT
(copy log tables)
(copy stats tables)
[00] 2025-04-03 16:41:28 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
[00] 2025-04-03 16:41:28 mariabackup: The latest check point (for incremental): '23536505333'
mariabackup: Stopping log copying thread......... (many more dots)
-- what happens in those 15 min?
[00] 2025-04-03 16:56:11 BACKUP STAGE END
[00] 2025-04-03 16:56:11 Executing BACKUP STAGE END
There are two periods where the backup process is mostly idle. The HDD holding the copied redo log sees ~2MB/s writes. That is both lower than the redo log is written on the server (~16MB/s) and what the HDD can do (~170MB/s).
In comparison the log for 10.11.10:
[00] 2025-04-03 19:38:34 Connecting to MariaDB server host: localhost, user: root,
password: not set, port: not set, socket: /tmp/mysqld.sock.sysbench
...
[00] 2025-04-03 19:38:41 BACKUP STAGE START
[00] 2025-04-03 19:38:41 Acquiring BACKUP LOCKS...
(copying table spaces)
[00] 2025-04-03 19:39:38 BACKUP STAGE FLUSH
(copying more table spaces)
[00] 2025-04-03 19:39:38 Finished backing up non-InnoDB tables and files
[00] 2025-04-03 19:39:38 Waiting for log copy thread to read lsn 14353301653
[00] 2025-04-03 19:39:38 Start copying aria log file tail: /home/galera/var/regression//aria_log.00000001
[00] 2025-04-03 19:39:38 Stop copying aria log file tail: /home/galera/var/regression//aria_log.00000001, copied 0 bytes
[00] 2025-04-03 19:39:38 BACKUP STAGE BLOCK_COMMIT
(copy log tables)
[00] 2025-04-03 19:39:38 mariabackup: The latest check point (for incremental): '12786279177'
[00] 2025-04-03 19:39:38 Waiting for log copy thread to read lsn 18446744073709551615
(copy stats tables)
[00] 2025-04-03 19:39:39 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
[00] 2025-04-03 19:39:39 BACKUP STAGE END
[00] 2025-04-03 19:39:39 Executing BACKUP STAGE END
Also look at the sequence of events. 10.11.10 prints out the latest check point before copying the stats tables. 10.11.8 does that imediately after copying the log tables and reports the latest checkpoint later.