[MDEV-29372] mariabackup runs into a repeated loop InnoDB: Read redo log up to LSN when doing full backup Created: 2022-08-24  Updated: 2022-08-27  Resolved: 2022-08-27

Status: Closed
Project: MariaDB Server
Component/s: Backup, Storage Engine - InnoDB
Affects Version/s: 10.4.26
Fix Version/s: 10.3.37, 10.4.27, 10.5.18, 10.6.10, 10.7.6, 10.8.5

Type: Bug Priority: Major
Reporter: Walter Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: mariabackup
Environment:

Debian Buster in a 5 node Galera cluster


Issue Links:
Duplicate
duplicates MDEV-29043 mariabackup --compress --parallel=4 h... Closed

 Description   

Since upgrading MariaDB to 10.4.26 our mariabackup processes don't finish, but simply run into a loop:
[01] 2022-08-24 15:48:06 ...done
[00] 2022-08-24 15:48:07 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:08 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:09 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:10 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:11 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:12 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:13 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:14 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:15 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:16 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:17 >> log scanned up to (377301702)
[00] 2022-08-24 15:48:18 >> log scanned up to (377301702)
2022-08-24 15:48:19 0 [Note] InnoDB: Read redo log up to LSN=377302016

Here is the command I run:
mariabackup --defaults-file=/etc/mysql/backup.cnf --extra-lsndir=/var/lib/mysql_backups/2022-08-24 --backup --compress --stream=xbstream --binlog-info=ON --parallel=2 --compress-threads=2 --galera-info

Seems like Acquiring BACKUP LOCKS... step isn't happening in the new version with full backup.
Not seeing this output in my logs with the new version like I do with previous version 10.4.25:
2022-08-24 15:59:34 62 [Note] WSREP: Desyncing and pausing the provider
2022-08-24 15:59:34 0 [Note] WSREP: Member 3.0 (mwpdbbackup-staging14) desyncs itself from group
2022-08-24 15:59:34 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 68)
2022-08-24 15:59:34 62 [Note] WSREP: pause
2022-08-24 15:59:34 62 [Note] WSREP: Provider paused at aadbd2c9-f7a7-11ec-8153-d6a691037b36:68 (6)
2022-08-24 15:59:34 62 [Note] WSREP: Provider paused at: 68
2022-08-24 15:59:35 62 [Note] WSREP: Resuming and resyncing the provider
2022-08-24 15:59:35 62 [Note] WSREP: resume
2022-08-24 15:59:35 62 [Note] WSREP: resuming provider at 6
2022-08-24 15:59:35 62 [Note] WSREP: Provider resumed.
2022-08-24 15:59:35 0 [Note] WSREP: Member 3.0 (mwpdbbackup-staging14) resyncs itself to group.
2022-08-24 15:59:35 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 68)
2022-08-24 15:59:35 0 [Note] WSREP: Processing event queue:... -nan% (0/0 events) complete.
2022-08-24 15:59:35 0 [Note] WSREP: Member 3.0 (mwpdbbackup-staging14) synced with group.
2022-08-24 15:59:35 0 [Note] WSREP: Processing event queue:...100.0% (1/1 events) complete.
2022-08-24 15:59:35 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 68)
2022-08-24 15:59:35 12 [Note] WSREP: Server mwpdbbackup-staging14 synced with group

I can do incremental backup on an existing full backup successfully, but I cannot create a new full backup
When I run incremental, I use the following command:
mariabackup --defaults-file=/etc/mysql/backup.cnf --extra-lsndir=/var/lib/mysql_backups/2022-08-24 --backup --compress --stream=xbstream --binlog-info=ON --parallel=2 --compress-threads=2 --galera-info --incremental-lsn=377305294
and I see the host desync itself and successfully complete the backup.

[00] 2022-08-24 15:59:34 >> log scanned up to (377305303)
[00] 2022-08-24 15:59:34 Acquiring BACKUP LOCKS...
[00] 2022-08-24 15:59:34 Starting to backup non-InnoDB tables and files

When I specify a full backup to be taken, using this command, it loops through the LSN reads:
mariabackup --defaults-file=/etc/mysql/backup.cnf --extra-lsndir=/var/lib/mysql_backups/2022-08-24 --backup --compress --stream=xbstream --binlog-info=ON --parallel=2 --compress-threads=2 --galera-info
and I don't see it desyncing from the cluster
It looks like it doesn't finish the step to move onto non-innodb tables and files.

I've tried turning the innodb_instances to 1 (as described by this bug report in case it was related) but it doesn't resolve the issue.



 Comments   
Comment by Marko Mäkelä [ 2022-08-25 ]

By design, mariabackup uses a dedicated thread that keeps copying or polling the InnoDB redo log for any new changes. Concurrently with this thread, any data files will be copied.

The log-copying thread is not supposed to terminate until the very end. If there is no file copying activity going on, then the progress should actually be blocked by the server. At the end, mariabackup will issue a command like FLUSH TABLES WITH READ LOCK or BACKUP STAGE BLOCK_COMMIT. Depending on the operations that are concurrently executing on the server, it may take a long time for such a command to complete.

Please check what is happening on the server side, by issuing a command like SHOW PROCESSLIST.

Comment by Walter [ 2022-08-25 ]

Hi Marko,
Thanks for the response.
Here is the processlist. Are you able to spot anything out of the ordinary? Not sure if the PMM agent could play a role here?

MariaDB [(none)]> SHOW PROCESSLIST;
+------+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
| Id   | User        | Host      | db   | Command | Time | State                    | Info             | Progress |
+------+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
|    1 | system user |           | NULL | Sleep   | 2669 | wsrep aborter idle       | NULL             |    0.000 |
|    2 | system user |           | NULL | Sleep   | 2669 | wsrep applier idle       | NULL             |    0.000 |
|    4 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|    5 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|    6 | system user |           | NULL | Daemon  | NULL | InnoDB purge coordinator | NULL             |    0.000 |
|    7 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|    8 | system user |           | NULL | Daemon  | NULL | InnoDB shutdown handler  | NULL             |    0.000 |
|   12 | system user |           | NULL | Sleep   | 2666 | Closing tables           | NULL             |    0.000 |
|   13 | system user |           | NULL | Sleep   | 2666 | Closing tables           | NULL             |    0.000 |
|   15 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   14 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   16 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   17 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   22 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   18 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   21 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   23 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   20 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   24 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   19 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   27 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   25 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
|   33 | pmm         | localhost | NULL | Sleep   |    2 |                          | NULL             |    0.000 |
|  407 | mariabackup | localhost | NULL | Sleep   | 1855 |                          | NULL             |    0.000 |
|  410 | root        | localhost | NULL | Query   |    0 | Init                     | SHOW PROCESSLIST |    0.000 |
| 1302 | pmm         | localhost | NULL | Sleep   |    2 |                          | NULL             |    0.000 |
| 1303 | pmm         | localhost | NULL | Sleep   |    2 |                          | NULL             |    0.000 |
| 1320 | pmm         | localhost | NULL | Sleep   |    2 |                          | NULL             |    0.000 |
+------+-------------+-----------+------+---------+------+--------------------------+------------------+----------+

It has been running for almost an hour now, where it usually finishes in seconds.
I can successfully use the 10.4.25 version of mariabackup binary against the 10.4.26 mariadb-server packages.

Comment by Marko Mäkelä [ 2022-08-26 ]

Sorry, I failed to notice the --compress option. This looks like a duplicate of MDEV-29043.

Comment by Sergei Golubchik [ 2022-08-27 ]

walkleyn, I'm going to close it as a duplicate of MDEV-29043. But if you'll find that the next release didn't fix the bug, please, do comment and we'll reopen it.

Generated at Thu Feb 08 10:08:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.