Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29372

mariabackup runs into a repeated loop InnoDB: Read redo log up to LSN when doing full backup

Details

    Description

      Since upgrading MariaDB to 10.4.26 our mariabackup processes don't finish, but simply run into a loop:
      [01] 2022-08-24 15:48:06 ...done
      [00] 2022-08-24 15:48:07 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:08 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:09 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:10 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:11 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:12 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:13 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:14 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:15 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:16 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:17 >> log scanned up to (377301702)
      [00] 2022-08-24 15:48:18 >> log scanned up to (377301702)
      2022-08-24 15:48:19 0 [Note] InnoDB: Read redo log up to LSN=377302016

      Here is the command I run:
      mariabackup --defaults-file=/etc/mysql/backup.cnf --extra-lsndir=/var/lib/mysql_backups/2022-08-24 --backup --compress --stream=xbstream --binlog-info=ON --parallel=2 --compress-threads=2 --galera-info

      Seems like Acquiring BACKUP LOCKS... step isn't happening in the new version with full backup.
      Not seeing this output in my logs with the new version like I do with previous version 10.4.25:
      2022-08-24 15:59:34 62 [Note] WSREP: Desyncing and pausing the provider
      2022-08-24 15:59:34 0 [Note] WSREP: Member 3.0 (mwpdbbackup-staging14) desyncs itself from group
      2022-08-24 15:59:34 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 68)
      2022-08-24 15:59:34 62 [Note] WSREP: pause
      2022-08-24 15:59:34 62 [Note] WSREP: Provider paused at aadbd2c9-f7a7-11ec-8153-d6a691037b36:68 (6)
      2022-08-24 15:59:34 62 [Note] WSREP: Provider paused at: 68
      2022-08-24 15:59:35 62 [Note] WSREP: Resuming and resyncing the provider
      2022-08-24 15:59:35 62 [Note] WSREP: resume
      2022-08-24 15:59:35 62 [Note] WSREP: resuming provider at 6
      2022-08-24 15:59:35 62 [Note] WSREP: Provider resumed.
      2022-08-24 15:59:35 0 [Note] WSREP: Member 3.0 (mwpdbbackup-staging14) resyncs itself to group.
      2022-08-24 15:59:35 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 68)
      2022-08-24 15:59:35 0 [Note] WSREP: Processing event queue:... -nan% (0/0 events) complete.
      2022-08-24 15:59:35 0 [Note] WSREP: Member 3.0 (mwpdbbackup-staging14) synced with group.
      2022-08-24 15:59:35 0 [Note] WSREP: Processing event queue:...100.0% (1/1 events) complete.
      2022-08-24 15:59:35 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 68)
      2022-08-24 15:59:35 12 [Note] WSREP: Server mwpdbbackup-staging14 synced with group

      I can do incremental backup on an existing full backup successfully, but I cannot create a new full backup
      When I run incremental, I use the following command:
      mariabackup --defaults-file=/etc/mysql/backup.cnf --extra-lsndir=/var/lib/mysql_backups/2022-08-24 --backup --compress --stream=xbstream --binlog-info=ON --parallel=2 --compress-threads=2 --galera-info --incremental-lsn=377305294
      and I see the host desync itself and successfully complete the backup.

      [00] 2022-08-24 15:59:34 >> log scanned up to (377305303)
      [00] 2022-08-24 15:59:34 Acquiring BACKUP LOCKS...
      [00] 2022-08-24 15:59:34 Starting to backup non-InnoDB tables and files

      When I specify a full backup to be taken, using this command, it loops through the LSN reads:
      mariabackup --defaults-file=/etc/mysql/backup.cnf --extra-lsndir=/var/lib/mysql_backups/2022-08-24 --backup --compress --stream=xbstream --binlog-info=ON --parallel=2 --compress-threads=2 --galera-info
      and I don't see it desyncing from the cluster
      It looks like it doesn't finish the step to move onto non-innodb tables and files.

      I've tried turning the innodb_instances to 1 (as described by this bug report in case it was related) but it doesn't resolve the issue.

      Attachments

        Issue Links

          Activity

            By design, mariabackup uses a dedicated thread that keeps copying or polling the InnoDB redo log for any new changes. Concurrently with this thread, any data files will be copied.

            The log-copying thread is not supposed to terminate until the very end. If there is no file copying activity going on, then the progress should actually be blocked by the server. At the end, mariabackup will issue a command like FLUSH TABLES WITH READ LOCK or BACKUP STAGE BLOCK_COMMIT. Depending on the operations that are concurrently executing on the server, it may take a long time for such a command to complete.

            Please check what is happening on the server side, by issuing a command like SHOW PROCESSLIST.

            marko Marko Mäkelä added a comment - By design, mariabackup uses a dedicated thread that keeps copying or polling the InnoDB redo log for any new changes. Concurrently with this thread, any data files will be copied. The log-copying thread is not supposed to terminate until the very end. If there is no file copying activity going on, then the progress should actually be blocked by the server. At the end, mariabackup will issue a command like FLUSH TABLES WITH READ LOCK or BACKUP STAGE BLOCK_COMMIT . Depending on the operations that are concurrently executing on the server, it may take a long time for such a command to complete. Please check what is happening on the server side, by issuing a command like SHOW PROCESSLIST .
            walkleyn Walter added a comment - - edited

            Hi Marko,
            Thanks for the response.
            Here is the processlist. Are you able to spot anything out of the ordinary? Not sure if the PMM agent could play a role here?

            MariaDB [(none)]> SHOW PROCESSLIST;
            +------+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
            | Id   | User        | Host      | db   | Command | Time | State                    | Info             | Progress |
            +------+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
            |    1 | system user |           | NULL | Sleep   | 2669 | wsrep aborter idle       | NULL             |    0.000 |
            |    2 | system user |           | NULL | Sleep   | 2669 | wsrep applier idle       | NULL             |    0.000 |
            |    4 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
            |    5 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
            |    6 | system user |           | NULL | Daemon  | NULL | InnoDB purge coordinator | NULL             |    0.000 |
            |    7 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
            |    8 | system user |           | NULL | Daemon  | NULL | InnoDB shutdown handler  | NULL             |    0.000 |
            |   12 | system user |           | NULL | Sleep   | 2666 | Closing tables           | NULL             |    0.000 |
            |   13 | system user |           | NULL | Sleep   | 2666 | Closing tables           | NULL             |    0.000 |
            |   15 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   14 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   16 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   17 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   22 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   18 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   21 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   23 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   20 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   24 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   19 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   27 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   25 | system user |           | NULL | Sleep   | 2666 | wsrep applier idle       | NULL             |    0.000 |
            |   33 | pmm         | localhost | NULL | Sleep   |    2 |                          | NULL             |    0.000 |
            |  407 | mariabackup | localhost | NULL | Sleep   | 1855 |                          | NULL             |    0.000 |
            |  410 | root        | localhost | NULL | Query   |    0 | Init                     | SHOW PROCESSLIST |    0.000 |
            | 1302 | pmm         | localhost | NULL | Sleep   |    2 |                          | NULL             |    0.000 |
            | 1303 | pmm         | localhost | NULL | Sleep   |    2 |                          | NULL             |    0.000 |
            | 1320 | pmm         | localhost | NULL | Sleep   |    2 |                          | NULL             |    0.000 |
            +------+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
            

            It has been running for almost an hour now, where it usually finishes in seconds.
            I can successfully use the 10.4.25 version of mariabackup binary against the 10.4.26 mariadb-server packages.

            walkleyn Walter added a comment - - edited Hi Marko, Thanks for the response. Here is the processlist. Are you able to spot anything out of the ordinary? Not sure if the PMM agent could play a role here? MariaDB [(none)]> SHOW PROCESSLIST; +------+-------------+-----------+------+---------+------+--------------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +------+-------------+-----------+------+---------+------+--------------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2669 | wsrep aborter idle | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2669 | wsrep applier idle | NULL | 0.000 | | 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 | | 5 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 | | 6 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 | | 7 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 | | 8 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 | | 12 | system user | | NULL | Sleep | 2666 | Closing tables | NULL | 0.000 | | 13 | system user | | NULL | Sleep | 2666 | Closing tables | NULL | 0.000 | | 15 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 14 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 16 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 17 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 22 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 18 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 21 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 23 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 20 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 24 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 19 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 27 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 25 | system user | | NULL | Sleep | 2666 | wsrep applier idle | NULL | 0.000 | | 33 | pmm | localhost | NULL | Sleep | 2 | | NULL | 0.000 | | 407 | mariabackup | localhost | NULL | Sleep | 1855 | | NULL | 0.000 | | 410 | root | localhost | NULL | Query | 0 | Init | SHOW PROCESSLIST | 0.000 | | 1302 | pmm | localhost | NULL | Sleep | 2 | | NULL | 0.000 | | 1303 | pmm | localhost | NULL | Sleep | 2 | | NULL | 0.000 | | 1320 | pmm | localhost | NULL | Sleep | 2 | | NULL | 0.000 | +------+-------------+-----------+------+---------+------+--------------------------+------------------+----------+ It has been running for almost an hour now, where it usually finishes in seconds. I can successfully use the 10.4.25 version of mariabackup binary against the 10.4.26 mariadb-server packages.

            Sorry, I failed to notice the --compress option. This looks like a duplicate of MDEV-29043.

            marko Marko Mäkelä added a comment - Sorry, I failed to notice the --compress option. This looks like a duplicate of MDEV-29043 .

            walkleyn, I'm going to close it as a duplicate of MDEV-29043. But if you'll find that the next release didn't fix the bug, please, do comment and we'll reopen it.

            serg Sergei Golubchik added a comment - walkleyn , I'm going to close it as a duplicate of MDEV-29043 . But if you'll find that the next release didn't fix the bug, please, do comment and we'll reopen it.

            People

              marko Marko Mäkelä
              walkleyn Walter
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.