MariaDB Server / MDEV-20588

Mariabackup incorrectly thinks a compressed table is corrupted

Details

    Description

      (copied in from a ServerFault question I asked a few days ago)

      mariabackup is choking on a compressed table, preventing me from backing up my database. I'm invoking mariabackup like this:

      mariabackup --backup --parallel=8 --compress --compress-threads=8 --tmpdir=/var/lib/mysql_backup/xbtmp --stream=xbstream
      

      The initial output:

      [00] 2019-09-11 17:29:23 Connecting to MySQL server host: localhost, user: root, password: set, port: 3306, socket: /var/run/mysqld/mysqld.sock
      [00] 2019-09-11 17:29:23 Using server version 10.2.21-MariaDB-10.2.21+maria~trusty-log
      mariabackup based on MariaDB server 10.2.23-MariaDB debian-linux-gnu (x86_64)
      [00] 2019-09-11 17:29:23 uses posix_fadvise().
      [00] 2019-09-11 17:29:23 cd to /var/lib/mysql/
      [00] 2019-09-11 17:29:23 open files limit requested 65535, set to 65535
      [00] 2019-09-11 17:29:23 mariabackup: using the following InnoDB configuration:
      [00] 2019-09-11 17:29:23 innodb_data_home_dir =
      [00] 2019-09-11 17:29:23 innodb_data_file_path = ibdata1:12M:autoextend
      [00] 2019-09-11 17:29:23 innodb_log_group_home_dir = ./
      [00] 2019-09-11 17:29:23 InnoDB: Using Linux native AIO
      [00] 2019-09-11 17:29:23 using O_DIRECT
      2019-09-11 17:29:23 140507208116096 [Note] InnoDB: Number of pools: 1
      

      An hour or so later, it stumbles across a large compressed table and decides it's corrupted:

      [06] 2019-09-11 18:43:24 Database page corruption detected at page 2155645, retrying...
      [06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...
      [00] 2019-09-11 18:43:25 >> log scanned up to (59569660988430)
      [06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...
      [06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...
      [06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...
      [06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...
      [06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...
      [06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...
      [06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...
      [00] 2019-09-11 18:43:26 >> log scanned up to (59569661525809)
      [06] 2019-09-11 18:43:26 Error: failed to read page after 10 retries. File ./hidden_database_name/hidden_table_name.ibd seems to be corrupted.
      2019-09-11 18:43:26 140506690463488 [Note] InnoDB: Page dump in ascii and hex (8192 bytes):
       len 8192; hex ...snip...
      InnoDB: End of page dump
      2019-09-11 18:43:26 140506690463488 [Note] InnoDB: Compressed page type (11); stored checksum in field1 348600413; calculated checksums for field1: crc32 348600413, innodb 1073927705, none 3735928559; page LSN 39856612572723; page number (if stored to page already) 2155645; space id (if stored to page already) 7077
      InnoDB: Page may be a compressed BLOB page
      [06] 2019-09-11 18:43:26 mariabackup: xtrabackup_copy_datafile() failed.
      [00] FATAL ERROR: 2019-09-11 18:43:26 failed to copy datafile.
      

      (I've snipped the contents of the page dump, as the table in question contains sensitive information. Please contact me directly if you need it.)
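
      One detail in the diagnostic line above seems worth pointing out: the stored checksum in field1 equals the calculated crc32 value, which would be consistent with the page being intact rather than corrupted. Below is a minimal Python sketch for pulling that comparison out of the log line. It assumes the line format is exactly as printed above; the function name matching_algorithms is hypothetical and not part of any MariaDB tool.

```python
import re

# Diagnostic line copied verbatim from the mariabackup output above.
LOG_LINE = (
    "2019-09-11 18:43:26 140506690463488 [Note] InnoDB: Compressed page type (11); "
    "stored checksum in field1 348600413; calculated checksums for field1: "
    "crc32 348600413, innodb 1073927705, none 3735928559; "
    "page LSN 39856612572723; page number (if stored to page already) 2155645; "
    "space id (if stored to page already) 7077"
)


def matching_algorithms(line):
    """Return the names of the calculated checksums that equal the stored one.

    Hypothetical helper: it assumes the log line layout shown in this report.
    """
    stored = re.search(r"stored checksum in field1 (\d+)", line)
    calculated = re.search(r"calculated checksums for field1: ([^;]+)", line)
    if stored is None or calculated is None:
        raise ValueError("not a compressed-page checksum line")

    stored_value = int(stored.group(1))
    matches = []
    # Entries look like "crc32 348600413", separated by commas.
    for entry in calculated.group(1).split(","):
        name, value = entry.split()
        if int(value) == stored_value:
            matches.append(name)
    return matches


if __name__ == "__main__":
    print(matching_algorithms(LOG_LINE))  # prints ['crc32']
```

      An empty result would mean that none of the calculated checksums agrees with the stored value; here the crc32 value does agree.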

      This appears to be a consistent issue. It first happened with another table, which I then decompressed with pt-online-schema-change. The decompressed table was verified to be consistent with the primary database using pt-table-checksum, which I believe rules out actual table corruption. Decompressing all of the tables is not a sustainable solution, because there are many other tables to migrate and it would consume much more storage.

      I recently switched from XtraBackup to Mariabackup for two reasons. First, XtraBackup failed during backup with the error "Error: failed to execute query FLUSH NO_WRITE_TO_BINLOG TABLES: Query execution was interrupted (max_statement_time exceeded)" (see MDEV-18324). Second, the MariaDB documentation recommends Mariabackup. (XtraBackup was working previously, but I'm unable to determine why, because a hardware failure took down the replica that backups were being taken from, which prompted all of this.)

      There is an open bug against XtraBackup that describes this situation, but the Mariabackup documentation suggests that the issue has been fixed in Mariabackup. My experience suggests otherwise, hence this bug report.

      I would really appreciate assistance here, as I'd like to be able to back up my database again.

            People

              marko Marko Mäkelä
              nickmeharry Nick Meharry