[MDEV-20588] Mariabackup incorrectly thinks a compressed table is corrupted - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.2.23
Fix Version/s: N/A
Component/s: Backup, mariabackup, Storage Engine - InnoDB
Labels:
- need_feedback
Environment:
Ubuntu 14.04, MariaDB Server 10.2.21, Mariabackup 10.2.23

Description

(copied in from a ServerFault question I asked a few days ago)

mariabackup is choking on a compressed table, preventing me from backing up my database. I'm invoking mariabackup like this:

mariabackup --backup --parallel=8 --compress --compress-threads=8 --tmpdir=/var/lib/mysql_backup/xbtmp --stream=xbstream

The initial output:

[00] 2019-09-11 17:29:23 Connecting to MySQL server host: localhost, user: root, password: set, port: 3306, socket: /var/run/mysqld/mysqld.sock

[00] 2019-09-11 17:29:23 Using server version 10.2.21-MariaDB-10.2.21+maria~trusty-log

mariabackup based on MariaDB server 10.2.23-MariaDB debian-linux-gnu (x86_64)

[00] 2019-09-11 17:29:23 uses posix_fadvise().

[00] 2019-09-11 17:29:23 cd to /var/lib/mysql/

[00] 2019-09-11 17:29:23 open files limit requested 65535, set to 65535

[00] 2019-09-11 17:29:23 mariabackup: using the following InnoDB configuration:

[00] 2019-09-11 17:29:23 innodb_data_home_dir =

[00] 2019-09-11 17:29:23 innodb_data_file_path = ibdata1:12M:autoextend

[00] 2019-09-11 17:29:23 innodb_log_group_home_dir = ./

[00] 2019-09-11 17:29:23 InnoDB: Using Linux native AIO

[00] 2019-09-11 17:29:23 using O_DIRECT

2019-09-11 17:29:23 140507208116096 [Note] InnoDB: Number of pools: 1

An hour or so later, it stumbles across a large compressed table and decides it's corrupted:

[06] 2019-09-11 18:43:24 Database page corruption detected at page 2155645, retrying...

[06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...

[00] 2019-09-11 18:43:25 >> log scanned up to (59569660988430)

[06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...

[06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...

[06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...

[06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...

[06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...

[06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...

[06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...

[00] 2019-09-11 18:43:26 >> log scanned up to (59569661525809)

[06] 2019-09-11 18:43:26 Error: failed to read page after 10 retries. File ./hidden_database_name/hidden_table_name.ibd seems to be corrupted.

2019-09-11 18:43:26 140506690463488 [Note] InnoDB: Page dump in ascii and hex (8192 bytes):

 len 8192; hex ...snip...

InnoDB: End of page dump

2019-09-11 18:43:26 140506690463488 [Note] InnoDB: Compressed page type (11); stored checksum in field1 348600413; calculated checksums for field1: crc32 348600413, innodb 1073927705, none 3735928559; page LSN 39856612572723; page number (if stored to page already) 2155645; space id (if stored to page already) 7077

InnoDB: Page may be a compressed BLOB page

[06] 2019-09-11 18:43:26 mariabackup: xtrabackup_copy_datafile() failed.

[00] FATAL ERROR: 2019-09-11 18:43:26 failed to copy datafile.

(I've snipped the contents of the page dump, as the table in question contains sensitive information. Please contact me directly if you need it.)
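
The checksum diagnostic line in the log above can be picked apart mechanically. The following is a minimal Python sketch, not part of mariabackup. The function name `parse_checksum_line` and the regular expressions are illustrative assumptions based only on the format of the single line quoted in this report. It extracts the stored checksum and the calculated values, then reports which algorithms agree with the stored value.

```python
import re

# Diagnostic line copied verbatim from the log quoted in this report.
SAMPLE = (
    "2019-09-11 18:43:26 140506690463488 [Note] InnoDB: Compressed page type (11); "
    "stored checksum in field1 348600413; calculated checksums for field1: "
    "crc32 348600413, innodb 1073927705, none 3735928559; "
    "page LSN 39856612572723; page number (if stored to page already) 2155645; "
    "space id (if stored to page already) 7077"
)


def parse_checksum_line(line):
    """Return the stored checksum, the calculated checksums, and the
    algorithm names whose calculated value equals the stored one.
    Returns None if the line does not look like the sample format."""
    stored = re.search(r"stored checksum in field1 (\d+)", line)
    calc = re.search(r"calculated checksums for field1: ([^;]+)", line)
    if not stored or not calc:
        return None
    stored_value = int(stored.group(1))
    calculated = {}
    # The calculated part looks like "crc32 N, innodb N, none N".
    for item in calc.group(1).split(","):
        name, value = item.split()
        calculated[name] = int(value)
    matching = [name for name, value in calculated.items() if value == stored_value]
    return {"stored": stored_value, "calculated": calculated, "matching": matching}
```

For the line quoted in this report, the stored value equals the calculated crc32 value, so `parse_checksum_line(SAMPLE)["matching"]` is `['crc32']`.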

This appears to be a consistent issue. It first happened to another table, which I later decompressed with pt-online-schema-change. That table was then verified to be consistent with the primary database using pt-table-checksum, which I believe rules out actual table corruption. Decompressing all of the tables is not a sustainable solution, because there are many other tables to migrate and the uncompressed data would consume far more storage.

I only recently switched from XtraBackup, for two reasons: it failed during backup with the error "Error: failed to execute query FLUSH NO_WRITE_TO_BINLOG TABLES: Query execution was interrupted (max_statement_time exceeded)" (see MDEV-18324), and the MariaDB documentation recommends Mariabackup. (XtraBackup was working previously, but I'm unable to determine why, because a hardware failure took down the replica that backups were being taken from, which prompted all of this.)

There is an open bug against XtraBackup that describes this situation, but the Mariabackup documentation suggests that the issue has been fixed in Mariabackup. My experience suggests otherwise, hence this bug.

I would really appreciate assistance here, as I'd like to be able to back up my database again.

Issue Links

relates to

MDEV-18644 Support FULL_CRC32 for compressed pages.

Closed

Activity

Jean Cardona added a comment - 2020-10-26 16:16

I have the exact same issue. Mariabackup is assuming that my compressed InnoDB tables are corrupted.
I recently switched from XtraBackup to Mariabackup because I upgraded to MariaDB 10.2.

Marko Mäkelä added a comment - 2021-02-12 14:18

Can we get a copy of such a corrupted page for analysis? For example, for this failure:

[06] 2019-09-11 18:43:25 Database page corruption detected at page 2155645, retrying...

[00] 2019-09-11 18:43:26 >> log scanned up to (59569661525809)

[06] 2019-09-11 18:43:26 Error: failed to read page after 10 retries. File ./hidden_database_name/hidden_table_name.ibd seems to be corrupted.

(assuming innodb_page_size=16384) we would want

dd bs=16384 if=hidden_database_name/hidden_table_name.ibd count=1 skip=2155645 of=corrupted_page_2155645.bin

which would hopefully be identical to what was read by backup.

Is this repeatable with 10.4.4 or later, if the file was created with innodb_checksum_algorithm=full_crc32?
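
The dd invocation above can also be written as a short script, which makes the page-size assumption explicit. This is a minimal Python sketch; `page_offset` and `extract_page` are hypothetical helper names, not part of any MariaDB tool. Note that the page dump in the original report states 8192 bytes, so for that table the physical page size may be 8192 rather than 16384. The value passed as `page_size` should match the size of the pages as stored in the .ibd file.

```python
def page_offset(page_no, page_size):
    """Byte offset of a page within a tablespace file, assuming all
    pages in the file have the same physical size."""
    return page_no * page_size


def extract_page(ibd_path, page_no, page_size, out_path):
    """Copy exactly one page from ibd_path to out_path and return its bytes.
    Equivalent in intent to: dd bs=page_size skip=page_no count=1."""
    with open(ibd_path, "rb") as src:
        src.seek(page_offset(page_no, page_size))
        data = src.read(page_size)
    if len(data) != page_size:
        # A short read means the page number or page size does not fit the file.
        raise ValueError(
            "short read: got %d bytes, expected %d" % (len(data), page_size)
        )
    with open(out_path, "wb") as dst:
        dst.write(data)
    return data
```

Calling `extract_page("hidden_database_name/hidden_table_name.ibd", 2155645, 16384, "corrupted_page_2155645.bin")` would mirror the dd command as written; substituting 8192 for the page size would read a different byte range of the same file.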

People

Assignee: Marko Mäkelä

Reporter: Nick Meharry

Votes: 0

Watchers: 4

Dates

Created: 2019-09-13 19:09

Updated: 2021-03-15 08:38

Resolved: 2021-03-15 08:38

MariaDB Server