[MDEV-11617] Occasional "failed to read page after 10 retries" with partial encrypted backup Created: 2016-12-20  Updated: 2020-05-04  Resolved: 2017-01-09

Status: Closed
Project: MariaDB Server
Component/s: Backup
Affects Version/s: 10.1.20
Fix Version/s: 10.1.22

Type: Bug Priority: Major
Reporter: Andrii Nikitin (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File 1.opt     File 1.test    
Sprint: 10.1.22

 Description   

Attached test case shows occasional database corruption errors (at least on Windows machine):

161220 23:29:51 [01] Copying .\testdatabase\t2.ibd to F:\m7\mysql-test\var\tmp\backup\testdatabase\t2.ibd
InnoDB: Read page 0 from tablespace for space 37 name testdatabase/t2 key_id 0 encryption 0 handle 568.
[01] xtrabackup: Database page corruption detected at page 1, retrying...
...
161220 23:29:52 >> log scanned up to (1770676)
[01] xtrabackup: Error: failed to read page after 10 retries. File .\testdatabase\t2.ibd seems to be corrupted.

Try running with --repeat=100
I couldn't reproduce the same without encryption enabled.

The same test on Ubuntu 16.04 occasionally results in crash below instead:

161220 23:32:42 [ERROR] mysqld got signal 11 ;
...
stack_bottom = 0x0 thread_stack 0x48400
/m7/extra/xtrabackup/xtrabackup(my_print_stacktrace+0x29)[0x55be6b13bef9]
/m7/extra/xtrabackup/xtrabackup(handle_fatal_signal+0x2f5)[0x55be6ad25d65]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x113e0)[0x7ff8b64ee3e0]
/lib/x86_64-linux-gnu/libc.so.6(re_compile_fastmap+0x17)[0x7ff8b47c3f37]
/lib/x86_64-linux-gnu/libc.so.6(regcomp+0xd2)[0x7ff8b47c46b2]
mysys/stacktrace.c:268(my_print_stacktrace)[0x55be6ab0f7ae]
xtrabackup/xtrabackup.cc:3418(xb_register_regex(char const*))[0x55be6ab0e816]
xtrabackup/xtrabackup.cc:3508(xb_filters_init())[0x55be6ab0f224]
xtrabackup/xtrabackup.cc:3860(xtrabackup_backup_func())[0x55be6ab137be]
xtrabackup/xtrabackup.cc:7187(main)[0x55be6aafabe3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff8b46fd830]
/m7/extra/xtrabackup/xtrabackup(_start+0x29)[0x55be6ab0e149]



 Comments   
Comment by Andrii Nikitin (Inactive) [ 2016-12-21 ]

couldn't make it fail on centos 7

Comment by Andrii Nikitin (Inactive) [ 2016-12-23 ]

With current source state Ubuntu doesn't crash anymore. just shows behavior similar to Windows and sometimes just ends without any error message:

161223 14:15:45 [01] Copying ./test1/ancor.ibd to /m7/mysql-test/var/tmp/backup/test1/ancor.ibd
[01] xtrabackup: Database page corruption detected at page 1, retrying...
[01] xtrabackup: Database page corruption detected at page 1, retrying...
[01] xtrabackup: Database page corruption detected at page 1, retrying...
[01] xtrabackup: Database page corruption detected at page 1, retrying...
[01] xtrabackup: Database page corruption detected at page 1, retrying...
[01] xtrabackup: Database page corruption detected at page 1, retrying...
[01] xtrabackup: Database page corruption detected at page 1, retrying...
[01] xtrabackup: Database page corruption detected at page 1, retrying...
[01] xtrabackup: Database page corruption detected at page 1, retrying...
161223 14:15:45 >> log scanned up to (3297427)
[01] xtrabackup: Error: failed to read page after 10 retries. File ./test1/ancor.ibd seems to be corrupted.
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.

this is how it just ends without any message (I've checked memory usage was just 130M and no trace of problem in system logs)

161223 14:21:29 [01] Copying ./database1/t1.ibd to /m7/mysql-test/var/tmp/backup/database1/t1.ibd
161223 14:21:29 [01] ...done
161223 14:21:29 [01] Copying ./database1/thisisatable.ibd to /m7/mysql-test/var/tmp/backup/database1/thisisatable.ibd
161223 14:21:29 [01] ...done
161223 14:21:29 [01] Copying ./database1/ancor.ibd to /m7/mysql-test/var/tmp/backup/database1/ancor.ibd
161223 14:21:29 [01] ...done
161223 14:21:29 [01] Copying ./database1/t2.ibd to /m7/mysql-test/var/tmp/backup/database1/t2.ibd
161223 14:21:29 [01] ...done
1612

  • saving '/m7/mysql-test/var/log/xtrabackup.1/' to '/m7/mysql-test/var/log/xtrabackup.1/'

Without encryption it passes 100 runs, while with encryption it fails usually after 10-30 runs, sometimes up to 79

Comment by Vladislav Vaintroub [ 2017-01-04 ]

Jan, could you take a look? i
Sometimes, if encrypted file is created right before xtrabackup, checksum validation would fail permanently. Some sort of race condition. Andrii is your man for reproducing that, apparently it is not as easy

Comment by Jan Lindström (Inactive) [ 2017-01-09 ]

commit a142f5af18d35b65488365e0c219295d900b7f41
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Mon Jan 9 14:59:25 2017 +0200

MDEV-11617: Occasional "failed to read page after 10 retries" with partial encrypted backup

fil_space_verify_crypt_checksum() should not use buf_page_is_corrupt because
it will ignore the the actual checksum comparison anyway. Modified
fil_space_verify_crypt_checksum so that it will use buf_page_is_valid_method()
calls directly using post encryption checksum.

At retry we should also read page 0 again if crypt_data is not yet found.

Generated at Thu Feb 08 07:51:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.