Description
mleich provided two rr replay traces where mariadb-backup --backup would fail as follows:
10.6 ec7db2bdf849fc1a5bad906764920edda4121bd6
2024-04-22 12:33:25 0 [ERROR] InnoDB: Checksum mismatch in the first page of file .//undo001
2024-04-22 12:33:25 0 [ERROR] InnoDB: Unable to read first page of file .//undo001
[00] 2024-04-22 12:33:25 merror: xb_load_tablespaces() failed with error Data structure corruption.
The copy of the page that was read is evidently a mix of two versions, because the least significant 32 bits of the log sequence number differ between the start of the page (0x0129b319) and the end of the page (0x0101032c):
0x000055cee647bbe4 595 if (crc32 != ut_crc32(read_buf,
(rr) display/i $pc
1: x/i $pc
=> 0x55cee647bbe4 <_Z21buf_page_is_corruptedbPKhm+620>: cmp %eax,%r13d
(rr) i reg eax
eax 0x3dc71f83 1036459907
(rr) i reg r13d
r13d 0xece2f286 -320671098
(rr) p/x read_buf[16]@8
$1 = {0x0, 0x0, 0x0, 0x0, 0x1, 0x29, 0xb3, 0x19}
(rr) p/x read_buf[srv_page_size-8]@8
$2 = {0x1, 0x1, 0x3, 0x2c, 0xec, 0xe2, 0xf2, 0x86}
(rr) bt
#0 0x000055cee647bbe4 in buf_page_is_corrupted (check_lsn=check_lsn@entry=false, read_buf=read_buf@entry=0x5c352ffe0000 "", fsp_flags=fsp_flags@entry=23)
    at /data/Server/10.6B/storage/innobase/buf/buf0buf.cc:595
#1 0x000055cee672e0a4 in srv_undo_tablespace_open (create=create@entry=false, name=<optimized out>, name@entry=0x7fff9fd92ea0 ".//undo001", i=i@entry=0)
    at /data/Server/10.6B/storage/innobase/srv/srv0start.cc:537
#2 0x000055cee6730322 in srv_all_undo_tablespaces_open (create_new_db=create_new_db@entry=false, n_undo=16) at /data/Server/10.6B/storage/innobase/srv/srv0start.cc:654
#3 0x000055cee6730c7e in srv_undo_tablespaces_init (create_new_db=create_new_db@entry=false) at /data/Server/10.6B/storage/innobase/srv/srv0start.cc:739
#4 0x000055cee5cf6f7f in xb_load_tablespaces () at /data/Server/10.6B/extra/mariabackup/xtrabackup.cc:4081
#5 0x000055cee5d030a7 in xtrabackup_backup_func () at /data/Server/10.6B/extra/mariabackup/xtrabackup.cc:4861
#6 0x000055cee5d03df5 in main_low (argv=0x55cee86d7650) at /data/Server/10.6B/extra/mariabackup/xtrabackup.cc:7156
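The comparison made in the debugger session above can be sketched as a small standalone check. This is only an illustration, not InnoDB code: read_be32() and page_looks_torn() are hypothetical names, and the offsets assume the layout that the dump suggests (an 8-byte big-endian LSN at byte 16, the low 32 bits of the LSN repeated 8 bytes before the end of the page, and the checksum in the last 4 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a 32-bit big-endian integer from a page buffer.
static uint32_t read_be32(const unsigned char *p)
{
  return uint32_t{p[0]} << 24 | uint32_t{p[1]} << 16 |
         uint32_t{p[2]} << 8 | uint32_t{p[3]};
}

// Hypothetical helper: report whether the low 32 bits of the LSN in the
// page header disagree with the copy stored near the end of the page.
// Assumed layout: 8-byte LSN at offset 16 (low half at offset 20),
// low 32 bits of the LSN at page_size - 8, checksum at page_size - 4.
static bool page_looks_torn(const unsigned char *page, size_t page_size)
{
  const uint32_t header_lsn_low = read_be32(page + 16 + 4);
  const uint32_t trailer_lsn_low = read_be32(page + page_size - 8);
  return header_lsn_low != trailer_lsn_low;
}
```

Applied to the bytes printed above, the header yields 0x0129b319 and the trailer yields 0x0101032c, so the check reports a mismatch. Matching values would not by themselves prove that the page is consistent; the checksum comparison is still needed.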
We do have retry logic for most other page reads; see the calls to buf_page_is_corrupted in fil_cur.cc. For the TRX_SYS page in xb_assign_undo_space_start() there is special handling of 5 reread attempts.
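A bounded re-read loop of the kind described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the code in fil_cur.cc or xb_assign_undo_space_start(): read_page_with_retry() and both callbacks are hypothetical, and the caller chooses the attempt limit.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: re-read a page until it passes validation or the
// attempt limit is reached. read_page() refills the caller's buffer and
// returns false on an I/O error; is_corrupted() validates that buffer.
static bool read_page_with_retry(const std::function<bool()> &read_page,
                                 const std::function<bool()> &is_corrupted,
                                 unsigned max_attempts)
{
  for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
    if (!read_page())
      return false;  // a genuine I/O error; re-reading is unlikely to help
    if (!is_corrupted())
      return true;   // a consistent copy of the page was obtained
    // The server may have been writing the page while it was being read;
    // a real implementation might sleep briefly before the next attempt.
  }
  return false;      // still inconsistent after max_attempts reads
}
```

With a limit of 5, a page that fails validation twice and then passes is accepted on the third read, while a page that never validates is reported as corrupted after the fifth read.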
The current design requires re-reading pages that may have been written concurrently by the server that is being backed up. A better design would make the server itself responsible for taking backups (MDEV-14992). However, we need to fix this bug in GA releases, especially given that MDEV-29986 made multiple undo tablespaces the default.
Issue Links
- relates to MDEV-14992 BACKUP: in-server backup (Open)