[MDEV-15270] mariabackup.data_directory, mariabackup.partial_exclude failed in buildbot with error on exec Created: 2018-02-11  Updated: 2022-07-26  Resolved: 2022-07-26

Status: Closed
Project: MariaDB Server
Component/s: Backup, Tests
Affects Version/s: 10.2
Fix Version/s: 10.2.13, 10.3.5, 10.4.0

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-29165 Merge new release of InnoDB 5.7.39 to... Closed

 Description   

10.2 cdb7a8fa6928f3fb103ed7f66486dc91 kvm-deb-xenial-x86 2018-01-10 23:54:03 4987 nm Normal run, no --ps-protocol

mariabackup.data_directory               w4 [ fail ]
        Test ended at 2018-01-10 17:37:01
 
CURRENT_TEST: mariabackup.data_directory
mysqltest: At line 9: exec of '/usr/bin/mariabackup --defaults-file=/dev/shm/var/4/my.cnf --backup --target-dir=/dev/shm/var/4/tmp/backup 2>&1' failed, error: 256, status: 1, errno: 11
Output from before failure:
180110 17:37:01 Connecting to MySQL server host: localhost, user: root, password: set, port: 16040, socket: /dev/shm/var/tmp/4/mysqld.1.sock
Using server version 10.2.13-MariaDB-10.2.13+maria~xenial-log
/usr/bin/mariabackup based on MariaDB server 10.2.13-MariaDB debian-linux-gnu (i686) 
mariabackup: uses posix_fadvise().
mariabackup: cd to /dev/shm/var/4/mysqld.1/data/
mariabackup: open files limit requested 1024, set to 1024
mariabackup: using the following InnoDB configuration:
mariabackup:   innodb_data_home_dir = .
mariabackup:   innodb_data_file_path = ibdata1:12M:autoextend
mariabackup:   innodb_log_group_home_dir = ./
2018-01-10 17:37:01 3068790720 [Note] InnoDB: Number of pools: 1
180110 17:37:01 >> log scanned up to (1656143)
2018-01-10 17:37:01 3068790720 [Note] InnoDB: Log sequence number at the start 1637250 and the end 1652116 do not match
2018-01-10 17:37:01 3068790720 [ERROR] InnoDB: Checksum mismatch in datafile: ./ibdata1, Space ID:0, Flags: 0. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2018-01-10 17:37:01 3068790720 [Note] InnoDB: Restoring page [page id: space=0, page number=0] of datafile './ibdata1' from the doublewrite buffer. Writing 16384 bytes into file './ibdata1'
2018-01-10 17:37:01 3068790720 [Warning] InnoDB: Retry attempts for writing partial data failed.
2018-01-10 17:37:01 3068790720 [ERROR] InnoDB: Write to file ./ibdata1failed at offset 0, 16384 bytes should have been written, only 0 were written. Operating system error number 9. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2018-01-10 17:37:01 3068790720 [ERROR] InnoDB: Error number 9 means 'Bad file descriptor'
2018-01-10 17:37:01 3068790720 [Note] InnoDB: Some operating system error numbers are described at https://mariadb.com/kb/en/library/operating-system-error-codes/
mariabackup: Could not open or create data files.
mariabackup: If you tried to add new data files, and it failed here,
mariabackup: you should now edit innodb_data_file_path in my.cnf back
mariabackup: to what it was, and remove the new ibdata files InnoDB created
mariabackup: in this failed attempt. InnoDB only wrote those files full of
mariabackup: zeros, but did not yet use them in any way. But be careful: do not
mariabackup: remove old data files which contain your precious data!
mariabackup: error: xb_load_tablespaces() failed witherror code 39
mariabackup: Stopping log copying thread.
.
 
 
 
The result from queries just before the failure was:
CREATE TABLE t(a INT) ENGINE=InnoDB DATA DIRECTORY='table_data_dir';
INSERT INTO t VALUES(1);
# xtrabackup backup
 
 - saving '/dev/shm/var/4/log/mariabackup.data_directory/' to '/dev/shm/var/log/mariabackup.data_directory/'
 
Retrying test mariabackup.data_directory, attempt(2/3)...
 
worker[4] > Restart  - not started

Logs not available



 Comments   
Comment by Elena Stepanova [ 2018-02-11 ]

Looks similar, but with log (for now)
http://buildbot.askmonty.org/buildbot/builders/kvm-deb-artful-amd64/builds/420/steps/mtr/logs/stdio

mariabackup.partial_exclude              w1 [ fail ]
        Test ended at 2018-01-12 23:59:24
 
CURRENT_TEST: mariabackup.partial_exclude
mysqltest: At line 19: exec of '/usr/bin/mariabackup --defaults-file=/dev/shm/var/1/my.cnf  --backup "--tables-exclude=test.*2" "--databases-exclude=db2" --target-dir=/dev/shm/var/1/tmp/backup 2>&1' failed, error: 256, status: 1, errno: 95
Output from before failure:
180112 23:59:23 Connecting to MySQL server host: localhost, user: root, password: set, port: 16000, socket: /dev/shm/var/tmp/1/mysqld.1.sock
Using server version 10.2.13-MariaDB-10.2.13+maria~artful-log
/usr/bin/mariabackup based on MariaDB server 10.2.13-MariaDB debian-linux-gnu (x86_64) 
mariabackup: uses posix_fadvise().
mariabackup: cd to /dev/shm/var/1/mysqld.1/data/
mariabackup: open files limit requested 1024, set to 1024
mariabackup: using the following InnoDB configuration:
mariabackup:   innodb_data_home_dir = .
mariabackup:   innodb_data_file_path = ibdata1:12M:autoextend
mariabackup:   innodb_log_group_home_dir = ./
2018-01-12 23:59:23 140110908889280 [Note] InnoDB: Number of pools: 1
180112 23:59:23 >> log scanned up to (1685180)
2018-01-12 23:59:23 140110908889280 [Note] InnoDB: Log sequence number at the start 1668276 and the end 1682680 do not match
2018-01-12 23:59:23 140110908889280 [ERROR] InnoDB: Checksum mismatch in datafile: ./ibdata1, Space ID:0, Flags: 0. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2018-01-12 23:59:23 140110908889280 [Note] InnoDB: Restoring page [page id: space=0, page number=0] of datafile './ibdata1' from the doublewrite buffer. Writing 16384 bytes into file './ibdata1'
2018-01-12 23:59:23 140110908889280 [Warning] InnoDB: Retry attempts for writing partial data failed.
2018-01-12 23:59:23 140110908889280 [ERROR] InnoDB: Write to file ./ibdata1failed at offset 0, 16384 bytes should have been written, only 0 were written. Operating system error number 9. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2018-01-12 23:59:23 140110908889280 [ERROR] InnoDB: Error number 9 means 'Bad file descriptor'
2018-01-12 23:59:23 140110908889280 [Note] InnoDB: Some operating system error numbers are described at https://mariadb.com/kb/en/library/operating-system-error-codes/
mariabackup: Could not open or create data files.
mariabackup: If you tried to add new data files, and it failed here,
mariabackup: you should now edit innodb_data_file_path in my.cnf back
mariabackup: to what it was, and remove the new ibdata files InnoDB created
mariabackup: in this failed attempt. InnoDB only wrote those files full of
mariabackup: zeros, but did not yet use them in any way. But be careful: do not
mariabackup: remove old data files which contain your precious data!
mariabackup: error: xb_load_tablespaces() failed witherror code 39
mariabackup: Stopping log copying thread.
.
 
 
 
The result from queries just before the failure was:
CREATE TABLE t1(i INT) ENGINE INNODB;
INSERT INTO t1 VALUES(1);
CREATE TABLE t2(i int) ENGINE INNODB;
CREATE DATABASE db2;
USE db2;
CREATE TABLE t1(i INT) ENGINE INNODB;
USE test;
# xtrabackup backup
 
 - saving '/dev/shm/var/1/log/mariabackup.partial_exclude/' to '/dev/shm/var/log/mariabackup.partial_exclude/'
 
Retrying test mariabackup.partial_exclude, attempt(2/3)...
 
worker[1] > Restart  - not started
main.partition_binlog                    w2 [ pass ]    106
worker[2] > Restart [mysqld.1 - pid: 32442, winpid: 32442] - running with different options '--loose-plugin-load-add=ha_mroonga.so --loose-plugin-mroonga=ON --log-bin=master-bin' != '--log-bin=master-bin --loose-enable-partition'
mroonga/storage.binlog_TODO_SPLIT_ME     w2 [ pass ]     71
worker[2] > Restart [mysqld.1 - pid: 32480, winpid: 32480] - running with different options '--log-bin=master-bin --tmp_table_size=1024' != '--loose-plugin-load-add=ha_mroonga.so --loose-plugin-mroonga=ON --log-bin=master-bin'
mariabackup.partial_exclude              w1 [ retry-pass ]   1102
 
Retrying test mariabackup.partial_exclude, attempt(3/3)...
 
mariabackup.partial_exclude              w1 [ retry-pass ]   1078

Comment by Vladislav Vaintroub [ 2018-02-11 ]

marko, can you check where this message comes from? I recall doublewrite was disabled in mariabackup in 10.1. There should not be "Restoring from doublewrite buffer", neither backup not prepare should ever use it.

Comment by Vladislav Vaintroub [ 2018-02-11 ]

one more
https://buildbot.askmonty.org/buildbot/builders/kvm-bintar-quantal-x86/builds/10684/steps/test/logs/stdio

CURRENT_TEST: mariabackup.mdev-14447
180209 01:15:52 Connecting to MySQL server host: localhost, user: root, password: set, port: 16040, socket: /usr/local/mariadb-10.2.13-linux-i686/mysql-test/var/tmp/3/mysqld.1.sock
Using server version 10.2.13-MariaDB-log
/usr/local/mariadb-10.2.13-linux-i686/bin/mariabackup based on MariaDB server 10.2.13-MariaDB Linux (i686) 
incremental backup from 1633659 is enabled.
mariabackup: uses posix_fadvise().
mariabackup: cd to /usr/local/mariadb-10.2.13-linux-i686/mysql-test/var/3/mysqld.1/data/
mariabackup: open files limit requested 1024, set to 1024
mariabackup: using the following InnoDB configuration:
mariabackup:   innodb_data_home_dir = .
mariabackup:   innodb_data_file_path = ibdata_first:3M;ibdata_second:1M:autoextend
mariabackup:   innodb_log_group_home_dir = ./
2018-02-09  1:15:53 3056838400 [Note] InnoDB: Number of pools: 1
180209 01:15:53 >> log scanned up to (42087927)
2018-02-09  1:15:53 3056838400 [Note] InnoDB: Log sequence number at the start 37713758 and the end 42087927 do not match
2018-02-09  1:15:53 3056838400 [ERROR] InnoDB: Checksum mismatch in datafile: ./ibdata_first, Space ID:0, Flags: 0. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2018-02-09  1:15:53 3056838400 [Note] InnoDB: Restoring page [page id: space=0, page number=0] of datafile './ibdata_first' from the doublewrite buffer. Writing 16384 bytes into file './ibdata_first'
2018-02-09  1:15:53 3056838400 [Warning] InnoDB: Retry attempts for writing partial data failed.
2018-02-09  1:15:53 3056838400 [ERROR] InnoDB: Write to file ./ibdata_firstfailed at offset 0, 16384 bytes should have been written, only 0 were written. Operating system error number 9. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2018-02-09  1:15:53 3056838400 [ERROR] InnoDB: Error number 9 means 'Bad file descriptor'
2018-02-09  1:15:53 3056838400 [Note] InnoDB: Some operating system error numbers are described at https://mariadb.com/kb/en/library/operating-system-error-codes/
mariabackup: Could not open or create data files.
mariabackup: If you tried to add new data files, and it failed here,
mariabackup: you should now edit innodb_data_file_path in my.cnf back
mariabackup: to what it was, and remove the new ibdata files InnoDB created
mariabackup: in this failed attempt. InnoDB only wrote those files full of
mariabackup: zeros, but did not yet use them in any way. But be careful: do not
mariabackup: remove old data files which contain your precious data!
mariabackup: error: xb_load_tablespaces() failed witherror code 39
mariabackup: Stopping log copying thread.

Comment by Marko Mäkelä [ 2018-02-12 ]

I pushed a simple clean-up patch to prevent any potential disaster (writing to a source file in --backup). It probably does not prevent any disaster, because we would have opened the source files in read-only mode, and the writes were blocked by that.

thiru, please try to reproduce this problem. It probably is most feasible with DBUG_EXECUTE_IF("backup_read_page_fail", …) and something like

mariabackup --backup --dbug=d,xxx

Implement appropriate retry logic when a data file page appears corrupted. Test this with the system tablespace, the undo tablespace files, and with .ibd files, both with the first page and with some subsequent page (say, page 1).

At least in the function xb_load_single_table_tablespace() we should try rereading, and we should not call exit(EXIT_FAILURE) but instead return any errors to the caller. Also, we could try to tolerate missing files, because the user could be executing DROP TABLE or RENAME TABLE in parallel with the backup.

Once you have worked on the 10.2 fix, you should be able to tell if this fix is needed in Mariabackup 10.1.

Comment by Marko Mäkelä [ 2018-02-14 ]

I pushed a cleanup that removes Mariabackup use of the doublewrite buffer to MariaDB 10.2.13. With that fix merged, a test failed in bb-10.2-ext:

CURRENT_TEST: mariabackup.apply-log-only-incr
mysqltest: At line 21: exec of '/usr/bin/mariabackup --defaults-file=/dev/shm/var/2/my.cnf  --backup --target-dir=/dev/shm/var/2/tmp/backup 2>&1' failed, error: 256, status: 1, errno: 11
Output from before failure:
180213 19:29:11 Connecting to MySQL server host: localhost, user: root, password: set, port: 16040, socket: /dev/shm/var/tmp/2/mysqld.1.sock
Using server version 10.2.13-MariaDB-10.2.13+maria~xenial-log
/usr/bin/mariabackup based on MariaDB server 10.2.13-MariaDB debian-linux-gnu (i686) 
mariabackup: uses posix_fadvise().
mariabackup: cd to /dev/shm/var/2/mysqld.1/data/
mariabackup: open files limit requested 1024, set to 1024
mariabackup: using the following InnoDB configuration:
mariabackup:   innodb_data_home_dir = .
mariabackup:   innodb_data_file_path = ibdata1:12M:autoextend
mariabackup:   innodb_log_group_home_dir = ./
2018-02-13 19:29:11 3068540864 [Note] InnoDB: Number of pools: 1
180213 19:29:11 >> log scanned up to (1647154)
2018-02-13 19:29:11 3068540864 [Note] InnoDB: Log sequence number at the start 1638095 and the end 1642519 do not match
2018-02-13 19:29:11 3068540864 [ERROR] InnoDB: Checksum mismatch in datafile: ./ibdata1, Space ID:0, Flags: 0. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html for how to resolve the issue.

Just like I anticipated in my previous comment, Mariabackup fails to reread the page.
Apparently the last two writes of the first page of ibdata1 took place at LSN 1638095 and 1642519, and the Mariabackup read operation of the page was interleaved with the page write operation by the server, causing the FIL_PAGE_LSN in the page header to differ from the LSN stored in the page trailer.

Comment by Marko Mäkelä [ 2022-07-26 ]

According to the cross-reference, the last regular failures of the mentioned tests in main branches occurred in 2018. The bogus use of the doublewrite buffer was removed in MariaDB 10.2.13.

The probability of intermittent failures should have been drastically reduced by MDEV-12699 (in 2019) and some related changes.

Generated at Thu Feb 08 08:19:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.