
InnoDB: Database page corruption on disk or a failed file read of tablespace upon prepare of mariabackup incremental backup

Details

    Description

      Note: Can't be a duplicate of MDEV-19348 anymore, as it was closed in the previous release set.

      10.2 5e65c67c

      ...
      2019-10-05  3:25:49 140645837592384 [Note] InnoDB: Ignoring data file 'test/t1.ibd' with space ID 275, since the redo log references test/t1.ibd with space ID 274.
      2019-10-05  3:25:49 140645837592384 [Note] InnoDB: Ignoring data file './test/#sql-2a07_12.ibd' with space ID 275. Another data file called test/t1.ibd exists with the same space ID.
      2019-10-05  3:25:49 140645837592384 [Note] InnoDB: Ignoring data file './test/#sql-2a07_12.ibd' with space ID 275. Another data file called test/t1.ibd exists with the same space ID.
      2019-10-05  3:25:49 140645837592384 [Warning] InnoDB: Are you sure you are using the right ib_logfiles to start up the database? Log sequence number in the ib_logfiles is 2570499, less than the log sequence number in the first system tablespace file header, 2570528.
      2019-10-05  3:25:49 140645837592384 [Note] InnoDB: Starting final batch to recover 963 pages from redo log.
      2019-10-05  3:25:49 140645669078784 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace test/t1 page [page id: space=275, page number=3]. You may have to recover from a backup.
      2019-10-05  3:25:49 140645669078784 [Note] InnoDB: Uncompressed page, stored checksum in field1 3500439468, calculated checksums for field1: crc32 228036971, innodb 3302731224,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 3500439468, calculated checksums for field2: crc32 228036971, innodb 3703238392, none 3735928559,  page LSN 0 3445649, low 4 bytes of LSN at page end 3445649, page number (if stored to page already) 3, space id (if created with >= MySQL-4.1.1 and stored already) 275
      2019-10-05  3:25:49 140645669078784 [Note] InnoDB: Page may be an index page where index id is 8202515128101958035
      2019-10-05  3:25:49 140645669078784 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      2019-10-05  3:25:49 140645669078784 [ERROR] InnoDB: Failed to read file 'test/t1.ibd' at offset 3: Page read from tablespace is corrupted.
      2019-10-05  3:25:50 140645837592384 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
      2019-10-05  3:25:50 140645837592384 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2250] with error Data structure corruption
      [00] FATAL ERROR: 2019-10-05 03:25:50 mariabackup: innodb_init() returned 39 (Data structure corruption).
      

The test case is non-deterministic; run it with --repeat=N and the other options listed below. It usually fails for me within 5 attempts, but this can vary across machines and builds. The probability seems higher with a larger number of tables, columns and data. Running in shm also seems to help.

      MTR command line

      perl ./mtr bug2.innodb_page_corrupt2 --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4  --mysqld=--file-key-management --mysqld=--file-key-management-filename=`pwd`/std_data/keys.txt --mysqld=--plugin-load-add=file_key_management --mem --repeat=5
      

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (a BINARY(8), b CHAR(8), c INT) ENGINE=InnoDB;
      INSERT INTO t1 VALUES ('a','foobar',8);
       
      --exec $XTRABACKUP --backup --target-dir=$MYSQLTEST_VARDIR/tmp/backup_0 --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQLTEST_VARDIR/log/backup0.log 2>&1
       
      --connect (con1,localhost,root,,)
      --let $conid= `SELECT CONNECTION_ID() AS con14`
      --delimiter |
      --send
        BEGIN NOT ATOMIC
          LOOP OPTIMIZE TABLE t1;
          END LOOP;
        END |
      --delimiter ;
       
      --connection default
       
      --exec $XTRABACKUP --backup --target-dir=$MYSQLTEST_VARDIR/tmp/backup_1 --incremental-basedir=$MYSQLTEST_VARDIR/tmp/backup_0 --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQLTEST_VARDIR/log/backup1.log 2>&1
       
      --eval KILL $conid
       
      --exec $XTRABACKUP --prepare --apply-log-only --target-dir=$MYSQLTEST_VARDIR/tmp/backup_0 > $MYSQLTEST_VARDIR/log/prepare0.log 2>&1
      --error 0,1
      --exec $XTRABACKUP --prepare --apply-log-only --target-dir=$MYSQLTEST_VARDIR/tmp/backup_0 --incremental-dir=$MYSQLTEST_VARDIR/tmp/backup_1 > $MYSQLTEST_VARDIR/log/prepare1.log 2>&1
       
      if ($sys_errno)
      {
          --cat_file $MYSQLTEST_VARDIR/log/prepare1.log
          --die "Encountered error upon incremental backup"
      }
       
      # Cleanup
      DROP TABLE t1;
      --rmdir $MYSQLTEST_VARDIR/tmp/backup_0
      --rmdir $MYSQLTEST_VARDIR/tmp/backup_1
      

Fails on 10.2-10.4, on all of debug, ASAN, and non-debug builds.
10.1 fails with something unrelated; I didn't pursue it.
Couldn't reproduce on 10.5.


          Activity

            vlad.lesin Vladislav Lesin added a comment - - edited

            Briefly.

The first page of the tablespace is rewritten while preparing the incremental backup: the tablespace is "created" instead of the existing tablespace being opened, because the tablespace id from the *.meta file does not match the tablespace id of the data file in the full backup directory.

In detail.

I have reproduced the bug locally with the test provided by Elena and done the following analysis.

1) The content of the "corrupted" page read by mariabackup during the incremental --backup and --prepare is the same. But during --backup the page passes validation, and during --prepare it does not.

            2) The "corrupted" page is encrypted. In mariabackup if the page is encrypted, buf_page_is_corrupted() is invoked during backup only if --extended_validation option is set. But even if we set the option, buf_page_is_corrupted() returns false on --backup, and true on --prepare. The difference is that during --backup the page is passed to buf_page_is_corrupted() after decryption, but during --prepare the page is not decrypted before it is passed to the function.

3) The page is not decrypted during --prepare because space->crypt_data is NULL, and it is NULL because fil_space_read_crypt_data() cannot find CRYPT_MAGIC in the first page. So it turned out that the first page of the tablespace does not contain crypt data after the second --prepare, although it does contain it after the first --prepare, which means the first page was rewritten during the incremental --prepare.
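The crypt-data lookup in (3) boils down to scanning page 0 for a magic marker. A minimal sketch, assuming a made-up marker value and layout (the real CRYPT_MAGIC bytes and their fixed offset in page 0 differ):

```python
# Hypothetical marker; the real CRYPT_MAGIC value and page-0 offset differ.
CRYPT_MAGIC = b"CRYPT"


def read_crypt_data(page0: bytes):
    """Return the crypt-data blob if the magic marker is found on page 0,
    else None, mirroring how fil_space_read_crypt_data() leaves
    space->crypt_data NULL when the marker is absent."""
    idx = page0.find(CRYPT_MAGIC)
    if idx < 0:
        return None
    start = idx + len(CRYPT_MAGIC)
    return page0[start:start + 8]


original_page0 = b"header" + CRYPT_MAGIC + b"key-info" + b"\0" * 16
rewritten_page0 = b"\0" * len(original_page0)  # page 0 freshly (re)created

assert read_crypt_data(original_page0) is not None
assert read_crypt_data(rewritten_page0) is None  # so decryption is skipped
```

Once page 0 has been rewritten without the marker, every later consumer of the tablespace sees it as unencrypted, which connects (3) back to the failed validation in (2).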

4) When innodb_log_optimize_ddl is enabled, the backup_optimized_ddl_op() callback is invoked when MLOG_INDEX_LOAD is read during the incremental --backup. The tablespace is pushed into a special container, then treated as a newly created tablespace in backup_fix_ddl() and re-copied as a *.new file. But as this is an incremental backup, all copied data files are passed through the wf_incremental filter, which creates a *.delta file and copies only the pages whose LSN is greater than incremental_lsn. So after backup_fix_ddl() has executed, there is a *.new.delta file containing a single page, the "corrupted" page from (2); the file does not contain the first page of the tablespace.
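The wf_incremental filtering in (4) can be sketched as follows. This is a toy model with invented page numbers and LSNs, only meant to show why page 0 never makes it into the *.new.delta file:

```python
def incremental_filter(pages, incremental_lsn):
    """Sketch of the wf_incremental write filter: keep only the pages whose
    LSN exceeds the LSN recorded at the time of the base (full) backup."""
    return {no: (lsn, data) for no, (lsn, data) in pages.items()
            if lsn > incremental_lsn}


# Toy tablespace right after OPTIMIZE TABLE: page 0 was never flushed
# (LSN still 0), page 3 carries the freshly written (encrypted) index data.
pages = {
    0: (0, b"page0-with-crypt-data-marker"),
    3: (2_570_000, b"encrypted-index-page"),
}

delta = incremental_filter(pages, incremental_lsn=1_000_000)
assert list(delta) == [3]  # page 0 is absent from the resulting *.new.delta
```

Because the re-created tablespace's page 0 has an LSN not above incremental_lsn, the delta ends up holding only the data page, exactly the single-page *.new.delta described above.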

5) When the incremental --prepare is executed (the second --prepare in Elena's test), the *.new.delta file is renamed to a *.ibd.delta file and applied as a regular *.delta file. It turned out that the first page of the tablespace is rewritten when the delta is applied, with the following stack trace:

             #0  pfs_os_file_write_func (type=..., name=0x7ffd5d596cb0 "./test/t1.ibd", file=..., buf=0x557322804000, offset=0, n=16384, 
                 src_file=0x55731f873190 "./extra/mariabackup/xtrabackup.cc", src_line=4705)
                 at ./storage/innobase/include/os0file.ic:413
             #1  0x000055731ed20c98 in xb_space_create_file (path=0x7ffd5d596cb0 "./test/t1.ibd", space_id=255, flags=0, file=0x7ffd5d596380)
                 at ./extra/mariabackup/xtrabackup.cc:4704
             #2  0x000055731ed217bf in xb_delta_open_matching_space (dbname=0x7ffd5d5982d0 "test", name=0x7ffd5d5970b0 "t1.ibd", info=..., 
                 real_name=0x7ffd5d596cb0 "./test/t1.ibd", real_name_len=512, success=0x7ffd5d5969e6)
                 at ./extra/mariabackup/xtrabackup.cc:4897
             #3  0x000055731ed21c0e in xtrabackup_apply_delta (dirname=0x5573203c5e00 <xtrabackup_real_incremental_dir> "/dev/shm/b/mysql-test/var/tmp/backup_1/", 
                 dbname=0x7ffd5d5982d0 "test", filename=0x7ffd5d5992b0 "t1.ibd.delta")
                 at ./extra/mariabackup/xtrabackup.cc:4991
             #4  0x000055731ed22e8a in xb_process_datadir (path=0x5573203c5e00 <xtrabackup_real_incremental_dir> "/dev/shm/b/mysql-test/var/tmp/backup_1/", 
                 suffix=0x55731f878b24 ".delta", func=0x55731ed21811 <xtrabackup_apply_delta(char const*, char const*, char const*, void*)>)
                 at ./extra/mariabackup/xtrabackup.cc:5337
             #5  0x000055731ed22f84 in xtrabackup_apply_deltas () at ./extra/mariabackup/xtrabackup.cc:5374
             #6  0x000055731ed244d0 in xtrabackup_prepare_func (argv=0x55732279b6b8)
                 at ./extra/mariabackup/xtrabackup.cc:5597
            

So the new tablespace is "created" instead of the existing tablespace being opened because the tablespace id from the *.meta file (the tablespace id of the *.new.delta file created in (4)) cannot be found in fil_system->spaces (fil_space_get_by_id(), invoked from xb_delta_open_matching_space(), returns NULL).

fil_system->spaces is filled by xb_load_tablespaces() before the deltas are applied; it iterates over the data files in the full backup directory and fills fil_system->spaces.

So the first page of the tablespace is rewritten while preparing the incremental backup because the tablespace is "created" instead of the existing tablespace being opened, as the tablespace id from the *.meta file does not match the tablespace id of the data file in the full backup directory.
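The lookup-and-create decision described above can be condensed into a toy model of xb_delta_open_matching_space(). The space ids and path are taken from this report's scenario; the data structures are invented for illustration:

```python
def open_or_create_space(fil_spaces, meta_space_id, path):
    """Sketch of the xb_delta_open_matching_space() decision: if the space id
    from the *.meta file is unknown to fil_system->spaces, a brand-new
    tablespace file is created, rewriting page 0 (and with it the crypt data).
    Returns (path, created_flag)."""
    if meta_space_id in fil_spaces:
        return fil_spaces[meta_space_id], False  # open the existing file
    fil_spaces[meta_space_id] = path             # xb_space_create_file() path
    return path, True                            # page 0 gets rewritten


# fil_system->spaces as filled by xb_load_tablespaces() from the full backup:
spaces = {274: "./test/t1.ibd"}

# OPTIMIZE TABLE during the incremental backup gave t1 a new space id (275):
_, created = open_or_create_space(spaces, 275, "./test/t1.ibd")
assert created  # the existing t1.ibd is re-created, losing its crypt data
```

The id mismatch (274 in the full backup vs. 275 in the *.meta file) is the whole trigger: the same on-disk file is treated as a tablespace that does not exist yet.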

            How to fix?

            I see the following variants:

1) I suppose that if the tablespace id changed during the incremental backup, all non-zero pages of this tablespace contain the new tablespace id, i.e. all non-zero pages were changed during the incremental backup, so there is no need to produce a delta in backup_fix_ddl() for such data files; we could just copy the whole data file as *.new.

2) In xb_delta_open_matching_space(), check whether the data file exists in the full backup directory; if it does, change its id, and invoke fil_space_create() only for non-existing files.
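Variant (2) above can be sketched as a file-existence check before the create path. This is a hypothetical simplification (the helper name, arguments, and registry dict are invented), not the actual fix that was pushed:

```python
import os
import tempfile


def open_matching_space_v2(full_backup_dir, rel_path, meta_space_id, spaces):
    """Sketch of fix variant (2): if the data file already exists in the full
    backup directory, adopt it and re-register it under the new space id;
    only create a fresh file when no such data file exists."""
    path = os.path.join(full_backup_dir, rel_path)
    if os.path.exists(path):
        spaces[meta_space_id] = path  # change the id, keep page 0 intact
        return path, "reused"
    open(path, "wb").close()          # the fil_space_create() + new-file path
    spaces[meta_space_id] = path
    return path, "created"


with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "test"))
    open(os.path.join(d, "test", "t1.ibd"), "wb").close()  # full backup copy
    spaces = {}
    _, how = open_matching_space_v2(d, "test/t1.ibd", 275, spaces)

assert how == "reused"  # the existing file is adopted; page 0 survives
```

The key property is that the "reused" branch never rewrites page 0, so the CRYPT_MAGIC marker and the crypt data survive the incremental prepare.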

            vlad.lesin Vladislav Lesin added a comment - - edited

I failed to create a stable test case for it. The test scenario I am trying to implement is the following:

1) Disable page flushing.
            2) Create table.
            3) Make full backup.
            4) Insert some data.
5) Start incremental backup.
            6) Execute "OPTIMIZE TABLE" just after the tablespace is copied during backup.
            7) Finish backup.
            8) Prepare full and incremental backup. The latter must fail.

As page 0 is not flushed by the server, its LSN is 0 just after the table is optimized; that is why the page is not copied to the .delta file during the incremental backup, and that is why the tablespace is created during the incremental prepare (there is no tablespace with the same id during prepare), without the CRYPT_MAGIC mark on page 0. I supposed that the subsequent page_is_corrupted() call must fail, as it checks the CRCs of an encrypted page that was not decrypted, because fil_space_read_crypt_data(), invoked on tablespace open, does not find crypt data on page 0.

But the test is unstable because the order of recv_recover_page() calls for different pages is not specified during recovery, and sometimes fil_parse_write_crypt_data() for page 0 is invoked before page_is_corrupted() for any other page; in that case the page is decrypted successfully before the CRC check, as the crypt data has already been set correctly for the tablespace by fil_parse_write_crypt_data().

We could insert a DBUG_EXECUTE_IF() in fil_parse_write_crypt_data() to skip the space->crypt_data initialization when the MLOG_FILE_WRITE_CRYPT_DATA redo record is executed, to make the test stable.

But I am not sure this is correct, because I am not sure that changing data during the recovery of one page in a way that could influence the recovery of other pages is correct. Page recovery is supposed to be independent for different pages, so the fil_parse_write_crypt_data() call from recv_recover_page() looks very suspicious.

            vlad.lesin Vladislav Lesin added a comment - Pushed https://github.com/MariaDB/server/tree/bb-10.2-MDEV-20755-incr-copy-new-files for testing.

There will be conflicts when merging it to 10.4; they are resolved in the https://github.com/MariaDB/server/tree/10.4-MDEV-20755-incr-copy-new-files branch. The fix should also be merged into 10.5+ even though the issue does not affect 10.5, because it processes files created by DDL during backup more efficiently for incremental backup and prepare. As it is only a performance fix for 10.5, there is no need to merge the test case; besides, the test case would always pass on 10.5+, as the DDL log optimization is deprecated in those versions. The 10.5 conflicts are resolved here: https://github.com/MariaDB/server/tree/10.5-MDEV-20755-incr-copy-new-files.


            The fix was reviewed by wlad.


            People

              vlad.lesin Vladislav Lesin
              elenst Elena Stepanova

