Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18194

Incremental prepare tries to access page which is out of tablespace bounds.

Details

    Description

      10.4 30da40bb8c3

      2019-01-09 22:15:09 0 [Note] InnoDB: Starting final batch to recover 2011 pages from redo log.
      2019-01-09 22:15:09 0 [ERROR] [FATAL] InnoDB: Trying to read page number 96 in space 1239, space name test/t8#P#ver_pn, which is outside the tablespace bounds. Byte offset 0, len 16384
      190109 22:15:09 [ERROR] mysqld got signal 6 ;
       
      #4  0x00007fdd4b54d3fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #5  0x000055da0a61be66 in ib::fatal::~fatal (this=0x7fff327b27d0, __in_chrg=<optimized out>) at /data/src/10.4/storage/innobase/ut/ut0ut.cc:765
      #6  0x000055da0a39c232 in fil_report_invalid_page_access (block_offset=96, space_id=1239, space_name=0x55da0d79b120 "test/t8#P#ver_pn", byte_offset=0, len=16384, is_read=true) at /data/src/10.4/storage/innobase/fil/fil0fil.cc:4141
      #7  0x000055da0a39ca5d in fil_io (type=..., sync=false, page_id=..., page_size=..., byte_offset=0, len=16384, buf=0x7fdd4695c000, message=0x7fdd4467aa98, ignore_missing_space=false) at /data/src/10.4/storage/innobase/fil/fil0fil.cc:4287
      #8  0x000055da0a34fbb5 in buf_read_page_low (err=0x7fff327b2f28, sync=false, type=64, mode=132, page_id=..., page_size=..., unzip=true, ignore_missing_space=false) at /data/src/10.4/storage/innobase/buf/buf0rea.cc:181
      #9  0x000055da0a351871 in buf_read_recv_pages (sync=false, space_id=1239, page_nos=0x7fff327b3110, n_stored=32) at /data/src/10.4/storage/innobase/buf/buf0rea.cc:924
      #10 0x000055da0a4adc6b in recv_read_in_area (page_id=...) at /data/src/10.4/storage/innobase/log/log0recv.cc:2001
      #11 0x000055da0a4ae2d4 in recv_apply_hashed_log_recs (last_batch=true) at /data/src/10.4/storage/innobase/log/log0recv.cc:2106
      #12 0x000055da0a5c4393 in srv_start (create_new_db=false) at /data/src/10.4/storage/innobase/srv/srv0start.cc:1861
      #13 0x000055da09b62845 in innodb_init () at /data/src/10.4/extra/mariabackup/xtrabackup.cc:1988
      #14 0x000055da09b6c9d8 in xtrabackup_prepare_func (argv=0x55da0cfd6a88) at /data/src/10.4/extra/mariabackup/xtrabackup.cc:5594
      #15 0x000055da09b6ebda in main_low (argv=0x55da0cfd6a88) at /data/src/10.4/extra/mariabackup/xtrabackup.cc:6418
      #16 0x000055da09b6e3ee in main (argc=7, argv=0x7fff327ba398) at /data/src/10.4/extra/mariabackup/xtrabackup.cc:6211
      

      The log is attached as mbackup_prepare_1.log

      Backup dirs before any attempts to prepare are stored here:
      ftp://ftp.askmonty.org/public/mdev18194/

      backup_0 is the initial full backup.
      backup_1 is the incremental backup.

      To reproduce, unpack the backups and run

      bin/mariabackup --prepare --apply-log-only --innodb-file-io-threads=1 --target-dir=backup_0 --user=root
      bin/mariabackup --prepare --apply-log-only --innodb-file-io-threads=1 --target-dir=backup_0 --incremental-dir=backup_1 --user=root
      


      Notes on the origin of the report. Please be aware it's not easily reproducible the same way, the test is concurrent by nature, and apparently there is a race condition involved.

      The problem initially happened by running the concurrent test with MariaBackupIncremental scenario:

      github.com/MariaDB/randgen elenst-dev 0ebbe452a69, github.com/elenst/mariadb-toolbox master 6ded4b71568

      perl ./runall-new.pl --basedir=/home/travis/server/10.4 --vardir=/home/travis/logs/vardir2 --duration=350 --threads=4 --seed=1547032787 --reporters=Backtrace,ErrorLog,Deadlock --skip-gendata --gendata-advanced --views --grammar=conf/mariadb/generic-dml.yy --redefine=conf/mariadb/bulk_insert.yy --mysqld=--log_output=FILE --mysqld=--max-statement-time=20 --mysqld=--lock-wait-timeout=10 --mysqld=--loose-innodb-lock-wait-timeout=5 --mysqld=--loose-debug_assert_on_not_freed_memory=0 --filter=/home/travis/mariadb-toolbox/travis/10.4-combo-filter.ff --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--aria-encrypt-tables=1 --mysqld=--encrypt-tmp-disk-tables=1 --mysqld=--file-key-management --mysqld=--file-key-management-filename=/home/travis/mariadb-toolbox/data/keys.txt --mysqld=--plugin-load-add=file_key_management --engine=Aria --scenario=MariaBackupIncremental --redefine=conf/mariadb/alter_table.yy --redefine=conf/mariadb/modules/admin.yy --redefine=conf/mariadb/versioning.yy --redefine=conf/mariadb/sequences.yy
      

      The test performs the following steps:

      • start the server with

        "--no-defaults" "--basedir=/home/travis/server/10.4" "--lc-messages-dir=/home/travis/server/10.4/share" "--character-sets-dir=/home/travis/server/10.4/share/charsets" "--tmpdir=/home/travis/logs/vardir2/tmp" "--core-file" "--datadir=/home/travis/logs/vardir2/data" "--max-allowed-packet=128Mb" "--port=19300" "--socket=/home/travis/logs/vardir2/mysql.sock" "--pid-file=/home/travis/logs/vardir2/mysql.pid" "--general-log" "--general-log-file=/home/travis/logs/vardir2/mysql.log" --log_output=FILE --max-statement-time=20 --lock-wait-timeout=10 --loose-innodb-lock-wait-timeout=5 --loose-debug_assert_on_not_freed_memory=0 --innodb-encrypt-tables --innodb-encrypt-log --innodb-encryption-threads=4 --aria-encrypt-tables=1 --encrypt-tmp-disk-tables=1 --file-key-management --file-key-management-filename=/home/travis/mariadb-toolbox/data/keys.txt --plugin-load-add=file_key_management
        

      • run concurrent DML/DDL flow on the server;
      • after ~1 min, take the full backup from the running server:

        bin/mariabackup --backup --target-dir=/home/travis/logs/vardir2/backup_0 --protocol=tcp --port=19300 --user=root
        

        (succeeded, the log attached as mbackup_backup_0.log;

      • after another ~1 minute, run the incremental backup as

        bin/mariabackup --backup --target-dir=/home/travis/logs/vardir2/backup_1 --incremental-basedir=/home/travis/logs/vardir2/backup_0 --protocol=tcp --port=19300 --user=root
        

        (succeeded, the log attached as mbackup_backup_1.log;

      • after another ~1 minute, run the incremental backup again as

        bin/mariabackup --backup --target-dir=/home/travis/logs/vardir2/backup_2 --incremental-basedir=/home/travis/logs/vardir2/backup_1 --protocol=tcp --port=19300 --user=root
        

        (succeeded, the log attached as mbackup_backup_2.log;

      • let the test flow continue for a little while
      • shut down the server gracefully (succeeded, the error log is attached as mysql.err_orig;
      • run mariabackup --prepare on the full backup as

        bin/mariabackup --prepare --apply-log-only --innodb-file-io-threads=1 --target-dir=/home/travis/logs/vardir2/backup_0 --user=root
        

        (succeeded, the log attached as mbackup_prepare_0.log;

      • run mariabackup --prepare on the first incremental backup as

        bin/mariabackup --prepare --apply-log-only --innodb-file-io-threads=1 --target-dir=/home/travis/logs/vardir2/backup_0 --incremental-dir=/home/travis/logs/vardir2/backup_1 --user=root
        

        (failed, the log is attached as mbackup_prepare_1.log.


      I have no information yet whether the problem happens on 10.1.

      Attachments

        1. mbackup_backup_0.log
          146 kB
        2. mbackup_backup_1.log
          149 kB
        3. mbackup_backup_2.log
          50 kB
        4. mbackup_prepare_0.log
          65 kB
        5. mbackup_prepare_1.log
          53 kB
        6. mysql.err_orig
          270 kB

        Issue Links

          Activity

            I think that this should likely affect mariabackup 10.1 already. The code should be equivalent between 10.2, 10.3, and 10.4.

            marko Marko Mäkelä added a comment - I think that this should likely affect mariabackup 10.1 already. The code should be equivalent between 10.2, 10.3, and 10.4.

            MLOG_FILE_RENAME2 doesn't replay in mariabackup during prepare phase. Because file path stored in mariabackup as "test/t1.ibd".
            But redo log records stores the file path as "./test/t1.ibd". So fil_op_replay_rename() does string matching and fails to
            do the renaming of the file.

            thiru Thirunarayanan Balathandayuthapani added a comment - MLOG_FILE_RENAME2 doesn't replay in mariabackup during prepare phase. Because file path stored in mariabackup as "test/t1.ibd". But redo log records stores the file path as "./test/t1.ibd". So fil_op_replay_rename() does string matching and fails to do the renaming of the file.

            After preparing full backup phase:
            Problematic space id is t8#P#ver_pn. Space id is 903. It has only 4 pages.

            While preparing incremental backup phase:
            t8#P#ver_pn.ibd has only .delta file. It is empty.
            Based on .meta file, mariabackup prepare creates new ibd file with new space id(1239) and name it as t8#P#ver_pn.ibd
            Redo log now tries to access 96th page. So it crashes.

            Basically backup phase went wrong during incremental backup i think.

            thiru Thirunarayanan Balathandayuthapani added a comment - After preparing full backup phase: Problematic space id is t8#P#ver_pn. Space id is 903. It has only 4 pages. While preparing incremental backup phase: t8#P#ver_pn.ibd has only .delta file. It is empty. Based on .meta file, mariabackup prepare creates new ibd file with new space id(1239) and name it as t8#P#ver_pn.ibd Redo log now tries to access 96th page. So it crashes. Basically backup phase went wrong during incremental backup i think.

            The following test case repeats the issue:

            --source include/have_debug.inc
            --source include/have_innodb.inc
            --source include/have_partition.inc
             
            call mtr.add_suppression("InnoDB: New log files created");
             
            let $basedir=$MYSQLTEST_VARDIR/tmp/backup;
            let $incremental_dir=$MYSQLTEST_VARDIR/tmp/backup_inc1;
            let $backuplog=$MYSQLTEST_VARDIR/tmp/backup.log;
             
            CREATE TABLE t1(i INT) ENGINE INNODB;
            alter table t1 partition by hash(i) partitions 4;
            CREATE TABLE t2(i INT PRIMARY KEY) ENGINE INNODB;
            CREATE TABLE t3(i INT) ENGINE INNODB;
             
            INSERT INTO t1 values(1), (2), (3), (4), (5), (6), (7), (8), (9);
            insert into t1 select * from t1;
            insert into t1 select * from t1;
             
            echo # Create full backup , modify table, then create incremental/differential backup;
            --disable_result_log
            exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --target-dir=$basedir;
            --enable_result_log
            drop table t1;
            create table t4(f1 int not null, f2 int not null)engine=innodb;
            insert into t4 values(1, 2), (2, 2), (3, 3), (5, 5), (6, 6), (4, 4), (9, 9);
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            alter table t4 partition by hash(f1) partitions 4;
            rename table t4 to t1;
             
            --echo # XTRABACKUP INCREMENTAL
            exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --target-dir=$incremental_dir --incremental-basedir=$basedir;
             
            --echo # XTRABACKUP PREPARE
            exec $XTRABACKUP --apply-log-only --prepare --target-dir=$basedir;
             
            --echo # XTRABACKUP INCREMENTAL PREPARE
            exec $XTRABACKUP --prepare --target-dir=$basedir --incremental-dir=$incremental_dir;
             
            let $targetdir=$basedir;
            -- source include/restart_and_restore.inc
            --enable_result_log
             
            select count(*) from t1;
            drop table t2, t1, t3;
            

            thiru Thirunarayanan Balathandayuthapani added a comment - The following test case repeats the issue: --source include/have_debug.inc --source include/have_innodb.inc --source include/have_partition.inc   call mtr.add_suppression("InnoDB: New log files created");   let $basedir=$MYSQLTEST_VARDIR/tmp/backup; let $incremental_dir=$MYSQLTEST_VARDIR/tmp/backup_inc1; let $backuplog=$MYSQLTEST_VARDIR/tmp/backup.log;   CREATE TABLE t1(i INT) ENGINE INNODB; alter table t1 partition by hash(i) partitions 4; CREATE TABLE t2(i INT PRIMARY KEY) ENGINE INNODB; CREATE TABLE t3(i INT) ENGINE INNODB;   INSERT INTO t1 values(1), (2), (3), (4), (5), (6), (7), (8), (9); insert into t1 select * from t1; insert into t1 select * from t1;   echo # Create full backup , modify table, then create incremental/differential backup; --disable_result_log exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$basedir; --enable_result_log drop table t1; create table t4(f1 int not null, f2 int not null)engine=innodb; insert into t4 values(1, 2), (2, 2), (3, 3), (5, 5), (6, 6), (4, 4), (9, 9); insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; alter table t4 partition by hash(f1) partitions 4; rename table t4 to t1;   --echo # XTRABACKUP INCREMENTAL exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$incremental_dir --incremental-basedir=$basedir;   --echo # XTRABACKUP PREPARE exec $XTRABACKUP --apply-log-only --prepare --target-dir=$basedir;   --echo # XTRABACKUP INCREMENTAL PREPARE exec $XTRABACKUP --prepare --target-dir=$basedir --incremental-dir=$incremental_dir;   let $targetdir=$basedir; -- source include/restart_and_restore.inc --enable_result_log   select count(*) from t1; drop table t2, t1, t3;

            --source include/have_debug.inc
             
            call mtr.add_suppression("InnoDB: New log files created");
             
            let $basedir=$MYSQLTEST_VARDIR/tmp/backup;
            let $incremental_dir=$MYSQLTEST_VARDIR/tmp/backup_inc1;
            let $backuplog=$MYSQLTEST_VARDIR/tmp/backup.log;
             
            CREATE TABLE t1(i INT) ENGINE INNODB;
            CREATE TABLE t2(i INT PRIMARY KEY) ENGINE INNODB;
            CREATE TABLE t3(i INT) ENGINE INNODB;
             
            --echo # FULL XTRABACKUP
             
            --disable_result_log
            exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --target-dir=$basedir;
            --enable_result_log
             
            create table t4(f1 int not null, f2 int not null)engine=innodb;
            insert into t4 values(1, 2), (2, 2), (3, 3), (5, 5), (6, 6), (4, 4), (9, 9);
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            insert into t4 select * from t4;
            rename table t4 to t7;
            select count(*) from t7;
             
            --echo # XTRABACKUP INCREMENTAL
            exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --target-dir=$incremental_dir --incremental-basedir=$basedir
             
            --echo # XTRABACKUP PREPARE
            exec $XTRABACKUP --apply-log-only --prepare --target-dir=$basedir;
             
            --echo # XTRABACKUP INCREMENTAL PREPARE
            --error 0, 1, 134
            exec $XTRABACKUP --prepare --target-dir=$basedir --incremental-dir=$incremental_dir --dbug="+d,ib_log" 2> $backuplog;
             
            let $targetdir=$basedir;
            -- source include/restart_and_restore.inc
            --enable_result_log
             
            select count(*) from t7;
            drop table t1, t2, t7, t3;
            

            The above is the another test case to repeat the failure. There is no need for partition or so.

            INCREMENTAL BACKUP:
            ===================
            xtrabackup tries to copy delta page of t7 tablespace when all pages are in the buffer pool. So lsn is zero in the file.
            It stores nothing in .delta file. But t7 tablespace has 12 pages in buffer pool.

            At the end, incremental backup contains t7.delta which has nothing and t7.meta which has page_size, space id.

            INCREMENTAL PREPARE PHASE:
            ===========================

            Mariabackup tries to apply delta for all tablespace. When it encounters t7 tablespace. Target directory doesn't have t7.
            So it creates new tablespace as t7 with space id stored in .meta file. It sets the file size to 4. (Default file size).

            redo log for MLOG_FILE_NAME will fail to load the tablespace to memory for t4 tablespace. Because fil_ibd_discovery doesn't find any t4 tablespace.

            redo log for FSP_SIZE will try to search the tablespace for the respective id. Since tablespace is not loaded yet for MLOG_FILE_NAME. It will fail to set.

            In applying redo log phase, redo log for the tablespace t7 and its respective pages. (12 pages in our case). So it tries to fetch the pages at all once to
            apply the redo log. Since mariabackup prepare creates only 4 pages. It leads to assert while accessing the 5th page.

            Fix:
            ====
            .meta file should contain file size info. Instead of setting the default file size, mariabackup prepare should create respective file size present in meta file.

            thiru Thirunarayanan Balathandayuthapani added a comment - --source include/have_debug.inc   call mtr.add_suppression("InnoDB: New log files created");   let $basedir=$MYSQLTEST_VARDIR/tmp/backup; let $incremental_dir=$MYSQLTEST_VARDIR/tmp/backup_inc1; let $backuplog=$MYSQLTEST_VARDIR/tmp/backup.log;   CREATE TABLE t1(i INT) ENGINE INNODB; CREATE TABLE t2(i INT PRIMARY KEY) ENGINE INNODB; CREATE TABLE t3(i INT) ENGINE INNODB;   --echo # FULL XTRABACKUP   --disable_result_log exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$basedir; --enable_result_log   create table t4(f1 int not null, f2 int not null)engine=innodb; insert into t4 values(1, 2), (2, 2), (3, 3), (5, 5), (6, 6), (4, 4), (9, 9); insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; insert into t4 select * from t4; rename table t4 to t7; select count(*) from t7;   --echo # XTRABACKUP INCREMENTAL exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$incremental_dir --incremental-basedir=$basedir   --echo # XTRABACKUP PREPARE exec $XTRABACKUP --apply-log-only --prepare --target-dir=$basedir;   --echo # XTRABACKUP INCREMENTAL PREPARE --error 0, 1, 134 exec $XTRABACKUP --prepare --target-dir=$basedir --incremental-dir=$incremental_dir --dbug="+d,ib_log" 2> $backuplog;   let $targetdir=$basedir; -- source include/restart_and_restore.inc --enable_result_log   select count(*) from t7; drop table t1, t2, t7, t3; The above is the another test case to repeat the failure. There is no need for partition or so. INCREMENTAL BACKUP: =================== xtrabackup tries to copy delta page of t7 tablespace when all pages are in the buffer pool. So lsn is zero in the file. It stores nothing in .delta file. But t7 tablespace has 12 pages in buffer pool. At the end, incremental backup contains t7.delta which has nothing and t7.meta which has page_size, space id. INCREMENTAL PREPARE PHASE: =========================== Mariabackup tries to apply delta for all tablespace. When it encounters t7 tablespace. Target directory doesn't have t7. So it creates new tablespace as t7 with space id stored in .meta file. It sets the file size to 4. (Default file size). redo log for MLOG_FILE_NAME will fail to load the tablespace to memory for t4 tablespace. Because fil_ibd_discovery doesn't find any t4 tablespace. redo log for FSP_SIZE will try to search the tablespace for the respective id. Since tablespace is not loaded yet for MLOG_FILE_NAME. It will fail to set. In applying redo log phase, redo log for the tablespace t7 and its respective pages. (12 pages in our case). So it tries to fetch the pages at all once to apply the redo log. Since mariabackup prepare creates only 4 pages. It leads to assert while accessing the 5th page. Fix: ==== .meta file should contain file size info. Instead of setting the default file size, mariabackup prepare should create respective file size present in meta file.
            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in bb-10.2- MDEV-18194 https://github.com/MariaDB/server/commit/376f3a2ea6347397b6be9378042b8fd871eb8ea7

            The fix looks OK to me. With the additional patch that effectively disables MDEV-12103, it passed the additional testing by elenst.

            Alas, I believe that we must first fix the corner case of crash recovery that was broken in MDEV-12103. Without fixing that, this fix could make matters worse.

            The issue that MDEV-12103 caused is: If multiple redo log checkpoints ‘overlap’ each other, then crash recovery would fail to start parsing the log records near the first available MLOG_CHECKPOINT record, and start from the last record instead. Due to this, some recently created, renamed or deleted tablespaces could be unavailable during redo log processing in crash recovery.

            marko Marko Mäkelä added a comment - The fix looks OK to me. With the additional patch that effectively disables MDEV-12103 , it passed the additional testing by elenst . Alas, I believe that we must first fix the corner case of crash recovery that was broken in MDEV-12103 . Without fixing that, this fix could make matters worse. The issue that MDEV-12103 caused is: If multiple redo log checkpoints ‘overlap’ each other, then crash recovery would fail to start parsing the log records near the first available MLOG_CHECKPOINT record, and start from the last record instead. Due to this, some recently created, renamed or deleted tablespaces could be unavailable during redo log processing in crash recovery.

            People

              thiru Thirunarayanan Balathandayuthapani
              elenst Elena Stepanova
              Votes:
              2 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.