Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28977

Race condition in the recovery of CREATE TABLE or table-rebuilding DDL

    XMLWordPrintable

Details

    Description

      The test mariabackup.huge_lsn that was rewritten in MDEV-14425 often fails, mostly on the amd64-windows builder, but occasionally also on Linux. I was able to reproduce the failure on a local ASAN build with the following simplification (disabling encryption for easier analysis):

      diff --git a/mysql-test/suite/mariabackup/huge_lsn.opt b/mysql-test/suite/mariabackup/huge_lsn.opt
      deleted file mode 100644
      index c65b76fb7d2..00000000000
      --- a/mysql-test/suite/mariabackup/huge_lsn.opt
      +++ /dev/null
      @@ -1,10 +0,0 @@
      ---innodb-encrypt-log=ON
      ---innodb-tablespaces-encryption
      ---innodb-encrypt-tables=ON
      ---innodb-encryption-rotate-key-age=1
      ---innodb-encryption-threads=4
      ---plugin-load-add=$FILE_KEY_MANAGEMENT_SO
      ---loose-file-key-management
      ---loose-file-key-management-filekey=FILE:$MTR_SUITE_DIR/filekeys-data.key
      ---loose-file-key-management-filename=$MTR_SUITE_DIR/filekeys-data.enc
      ---loose-file-key-management-encryption-algorithm=aes_cbc
      diff --git a/mysql-test/suite/mariabackup/huge_lsn.result b/mysql-test/suite/mariabackup/huge_lsn.result
      index 503d13fcac4..bc8c9b7a1ab 100644
      --- a/mysql-test/suite/mariabackup/huge_lsn.result
      +++ b/mysql-test/suite/mariabackup/huge_lsn.result
      @@ -1,9 +1,9 @@
       #
       # MDEV-13416 mariabackup fails with EFAULT "Bad Address"
       #
      -# restart: --innodb-log-file-size=4M --innodb-encrypt-log=0
      +# restart: --innodb-log-file-size=4M
       FOUND 1 /InnoDB: log sequence number 17596481011216/ in mysqld.1.err
      -CREATE TABLE t(i INT) ENGINE=INNODB ENCRYPTED=YES;
      +CREATE TABLE t(i INT) ENGINE=INNODB;
       INSERT INTO t VALUES(1);
       # xtrabackup backup
       SET GLOBAL innodb_flush_log_at_trx_commit=1;
      diff --git a/mysql-test/suite/mariabackup/huge_lsn.test b/mysql-test/suite/mariabackup/huge_lsn.test
      index 8850e9d8954..84702bcfdef 100644
      --- a/mysql-test/suite/mariabackup/huge_lsn.test
      +++ b/mysql-test/suite/mariabackup/huge_lsn.test
      @@ -1,5 +1,4 @@
       --source include/not_embedded.inc
      ---source include/have_file_key_management.inc
       
       --echo #
       --echo # MDEV-13416 mariabackup fails with EFAULT "Bad Address"
      @@ -66,14 +65,14 @@ print FILE $extra;
       close(FILE) or die "Unable to close $file\n";
       EOF
       --let SEARCH_PATTERN= InnoDB: log sequence number 17596481011216
      ---let $restart_parameters=--innodb-log-file-size=4M --innodb-encrypt-log=0
      +--let $restart_parameters=--innodb-log-file-size=4M
       }
       
       --source include/start_mysqld.inc
       let SEARCH_FILE= $MYSQLTEST_VARDIR/log/mysqld.1.err;
       --source include/search_pattern_in_file.inc
       
      -CREATE TABLE t(i INT) ENGINE=INNODB ENCRYPTED=YES;
      +CREATE TABLE t(i INT) ENGINE=INNODB;
       INSERT INTO t VALUES(1);
       
       echo # xtrabackup backup;
      @@ -85,6 +84,7 @@ SET GLOBAL innodb_flush_log_at_trx_commit=1;
       INSERT INTO t VALUES(2);
       echo # xtrabackup prepare;
       --disable_result_log
      +exec tar cJf $MYSQLTEST_VARDIR/targetdir.tar.xz $targetdir/
       exec $XTRABACKUP  --prepare --target-dir=$targetdir;
       --let $restart_parameters=
       --source include/restart_and_restore.inc
      

      The test would not fail in a non-ASAN build for me. targetdir.tar.xz is a copy of a database that was restored incorrectly. Actually, I had run the --backup command under rr record, because I was anticipating an error there. But the backed up log seems to be fine. The relevant part from the backup output (only available via rr replay) is:

      10.10 63961a08a6203f4d58363a9321e4cf9c8b07a9fe

      [00] 2022-06-29 11:13:25 Redo log (from LSN 17596478913900 to 17596481013381) was copied.
      [00] 2022-06-29 11:13:25 completed OK!
      

      The ib_logfile0 in targetdir.tar.xz indeed contains the artificially generated records for c/d.ibd, a/b.ibd as well as the records for creating and initializing the file test/t.ibd. The test failure output contains the following:

      10.10 63961a08a6203f4d58363a9321e4cf9c8b07a9fe

      CURRENT_TEST: mariabackup.huge_lsn
      tar: Removing leading `/' from member names
      /dev/shm/10.10a/extra/mariabackup/mariabackup based on MariaDB server 10.10.0-MariaDB Linux (x86_64)
      [01] 2022-06-29 11:13:32 Copying ib_logfile0 to /dev/shm/10.10a/mysql-test/var/22/mysqld.1/data/ib_logfile0
      [01] 2022-06-29 11:13:32         ...done
      …
      [01] 2022-06-29 11:13:33 Copying ./mysql/transaction_registry.ibd to /dev/shm/10.10a/mysql-test/var/22/mysqld.1/data/mysql/transaction_registry.ibd
      [01] 2022-06-29 11:13:33         ...done
      [01] 2022-06-29 11:13:33 Copying ib_buffer_pool to /dev/shm/10.10a/mysql-test/var/22/mysqld.1/data/ib_buffer_pool
      [01] 2022-06-29 11:13:33         ...done
      [00] 2022-06-29 11:13:33 completed OK!
      mysqltest: At line 92: query 'SELECT * FROM t' failed: ER_TABLE_CORRUPT (1877): Table test/t is corrupted. Please drop the table and recreate.
      

      There was no message about applying any log. The log was replaced with a dummy one that only represents the end LSN of the backup:

      000000 50 68 79 73 00 00 00 00 00 00 10 01 00 00 06 85
      000010 4d 61 72 69 61 44 42 20 31 30 2e 31 30 2e 30 00
      000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      *
      0001f0 00 00 00 00 00 00 00 00 00 00 00 00 41 17 f5 8a
      000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      *
      002000 00 00 10 01 00 00 06 85 00 00 10 01 00 00 06 85
      002010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      *
      002030 00 00 00 00 00 00 00 00 00 00 00 00 d7 50 d3 b5
      002040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      *
      003000 fa 00 00 00 00 10 01 00 00 06 85 01 79 77 29 c7
      003010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      *
      a00000
      

      Because the first page of test/t.ibd was all-zero at the time it was backed up, the subsequent server startup (and the SELECT statement) would fail due to this:

      10.10 63961a08a6203f4d58363a9321e4cf9c8b07a9fe

      2022-06-29 11:13:33 0 [ERROR] InnoDB: Expected tablespace id 5 but found 0 in the file ./test/t.ibd
      2022-06-29 11:13:33 0 [Note] Server socket created on IP: '127.0.0.1'.
      2022-06-29 11:13:33 0 [Note] /dev/shm/10.10a/sql/mariadbd: ready for connections.
      Version: '10.10.0-MariaDB-debug-log'  socket: '/dev/shm/10.10a/mysql-test/var/tmp/22/mysqld.1.sock'  port: 16420  Source distribution
      2022-06-29 11:13:33 3 [ERROR] mariadbd: Table test/t is corrupted. Please drop the table and recreate.
      

      Attachments

        1. targetdir.tar.xz
          288 kB
          Marko Mäkelä

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.