Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27939

Log buffer wrap-around errors on PMEM

    XMLWordPrintable

    Details

      Description

      Matthias Leich reproduced an assertion failure in mariadb-backup --prepare:

      mariabackup: /data/Server/bb-10.8-MDEV-26827/extra/mariabackup/xtrabackup.cc:3017: bool xtrabackup_copy_logfile():
      Assertion `recv_sys.offset < recv_sys.len' failed.
      

      In the core dump, we actually have recv_sys.offset == recv_sys.len, so it looks like the assertion should be relaxed to include the equality. Furthermore, at the end of the loop body, the recv_sys.offset needs to wrap around to log_sys.START_OFFSET (0x3000).

      To be able to repeat that bug, I modified an existing test case to trigger log file wrap-around exactly at a record boundary:

      diff --git a/mysql-test/suite/mariabackup/huge_lsn.test b/mysql-test/suite/mariabackup/huge_lsn.test
      index 00c7e66516c..62b16ef93fd 100644
      --- a/mysql-test/suite/mariabackup/huge_lsn.test
      +++ b/mysql-test/suite/mariabackup/huge_lsn.test
      @@ -40,27 +40,33 @@ do "$ENV{MTR_SUITE_DIR}/../innodb/include/crc32.pl";
       my $file= "$ENV{MYSQLD_DATADIR}/ib_logfile0";
       open(FILE, ">", $file) or die "Unable to open $file\n";
       binmode FILE;
      -# the desired log sequence number, plus 16
       my $extra_repeat = 139820;
      +# The desired log sequence number is 17596481011216 (0x1000fffffe10).
      +my $file_size=4<<20;
       my $lsn_hi=4096,$lsn_lo=0xfffffe00 - $extra_repeat * 15;
       my $polynomial = 0x82f63b78; # CRC-32C
       my ($header, $checkpoint, $log);
      -$header = "Phys" . pack("x[4]NN", $lsn_hi, $lsn_lo) .
      +$header = "Phys" . pack("x[4]NN", $lsn_hi, $lsn_lo - $file_size + 0x300f) .
                 "some Perl code" . pack("x[478]");
       $header .= pack("Nx[3584]", mycrc32($header, 0, $polynomial));
       $checkpoint = pack("NNNNx[44]", $lsn_hi, $lsn_lo, $lsn_hi, $lsn_lo);
       $checkpoint .= pack("Nx[8128]", mycrc32($checkpoint, 0, $polynomial));
       $log = pack("CxxNN", 0xfa, $lsn_hi, $lsn_lo);
      -$log .= pack("CN", 1, mycrc32($log, 0, $polynomial));
      +$log .= pack("CN", 0, mycrc32($log, 0, $polynomial));
       
       # Write more than 2MiB of FILE_MODIFY mini-transactions to exercise the parser.
       my $extra = pack("CCxa*", 0xb9, 127, "a/b.ibd");
      -$extra .= pack("CN", 1, mycrc32($extra, 0, $polynomial));
      +$extra .= pack("CN", 0, mycrc32($extra, 0, $polynomial));
       
      -print FILE $header, $checkpoint, $extra x $extra_repeat, $log;
      +print FILE $header, $checkpoint, $extra x ($extra_repeat - 1), $log;
      +seek(FILE, $file_size - 15, 0);
      +$extra = pack("CCxa*", 0xb9, 127, "c/d.ibd");
      +$extra .= pack("CN", 1, mycrc32($extra, 0, $polynomial));
      +print FILE $extra;
       close(FILE) or die "Unable to close $file\n";
       EOF
      ---let SEARCH_PATTERN= redo log from 2\\.012MiB to [0-9.]*[KMGT]iB; LSN=17596481011216\\b
      +--let SEARCH_PATTERN= InnoDB: log sequence number 17596481011216
      +--let $restart_parameters=--innodb-log-file-size=4M
       }
       
       --source include/start_mysqld.inc
      @@ -80,6 +86,7 @@ INSERT INTO t VALUES(2);
       echo # xtrabackup prepare;
       --disable_result_log
       exec $XTRABACKUP  --prepare --target-dir=$targetdir;
      +--let $restart_parameters=
       --source include/restart_and_restore.inc
       --enable_result_log
       SELECT * FROM t;
      

      This test works fine when the mmap() based interface to the log is not being used. That interface would be used when InnoDB is linked with libpmem and the data directory resides in /dev/shm or in a mount -o dax mounted file system. If those conditions are met, InnoDB would fail to start up like this:

      10.8

      CURRENT_TEST: mariabackup.huge_lsn
      mysqltest: At line 74: query 'CREATE TABLE t(i INT) ENGINE INNODB' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'INNODB'
      2022-02-24 17:03:33 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT(17596478913900) at 17596478913900
      

      An analysis shows that parsing wrongly continues at recv_sys.offset=0x400000, that is, right past the end of the mmap() buffer. The offset should have wrapped around to log_sys.START_OFFSET.

      So, there is something to fix in both the server and the backup.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.