Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28994

Backup produces garbage when using memory-mapped log (PMEM)

Details

    Description

      MDEV-14425 introduced a memory-mapped interface to the redo log when the log is stored in persistent memory, or in the /dev/shm file system. This improves performance and makes debugging more convenient, because the entire circular log file will be accessible at log_sys.buf in rr replay traces.

      However, this also exposed us to a (very reasonable) limitation of the rr debugger: it is assumed that the memory-mapped file will not be changed by processes that are not traced by the same rr record instance.

      Because it may be hard to remember the rule that mariadb-backup --backup must not be traced with rr record when the PMEM interface is being used, we’d better make backup refuse the use of the "fake PMEM" (mmap on redo log located in /dev/shm) when the process is apparently being run under rr record. Normal file system calls should be fine. We can use innodb_use_native_aio=0 as a proxy for detecting rr record, because both the old io_setup (libaio) nor the newer io_uring interface will return ENOSYS under rr record.

      Access to log files on real persistent memory (mount -o dax) will not be affected. It is assumed that rr record mariadb-backup --backup seldomly needs to be run such that the log is stored in persistent memory.

      Attachments

        Issue Links

          Activity

            mleich did some more testing. Unfortunately, the problems occur even when not using rr. Both /dev/shm and PMEM are affected. We even tried the following patch to force mariadb-backup --backup to always read the server’s ib_logfile0 via file system calls (pread(2)):

            diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
            index efdd527a28f..60d661f95f1 100644
            --- a/storage/innobase/log/log0log.cc
            +++ b/storage/innobase/log/log0log.cc
            @@ -176,7 +176,7 @@ void log_t::attach(log_file_t file, os_offset_t size)
             #ifdef HAVE_PMEM
               ut_ad(!buf);
               ut_ad(!flush_buf);
            -  if (size && !(size_t(size) & 4095))
            +  if (size && !(size_t(size) & 4095) && srv_operation != SRV_OPERATION_BACKUP)
               {
                 void *ptr=
                   my_mmap(0, size_t(size),
            

            mleich, can you please fill in more details?

            I am afraid that the only way to fix this is to make the server process provide a copy of the log to the backup process. That would be a subset of MDEV-14992.

            marko Marko Mäkelä added a comment - mleich did some more testing. Unfortunately, the problems occur even when not using rr . Both /dev/shm and PMEM are affected. We even tried the following patch to force mariadb-backup --backup to always read the server’s ib_logfile0 via file system calls ( pread(2) ): diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc index efdd527a28f..60d661f95f1 100644 --- a/storage/innobase/log/log0log.cc +++ b/storage/innobase/log/log0log.cc @@ -176,7 +176,7 @@ void log_t::attach(log_file_t file, os_offset_t size) #ifdef HAVE_PMEM ut_ad(!buf); ut_ad(!flush_buf); - if (size && !(size_t(size) & 4095)) + if (size && !(size_t(size) & 4095) && srv_operation != SRV_OPERATION_BACKUP) { void *ptr= my_mmap(0, size_t(size), mleich , can you please fill in more details? I am afraid that the only way to fix this is to make the server process provide a copy of the log to the backup process. That would be a subset of MDEV-14992 .

            I forgot to mention the most important part: The error that was reported by mariadb-backup --prepare was

            [ERROR] InnoDB: Missing FILE_CHECKPOINT…
            

            In the dataset that I analyzed, the ib_logfile0 that had been produced by mariadb-backup --backup looked like garbage. I did not recognize proper mini-transaction boundaries. The checkpoint header pointed to the middle of something that looked a little like updating some FULLTEXT INDEX related fields, while it should have been a FILE_CHECKPOINT record, optionally preceded by a number of FILE_MODIFY records. Because of this mismatch, no mini-transaction was successfully parsed from the log. The end-of-minitransaction byte should always be 1 and never 0 in a log file that is produced by backup, but here the parser reached the byte 0. That triggered the end of file.

            What could have happened is the following:

            1. Backup parsed a mini-transaction from the memory-mapped buffer. This took some time.
            2. Meanwhile, the server might have overwritten the circular log buffer with something else, that is, backup was too slow to keep up and the backup should actually have failed.
            3. Backup did memcpy() of the overwritten data to the output.

            Never using mmap for reading the log in mariadb-backup --backup should fix this type of scenario.

            marko Marko Mäkelä added a comment - I forgot to mention the most important part: The error that was reported by mariadb-backup --prepare was [ERROR] InnoDB: Missing FILE_CHECKPOINT… In the dataset that I analyzed, the ib_logfile0 that had been produced by mariadb-backup --backup looked like garbage. I did not recognize proper mini-transaction boundaries. The checkpoint header pointed to the middle of something that looked a little like updating some FULLTEXT INDEX related fields, while it should have been a FILE_CHECKPOINT record, optionally preceded by a number of FILE_MODIFY records. Because of this mismatch, no mini-transaction was successfully parsed from the log. The end-of-minitransaction byte should always be 1 and never 0 in a log file that is produced by backup, but here the parser reached the byte 0. That triggered the end of file. What could have happened is the following: Backup parsed a mini-transaction from the memory-mapped buffer. This took some time. Meanwhile, the server might have overwritten the circular log buffer with something else, that is, backup was too slow to keep up and the backup should actually have failed. Backup did memcpy() of the overwritten data to the output. Never using mmap for reading the log in mariadb-backup --backup should fix this type of scenario.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.