  1. MariaDB Server
  2. MDEV-17084

Optimize append only files for NVDIMM




      The task is to optimize the write performance of append-only files with
      the help of persistent memory (NVDIMM).


      To understand what I would like to do, we first have to consider
      the optimal "user" interface for using persistent memory on
      append-only files:

      • Least possible changes to existing applications.
      • Application should work unchanged even if there is no or limited
        persistent memory on the machine.
      • Should work on any file, anywhere.

      One way to do this would be something like:

      • Create log file
      • Execute an ioctl(file, IOCTL_OPTIMIZE_FOR_APPEND_ONLY, 1024*1024)
      • This should assign 1M of persistent memory to optimize append write
        performance for the file until it's closed.
      • Close log file
      • This should flush everything from persistent memory to the file
      • Any read, write, sync to the file should work 'normally', like if
        it would be a normal file.
      • On reboot the persistent memory should be flushed to the file, so
        that the file contains what was written to it.
      • There should also be some calls/tools to check how much persistent
        memory exists, who is allocating how much etc.

      In other words, to use persistent memory to speed up append-only files,
      in most cases there would be only a one-line change for each log file.

      Unfortunately the above approach is not practical as it's hard to get to
      work on all platforms.

      However, I would like to create something that is as close to the above
      as possible. This would allow anyone to adopt the library for their
      C or C++ application with a minimum amount of changes.


      The suggested library to use is http://pmem.io/ and especially
      http://pmem.io/pmdk/libpmem/. It seems to be available on most
      modern Linux distributions.
      Note that the library proposed here should be written so that it works
      even if there is no pmem library or persistent memory available. In this
      case everything should work exactly like before (with the overhead
      of one virtual call per pwrite()).
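
      The "one virtual call per pwrite()" fallback can be sketched as a
      function pointer that simply forwards to the system call. This is only
      an illustration: the names append_io, plain_pwrite, append_io_init and
      demo_fallback are hypothetical and not part of the proposed interface,
      and only the no-persistent-memory path is implemented.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, reduced handler: only the write hook is shown. */
typedef struct append_io append_io;
struct append_io {
  int fd;      /* descriptor of the log file */
  void *map;   /* NULL when no persistent memory is mapped */
  ssize_t (*pwrite)(append_io *io, const void *buf, size_t nbyte,
                    off_t offset);
};

/* Fallback writer: forwards straight to the pwrite() system call. */
static ssize_t plain_pwrite(append_io *io, const void *buf, size_t nbyte,
                            off_t offset)
{
  return pwrite(io->fd, buf, nbyte, offset);
}

/*
  Set up the handler. A real library would install a buffering writer when
  map is non-NULL; this sketch only implements the fallback, so the only
  extra cost per write is one indirect call.
*/
void append_io_init(append_io *io, int fd, void *map)
{
  assert(io != NULL);
  io->fd = fd;
  io->map = map;
  io->pwrite = plain_pwrite;
}

/* Usage: write 5 bytes through the hook; returns bytes written or -1. */
ssize_t demo_fallback(const char *path)
{
  append_io io;
  ssize_t written;
  int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
  if (fd < 0)
    return -1;
  append_io_init(&io, fd, NULL);           /* no persistent memory */
  written = io.pwrite(&io, "hello", 5, 0); /* one indirect call */
  close(fd);
  unlink(path);
  return written;
}
```

      The application code calls io.pwrite() in both cases, so it does not
      need to know whether persistent memory is present.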

      There is a library based on the above that implements support for
      append-only files, but it assumes that the full log files should be in
      persistent memory, which is not optimal for the end user. It's also
      quite complex to use with MariaDB, would require a lot of changes in
      MariaDB and would be hard to get to work both with and without
      persistent memory.

      Because of the above, I suggest we would base our work on the
      low level http://pmem.io/pmdk/libpmem/ library.

      Here is what I envision as an interface:

      typedef struct pmem_append_base_handler {
        void *map;               /* 0 if no persistent memory */
        size_t mapped_length;    /* Available persistent memory */
      } pmem_append_base_handler;

      /* Initialize pmem_append */
      pmem_append_base_handler *pmem_append_base_init(const char *path_to_mem_dev);
      /* Write all cached memory to files and free up memory for reuse */
      int pmem_append_write_all(pmem_append_base_handler *ptr);
      /* End usage of pmem_append */
      void pmem_append_close(pmem_append_base_handler *ptr);
      /* Allocate a pmem_append_handler for a specific file */
      pmem_append_handler *pmem_append_init(pmem_handler, path_to_log_file...);

      To use the library one should do something like:

      pmem_handler= pmem_append_base_init(path_to_memory_device);
      log_file_handler= open(path_to_log_file,...);
      /* request to use half of available persistent memory for this file */
      handler= pmem_append_init(pmem_handler, path_to_log_file,
                                pmem_handler->mapped_length / 2);
      handler->fsync();    /* Write out memory to file (in case of crash before) */

      The handler would be a struct where the public members would be something
      like:

      typedef struct pmem_append_handler {
        int log_file_handler;
        const char *path_to_log_file;
        /* Size of the persistent buffer for this file */
        ulonglong memory_available;
        pmem_append_base_handler *pmem;
        off_t     offset;   /* End of file */
        ssize_t (*append)(int fildes, const void *buf, size_t nbyte);
        ssize_t (*pwrite)(int fildes, const void *buf, size_t nbyte,
                          off_t offset);
        ssize_t (*pread)(int fildes, void *buf, size_t nbyte,
                         off_t offset);
        /* Write persistent memory in file region to file */
        int (*sync)(off_t offset, size_t length);
        /* Write all persistent memory to file and fsync file */
        int (*fsync)(void);
      } pmem_append_handler;

      If there is no persistent memory, the above calls to pread/pwrite would be
      mapped to normal read/writes.

      To use this interface, one would have to make the following changes in
      the application:
      • Add a call to pmem_append_init() when one opens the log file.
      • This call will also flush any cached data to the file.
      • Change write calls from pwrite() to handler->pwrite() or
        handler->append(), or add a call to handler->sync() to ensure that
        the area has already been written.
      • Change read calls from pread() to handler->pread()
      • Change fseek(SEEK_END) to use handler->offset
      • Change fsync() to handler->fsync()

      Note that normal reads will work on the file as usual. The user can
      always call handler->fsync() to be able to use any file operation
      normally.

      Some applications may have their own version of pwrite/pread (like MariaDB).
      To allow these to work with the above, there should also be a mapping
      through which the library calls pread, pwrite and sync, so that one can
      use the application's calls. For example, Aria is using my_pwrite()
      instead of pwrite().
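
      Such a mapping could be sketched as a small table of function pointers
      that defaults to the POSIX calls and that the application can override.
      The names append_io_hooks, flush_range, app_pwrite and demo_hooks are
      hypothetical; app_pwrite only stands in for an application wrapper such
      as my_pwrite() and does not reproduce its real signature.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical table of I/O hooks the library would call internally. */
typedef struct append_io_hooks {
  ssize_t (*pwrite)(int fd, const void *buf, size_t nbyte, off_t offset);
  ssize_t (*pread)(int fd, void *buf, size_t nbyte, off_t offset);
  int (*sync)(int fd);
} append_io_hooks;

/* Defaults: the plain POSIX calls. */
static const append_io_hooks default_hooks = { pwrite, pread, fsync };

/* Stand-in for an application wrapper: counts calls, then forwards. */
static unsigned long app_write_calls = 0;

static ssize_t app_pwrite(int fd, const void *buf, size_t nbyte, off_t offset)
{
  app_write_calls++;
  return pwrite(fd, buf, nbyte, offset);
}

/* Library side: write a buffered range to the file using only the hooks. */
ssize_t flush_range(const append_io_hooks *hooks, int fd, const void *buf,
                    size_t nbyte, off_t offset)
{
  ssize_t n;
  assert(hooks != NULL);
  n = hooks->pwrite(fd, buf, nbyte, offset);
  if (n < 0 || (size_t) n != nbyte)
    return -1;
  return hooks->sync(fd) == 0 ? n : -1;
}

/* Usage: override only pwrite; returns 0 if the wrapper was used once. */
int demo_hooks(const char *path)
{
  append_io_hooks hooks = default_hooks;
  ssize_t n;
  int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
  if (fd < 0)
    return -1;
  hooks.pwrite = app_pwrite;
  n = flush_range(&hooks, fd, "log", 3, 0);
  close(fd);
  unlink(path);
  return (n == 3 && app_write_calls == 1) ? 0 : -1;
}
```

      With a table like this, the application's error handling and
      instrumentation around writes stay in effect when the library flushes
      the cache.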

      The library would also do the following things internally:

      • Create a background thread (in pmem_append_base_init()) that will
        monitor all append files and start flushing as soon as half of the
        memory of the respective cache is used.
      • Create a separate segment for each pmem_append_init() call and
        store information about the file there that can be used on restart.

      There should also be an external tool that one can use to:

      • See which files are cached by a persistent memory file and how much
        is still not written.
      • Force the cache to be written to some or all of the files
      • Reset the cache

      With the above library, one should be able to take an application like
      MariaDB and convert all append-only files (MariaDB usually has 3
      active log files: binary log, InnoDB redo log, Aria redo log) to use
      persistent memory in a matter of a few hours, and the application would
      still work when there is no persistent memory available.





              svoj Sergey Vojtovich
              monty Michael Widenius


