The task is to speed up append-only files with
the help of persistent memory (NVDIMM).
To understand what I would like to do, we first have to consider
the optimal "user" interface for using persistent memory on
- Least possible changes to existing applications.
- Application should work unchanged even if there is no or limited
persistent memory on the machine.
- Should work on any file, anywhere.
One way to do this would be something like:
- Create log file
- Execute an ioctl(file, IOCTL_OPTIMIZE_FOR_APPEND_ONLY, 1024*1024)
- This should assign 1M of persistent memory to optimize append write
performance for the file until it's closed.
- Close log file
- This should flush everything from persistent memory to the file
- Any read, write, sync to the file should work 'normally', as if
it were a normal file.
- On reboot the persistent memory should be flushed to the file, so
that the file contains what was written to it.
- There should also be some calls/tools to check how much persistent memory
exists, who is allocating how much, etc.
In other words, to use persistent memory to speed up append-only files,
in most cases only a one-line change would be needed for each log file.
Unfortunately the above approach is not practical as it's hard to get to
work on all platforms.
However, I would like to create something that is as close to the above
as possible. This would allow anyone to adopt the library for their C or
C++ application with a minimum of changes.
The suggested library to use is http://pmem.io/ and especially
http://pmem.io/pmdk/libpmem/. It seems to be available on most
Note that this suggested library should be written so that it works
even if there is no pmem library or persistent memory available. In this
case everything should work exactly like before (with the overhead
of one virtual call per pwrite()).
There is a library based on the above that implements support for
append only files
but this assumes that the full log files should be in persistent
memory, which is not optimal for the end user. It's also quite
complex to use with MariaDB, would require a lot of changes in MariaDB
to use and would be hard to get to work with and without persistent
Because of the above, I suggest we base our work on the
low level http://pmem.io/pmdk/libpmem/ library.
Here is what I envision as an interface:
To use the library one should do something like:
The handler would be a struct where the public members would be something
If there is no persistent memory, the above calls to pread/pwrite would be
mapped to normal read/writes.
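
As an illustration of this fallback, here is a minimal sketch of a handler
whose function pointers are wired to the ordinary system calls. It is not the
proposed implementation: the struct layout, pmem_append_init_sketch() and its
arguments, and the plain_*() helpers are all assumptions made for this example.
Only the member names (pwrite, pread, append, fsync, offset) are taken from
the calls mentioned in this text. No persistent memory is used at all.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler layout (assumption, not a defined API). */
typedef struct pmem_append_handler pmem_append_handler;
struct pmem_append_handler {
    int   fd;      /* underlying log file */
    off_t offset;  /* logical end of file, replaces a seek to SEEK_END */
    ssize_t (*pwrite)(pmem_append_handler *h, const void *buf, size_t len, off_t pos);
    ssize_t (*pread)(pmem_append_handler *h, void *buf, size_t len, off_t pos);
    ssize_t (*append)(pmem_append_handler *h, const void *buf, size_t len);
    int     (*fsync)(pmem_append_handler *h);
};

/* Fallback path: one indirect call, then the normal system call. */
static ssize_t plain_pwrite(pmem_append_handler *h, const void *buf, size_t len, off_t pos)
{
    ssize_t n = pwrite(h->fd, buf, len, pos);
    if (n > 0 && pos + n > h->offset)
        h->offset = pos + n;   /* keep the cached end of file current */
    return n;
}

static ssize_t plain_pread(pmem_append_handler *h, void *buf, size_t len, off_t pos)
{
    return pread(h->fd, buf, len, pos);
}

static ssize_t plain_append(pmem_append_handler *h, const void *buf, size_t len)
{
    return plain_pwrite(h, buf, len, h->offset);
}

static int plain_fsync(pmem_append_handler *h)
{
    return fsync(h->fd);
}

/* Assumed signature. This sketch always takes the "no persistent memory"
   path, so cache_size is ignored. */
int pmem_append_init_sketch(pmem_append_handler *h, int fd, size_t cache_size)
{
    off_t end = lseek(fd, 0, SEEK_END);
    assert(h != NULL);
    (void) cache_size;
    if (end < 0)
        return -1;
    h->fd = fd;
    h->offset = end;
    h->pwrite = plain_pwrite;
    h->pread = plain_pread;
    h->append = plain_append;
    h->fsync = plain_fsync;
    return 0;
}

/* Round trip on a temporary file; returns 0 on success. */
int handler_demo(void)
{
    char path[] = "/tmp/pmem_append_XXXXXX";
    char back[6];
    pmem_append_handler h;
    int fd = mkstemp(path), ok;
    if (fd < 0)
        return -1;
    unlink(path);
    ok = pmem_append_init_sketch(&h, fd, 1024 * 1024) == 0
      && h.append(&h, "abc", 3) == 3
      && h.append(&h, "def", 3) == 3
      && h.offset == 6
      && h.pread(&h, back, 6, 0) == 6
      && memcmp(back, "abcdef", 6) == 0
      && h.fsync(&h) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

With a layout like this, a version that does have persistent memory would
only need to install different function pointers in the init call; the
application code calling h.append() or h.pread() would stay the same.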
To use this interface, one would have to make the following changes in the
- Add a call to pmem_append_init() when one opens the log file.
- This call will also flush any cached data to the file.
- Change write calls from pwrite to handler->pwrite() or handler->append
or add a call to handler->sync() to ensure that the area is already written.
- Change read calls from pread to handler->pread()
- Change fseek(SEEK_END) to use handler->offset
- Change fsync to handler->fsync
Note that any normal reads will work on the file normally. The user can
always call handler->fsync() to be able to use any file operations normally
Some applications may have their own version of pwrite/pread (like MariaDB).
To allow these to work with the above, there should also be a mapping
through which the library calls pread, pwrite and sync so that one can use
the application's calls. For example, Aria is using my_pwrite() instead of
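
One possible shape for such a mapping is a small table of function pointers
that the library uses for all file access. The sketch below is only an
assumption about how this could look: pmem_append_io, flush_with_hooks() and
app_pwrite() are hypothetical names. app_pwrite() is a stand-in that merely
counts calls; a real application wrapper has its own argument list and would
need a thin adapter with the hook signature.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook table: the library calls these instead of the
   system calls, so an application can plug in its own wrappers. */
typedef struct pmem_append_io {
    ssize_t (*pread)(int fd, void *buf, size_t len, off_t pos);
    ssize_t (*pwrite)(int fd, const void *buf, size_t len, off_t pos);
    int     (*sync)(int fd);
} pmem_append_io;

/* Defaults used when the application installs nothing. */
static ssize_t sys_pread(int fd, void *buf, size_t len, off_t pos)
{
    return pread(fd, buf, len, pos);
}

static ssize_t sys_pwrite(int fd, const void *buf, size_t len, off_t pos)
{
    return pwrite(fd, buf, len, pos);
}

static int sys_fsync(int fd)
{
    return fsync(fd);
}

static const pmem_append_io default_io = { sys_pread, sys_pwrite, sys_fsync };

/* Library side: move cached bytes to the file using only the hooks. */
int flush_with_hooks(const pmem_append_io *io, int fd,
                     const char *buf, size_t len, off_t pos)
{
    assert(buf != NULL);
    if (io == NULL)
        io = &default_io;
    while (len > 0) {
        ssize_t n = io->pwrite(fd, buf, len, pos);
        if (n <= 0)
            return -1;
        buf += n;              /* handle short writes */
        len -= (size_t) n;
        pos += n;
    }
    return io->sync(fd);
}

/* Application side: stand-in for an application's own write wrapper. */
static int app_pwrite_calls = 0;

static ssize_t app_pwrite(int fd, const void *buf, size_t len, off_t pos)
{
    app_pwrite_calls++;
    return pwrite(fd, buf, len, pos);
}

/* Install one custom hook and check it is the one being used. */
int hooks_demo(void)
{
    char path[] = "/tmp/pmem_hooks_XXXXXX";
    char back[5];
    pmem_append_io io = default_io;
    int fd = mkstemp(path), ok;
    if (fd < 0)
        return -1;
    unlink(path);
    io.pwrite = app_pwrite;
    ok = flush_with_hooks(&io, fd, "hello", 5, 0) == 0
      && app_pwrite_calls >= 1
      && io.pread(fd, back, 5, 0) == 5
      && memcmp(back, "hello", 5) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```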
Internally, the library would also do the following things:
- Create a background thread (in pmem_append_base_init()) that will
monitor all append files and start flushing as soon as half of the
memory of the respective cache is used.
- Create a separate segment for each pmem_append_init() call and
store information about the file there that can be used on restart.
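
The background flushing described above can be sketched with a mutex and a
condition variable. This is a simplified illustration under several
assumptions: it handles a single file, the cache is ordinary heap memory
standing in for a persistent-memory segment, the flusher writes while holding
the lock, and all names (append_cache, cache_start(), cache_append(),
cache_stop()) are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct append_cache {
    pthread_mutex_t lock;
    pthread_cond_t  changed;  /* signalled on append, flush and stop */
    pthread_t       flusher;
    char  *buf;               /* stand-in for the persistent-memory segment */
    size_t capacity, used;    /* used = bytes not yet written to the file */
    off_t  file_offset;       /* file position that buf[0] maps to */
    int    fd, stop, error;
} append_cache;

/* Write everything cached to the file. Called with the lock held. */
static void flush_locked(append_cache *c)
{
    size_t done = 0;
    while (done < c->used) {
        ssize_t n = pwrite(c->fd, c->buf + done, c->used - done,
                           c->file_offset + (off_t) done);
        if (n <= 0) { c->error = 1; break; }
        done += (size_t) n;
    }
    memmove(c->buf, c->buf + done, c->used - done);
    c->used -= done;
    c->file_offset += (off_t) done;
    pthread_cond_broadcast(&c->changed);  /* wake writers waiting for room */
}

static void *flusher_main(void *arg)
{
    append_cache *c = arg;
    pthread_mutex_lock(&c->lock);
    while (!c->error) {
        /* Sleep until the cache is at least half full, or we are stopping. */
        while (!c->stop && c->used < c->capacity / 2)
            pthread_cond_wait(&c->changed, &c->lock);
        if (c->used > 0)
            flush_locked(c);
        if (c->stop)
            break;
    }
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

int cache_start(append_cache *c, int fd, size_t capacity)
{
    assert(capacity >= 2);
    memset(c, 0, sizeof *c);
    c->fd = fd;
    c->capacity = capacity;
    c->file_offset = lseek(fd, 0, SEEK_END);
    if (c->file_offset < 0 || (c->buf = malloc(capacity)) == NULL)
        return -1;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->changed, NULL);
    if (pthread_create(&c->flusher, NULL, flusher_main, c) != 0) {
        free(c->buf);
        return -1;
    }
    return 0;
}

/* Append to the cache; blocks while there is no room. Records larger than
   half the cache are rejected, so a blocked writer always means the flush
   threshold has been reached and the flusher will make room. */
int cache_append(append_cache *c, const void *data, size_t len)
{
    int rc = -1;
    if (len > c->capacity / 2)
        return -1;
    pthread_mutex_lock(&c->lock);
    while (!c->error && c->capacity - c->used < len)
        pthread_cond_wait(&c->changed, &c->lock);
    if (!c->error) {
        memcpy(c->buf + c->used, data, len);
        c->used += len;
        rc = 0;
        pthread_cond_broadcast(&c->changed);  /* let the flusher re-check */
    }
    pthread_mutex_unlock(&c->lock);
    return rc;
}

/* Flush what is left, stop the thread and sync the file. */
int cache_stop(append_cache *c)
{
    pthread_mutex_lock(&c->lock);
    c->stop = 1;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
    pthread_join(c->flusher, NULL);
    free(c->buf);
    return c->error ? -1 : fsync(c->fd);
}
```

A caller would use cache_start() after opening the log file, cache_append()
for each record, and cache_stop() before closing the file. A real
implementation would probably copy the data out before writing so that
appends are not blocked during a flush, and would monitor several files from
the one thread.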
There should also be an external tool that one can use to:
- See which files are cached by a persistent memory file and how much
is still not written.
- Force the cache to be written to some or all of the files
- Reset the cache
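
The three tool operations above, and the replay needed on restart, could all
work from the information stored in each per-file segment. The following
sketch assumes a hypothetical segment layout (pmem_segment) and hypothetical
function names. The segment is an ordinary struct here, and the ordering and
atomicity that real persistent memory would need when updating the header are
not addressed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SEG_DATA_MAX 256

/* Hypothetical per-file segment layout (assumption). */
typedef struct pmem_segment {
    char   path[128];            /* file this segment caches */
    off_t  file_offset;          /* where data[0] belongs in the file */
    size_t pending;              /* bytes cached but not yet written */
    char   data[SEG_DATA_MAX];
} pmem_segment;

/* "How much is still not written". */
size_t segment_pending(const pmem_segment *s)
{
    return s->pending;
}

/* "Force the cache to be written"; a restart would do the same. Replaying
   after an interrupted flush is harmless, because the same bytes are
   written to the same file offsets again. */
int segment_flush(pmem_segment *s)
{
    size_t done = 0;
    int fd;
    assert(s->pending <= SEG_DATA_MAX);
    fd = open(s->path, O_WRONLY);
    if (fd < 0)
        return -1;
    while (done < s->pending) {
        ssize_t n = pwrite(fd, s->data + done, s->pending - done,
                           s->file_offset + (off_t) done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t) n;
    }
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    /* Mark the data as written only once it is durable in the file. */
    s->file_offset += (off_t) s->pending;
    s->pending = 0;
    return 0;
}

/* "Reset the cache": drop the cached bytes without writing them. */
void segment_reset(pmem_segment *s)
{
    s->pending = 0;
}

/* File holds "abc", segment holds "def" for offset 3; returns 0 on success. */
int segment_demo(void)
{
    char path[] = "/tmp/pmem_segment_XXXXXX";
    char back[6];
    pmem_segment s;
    int fd = mkstemp(path), ok;
    if (fd < 0)
        return -1;
    ok = write(fd, "abc", 3) == 3;
    memset(&s, 0, sizeof s);
    strcpy(s.path, path);
    s.file_offset = 3;
    memcpy(s.data, "def", 3);
    s.pending = 3;
    ok = ok && segment_pending(&s) == 3
            && segment_flush(&s) == 0
            && segment_pending(&s) == 0
            && pread(fd, back, 6, 0) == 6
            && memcmp(back, "abcdef", 6) == 0;
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```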
With the above library, one should be able to take an application like
MariaDB and convert all append-only files (MariaDB usually has 3
active log files: binary log, InnoDB redo log, Aria redo log) to use
persistent memory in a matter of a few hours, and have it still work when
there is no persistent memory available.