MariaDB Server
MDEV-24676

Concurrent multi-reader, multi-writer buffer for IO_CACHE

Details

    Description

      IO_CACHE has essentially three read/write modes: read-only, write-only, and a sequential read/write FIFO mode, SEQ_READ_APPEND.

      Some performance-sensitive places, such as the replication slave thread, use SEQ_READ_APPEND, and it can become a bottleneck, since reads and writes are sequential and co-sequential (i.e. reads and writes block each other).

      The task is to implement a non-blocking mode for the multi-reader, multi-writer use case through a concurrent ring buffer implementation.

      Possible approaches

      Lock-free n-consumer, m-producer ring buffer

      This implementation requires limiting the number of simultaneous accessors and reserving slots for them.
      Lock-free implementations may contain busy waits, but no locks, except when the number of consumers or producers is exceeded; this can be controlled by a semaphore with a capacity equal to the number of cores.
      This is the ideal approach, but it may be overkill because of the complicated busy loops and slot management.
      It is also hard because a single write can be bigger than the buffer. See buffer excess below.
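      The slot-reservation idea above can be sketched with a Vyukov-style bounded MPMC queue. This is not the IO_CACHE code; it is a minimal illustration of how producers and consumers reserve a slot with a CAS on the global position and then perform the copy without any lock, with per-slot sequence numbers telling them whether a slot is free or filled.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Vyukov-style bounded MPMC queue: each slot carries a sequence number that
// says whether the slot is free (seq == pos) or filled (seq == pos + 1).
// A CAS on head_/tail_ reserves the slot; the copy runs outside any lock.
template <typename T>
class BoundedMpmcQueue {
  struct Slot { std::atomic<size_t> seq; T value; };
  std::vector<Slot> slots_;
  size_t mask_;
  std::atomic<size_t> head_{0}, tail_{0};
public:
  explicit BoundedMpmcQueue(size_t capacity_pow2)   // capacity must be 2^k
      : slots_(capacity_pow2), mask_(capacity_pow2 - 1) {
    for (size_t i = 0; i < capacity_pow2; ++i) slots_[i].seq.store(i);
  }
  bool try_push(const T &v) {
    size_t pos = tail_.load(std::memory_order_relaxed);
    for (;;) {
      Slot &s = slots_[pos & mask_];
      intptr_t diff = (intptr_t)s.seq.load(std::memory_order_acquire)
                      - (intptr_t)pos;
      if (diff == 0) {                       // slot free: try to reserve it
        if (tail_.compare_exchange_weak(pos, pos + 1,
                                        std::memory_order_relaxed)) {
          s.value = v;                       // copy outside any lock
          s.seq.store(pos + 1, std::memory_order_release);
          return true;
        }                                    // CAS failure updated pos; retry
      } else if (diff < 0) {
        return false;                        // queue full
      } else {
        pos = tail_.load(std::memory_order_relaxed);
      }
    }
  }
  bool try_pop(T &out) {
    size_t pos = head_.load(std::memory_order_relaxed);
    for (;;) {
      Slot &s = slots_[pos & mask_];
      intptr_t diff = (intptr_t)s.seq.load(std::memory_order_acquire)
                      - (intptr_t)(pos + 1);
      if (diff == 0) {                       // slot filled: try to claim it
        if (head_.compare_exchange_weak(pos, pos + 1,
                                        std::memory_order_relaxed)) {
          out = s.value;
          s.seq.store(pos + mask_ + 1, std::memory_order_release);
          return true;
        }
      } else if (diff < 0) {
        return false;                        // queue empty
      } else {
        pos = head_.load(std::memory_order_relaxed);
      }
    }
  }
};
```

      Note how the "limiting the number of accessors" concern shows up even here: the queue only works because a slot, once reserved, belongs to exactly one thread until its sequence number is published.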

      Simple rwlock-based non-blocking approach

      The bottleneck essentially occurs because SEQ_READ_APPEND holds its lock for the whole duration of the buffer copy.
      We can avoid that by moving the pointers first, thereby reserving a region for the copy, and then copying from/to the buffer without holding the lock.
      An rwlock will be used to access the pointers, i.e. readers access IO_CACHE::end_of_file under the read lock to check the boundaries, while writers access it under the write lock.

      Buffer excess

      Buffer excess makes operation sequential again.
      When the buffer is full, a separate write buffer is created. When the write buffer is full, a flush happens.
      A flush first waits for all writers to finish, then locks the write buffer for flushing.
      The read buffer can be flushed in a more relaxed way: there is no need to lock for the flush itself, but we do have to lock for buffer allocation and wait for all writers.
      Waiting for writers can be done with another rwlock.
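      The "wait for all writers with another rwlock" trick can be illustrated as follows (hypothetical names, not the real implementation): each writer holds the rwlock in shared mode for the duration of its write, so a flusher that acquires it exclusively blocks until every in-flight write has finished.

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Writers hold writers_lock_ shared while "in flight"; the flusher takes it
// exclusively, which waits for all current writers before flushing.
class FlushGate {
  std::shared_mutex writers_lock_;
  std::mutex pending_mtx_;
  std::vector<std::string> pending_;   // stands in for the write buffer
  std::string file_;                   // stands in for the backing file
public:
  void write(const std::string &chunk) {
    std::shared_lock<std::shared_mutex> w(writers_lock_);  // writer in flight
    std::lock_guard<std::mutex> g(pending_mtx_);
    pending_.push_back(chunk);
  }
  void flush() {
    std::unique_lock<std::shared_mutex> x(writers_lock_);  // waits for writers
    for (const std::string &c : pending_) file_ += c;      // "write to file"
    pending_.clear();
  }
  const std::string &contents() const { return file_; }
};
```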

      Single-readerness

      The real-world cases are mostly single-consumer, and that is essential for IO_CACHE: its data is variable-length and has no underlying record format, so the reader always has to make at least two sequential reads (one to read the size and another to read the body).

      Single-reader considerations can relax some conditions and ease the implementation.
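      The two dependent reads can be made concrete with a small sketch (hypothetical helper, not IO_CACHE code): the body read cannot start, and cannot be handed to another thread, until the size read has completed, which is why the consumer side is inherently sequential.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Length-prefixed records: the reader must first read a 4-byte size, then
// the body. The second read depends on the first, so reads are sequential.
struct RecordStream {
  std::vector<unsigned char> bytes;
  size_t pos = 0;

  bool read_exact(void *dst, size_t n) {
    if (pos + n > bytes.size()) return false;
    std::memcpy(dst, bytes.data() + pos, n);
    pos += n;
    return true;
  }

  // Two dependent reads: size first, body second.
  bool read_record(std::string &out) {
    uint32_t len;
    if (!read_exact(&len, sizeof len)) return false;
    out.resize(len);
    return len == 0 || read_exact(&out[0], len);
  }
};
```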

      io_cache_reserve api

      We can add a function that reserves space for writing, for the case of writing big objects (both those bigger than the write cache and those smaller than it but big enough not to fit into the external buffer), e.g. for copying one cache into another.

      The function should return a future-like object, since IO_CACHE has to be notified back when the writing is finished (to perform a flush, for example).
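      One possible shape for such an API, with invented names (the real io_cache_reserve signature is not defined yet): reserve() hands out a region plus a completion flag, and a flush may only advance past the prefix of reservations whose writers have signalled completion. The completion flag plays the role of the future-like object.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reserve-style API: reserve() returns a ticket; complete()
// resolves it; flushable_end() is how far a flush may safely proceed.
class ReservingCache {
  struct Region { size_t off, len; bool done; };
  std::vector<unsigned char> buf_;
  std::vector<Region> regions_;
  size_t reserved_ = 0;
public:
  explicit ReservingCache(size_t n) : buf_(n) {}

  // Returns the index of the reservation, or -1 if it does not fit.
  ptrdiff_t reserve(size_t len) {
    if (reserved_ + len > buf_.size()) return -1;
    regions_.push_back({reserved_, len, false});
    reserved_ += len;
    return (ptrdiff_t)regions_.size() - 1;
  }

  unsigned char *region_ptr(size_t idx) { return buf_.data() + regions_[idx].off; }
  void complete(size_t idx) { regions_[idx].done = true; }  // resolve the "future"

  // The prefix of completed reservations is safe to flush.
  size_t flushable_end() const {
    size_t end = 0;
    for (const Region &r : regions_) {
      if (!r.done) break;
      end = r.off + r.len;
    }
    return end;
  }
};
```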

Activity

            Rupeekshan Maheswaran added a comment:

            Hi, I'm Rupeekshan Maheswaran, currently pursuing a B.Sc. Honours degree in Computer Science at the University of Jaffna, Sri Lanka. I'm particularly interested in completing this issue as my GSoC 2021 project. I would like to request guidance regarding this issue. Thanks.

            Nikita Malyavin added a comment:

            Hello, Rupeekshan! Sorry to tell you, but this task is already taken by Vladislav Kakurin, another B.Sc. student. Hope we'll find another task you'll be interested in for this summer!

            Rupeekshan Maheswaran added a comment:

            Thank you so much for the response!

            Nikita Malyavin added a comment (edited):

            The 2021 GSoC has now finished.
            The latest commit as of this time is: https://github.com/MagHErmit/server/commit/c011f29ac9e835e50716242aab5bc89b952ee712
            Author: Vladislav Kakurin.

            We have reached the fine-grained Single-reader, Multiple-writers milestone. Multiple readers is still a TODO.

            One important design flaw was found by Vladislav: we cannot acquire _buffer_lock in _slot_release() while holding flush_rw_lock. Therefore, flush_rw_lock should be released before acquiring _buffer_lock.
            The meaning of flush_rw_lock becomes "controlling buffer copy access and file access", but not slot access control.

            With this change, two problems pop up:

            1. When the buffer is full and a flush happens, the buffer is invalidated, and new data can be written to addresses overlapping with slots that have not yet been released. This problem should be worked out carefully: most probably by introducing buffer versions that increase after each flush, or by more careful helping in this case, if possible (e.g. checking our own slot's finished flag, which would mean we can help with the next version's slots).

            2. Locking _buffer_lock in _slot_release() may result in waiting for the flush to end. This should also be handled somehow, for example by having the flusher unlock _buffer_lock earlier.

            Further optimizations can be made, as we see them now:

            • When flushing, reorganize the data so that it can be dumped to the file with a single write.
            • When reading, truncate the file when it's empty. Alternatively, use native pipes.
            • Reorganize the slots into a queue.
            • Use spinlock-based locks, which may be faster.
            • We can get rid of the semaphore in cases where the number of accessing threads is limited by the number of allocated slots, and therefore can never exceed it. In that case the semaphore can be moved to a trait.

            The API is still minimalistic. We should at least add size() and the ability to contiguously (atomically) write from some reentrant source (a generator in Python terms), such as another file, to make the solution work for us.


            People

              nikitamalyavin Nikita Malyavin
