Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-3875

Columnstore write cache

    XMLWordPrintable

Details

    Description

      Background:
      ColumnStore is very slow in inserting single rows but fast to insert multiple rows. This is especially a problem in a master-slave setup where ColumnStore is is used on the slave for some of the tables in the master.

      What we are trying to solve:

      • Speed up inserts to ColumnStore to help ColumnStore on a slave to keep up
        with replication.
      • The solution will also speed up single or few row insert's.

      Requirements:

      • Cache should be totally invisible for the normal user
      • Cache usage should be crash safe. In no case will we loose any committed
        data.
      • Cache should be able to hold data from thousands of independent insert
        transactions as the master may have done each insert in a separate
        transaction. The transactions can be intermixed with other transactions
        for other tables.

      Assumptions for ColumnStore usage:

      • ColumnStore is most often used on a slave
      • Lots of inserts, few updates and deletes.
      • Occasional selects (much fewer than inserts)

      Note that the above requirements require us to store the inserted data on a
      persistent medium at commits.

      Considerations:

      • In this draft we are suggesting to use MyISAM because it's trivial to
        have watermarks in MyISAM dynamic row format that tells us what is the
        last committed transaction, even after a crash. This is not possible
        with CVS. We can also not use MEMORY tables as it would not allow us
        to meet the requirements.
      • We may choose to instead use Aria instead of MyISAM. However even if
        Aria would be used, we would not use crash safe tables as the overhead
        for storing undo's for inserts is notable and not needed.
      • The MyISAM table will not use any indexes, which make inserts very fast.
        We need two syncs per commit (one for committing the data and one for
        committing the water mark). All row data will be cached by the file
        system, there is no internal cache.

      Solution:

      • A new table handler, COLUMNSTORE_CACHE, that inheritens from
        ColumnStore and overrides the following methods:
      • Create table
      • Creates table in MyISAM and then an identical table in ColumnStore
      • ha_write_row
      • Insert row into MyISAM
      • ha_delete_row and ha_update
      • Flush committed rows in cache as a separate transaction.
      • Copy all not committed rows to ColumnStore.
      • Run rest of transaction as normal transaction (slow but safe).
      • external_lock
      • In case of read lock, flush insert cache to ColumnStore up to
        last commit
      • Call columnstore::external_lock
      • close
      • drop
      • rename

      Stages:

      1)

      • Inserts will limited to one transaction at a time. This can be crash safe
        in MyISAM. In case of crash all not committed rows can trivially be
        discarded.
      • Will not be 100% crash safe in the case where we crash exactly after
        copying data but before we have time to truncate the cache
        (very small window)
      • In cache of flush, wait until currently running transaction using
        ColumnStore has ended and then flush everything at once and truncate cache.
      • Assumption is that there are no update and delete's in a transaction that
        starts with an insert. (Most common case)

      2)

      • Add handling of delete and update in a transaction that also used inserts
        according to 'Solution'.

      3) Add support for batched inserts without cpimport to ColumnStore.
      This will speed up flushing of insert cache, LOAD DATA and INSERT SELECT.

      4)

      • Add 'XA' kind of security around 'copy to ColumnStore, commit and truncate
        MyISAM. This will make the cache totally crash safe.
      • Add an interface for ColumnStore engine to ask for current transaction id
        and after crash be able to ask if the id was committed.
      • Add crash recovery to ensure that an interrupted the copy was fully done
        and, if not done, redo it.

      5)

      • Allow multiple inserts at same time.
      • In case of flush, block new inserts, wait until old transactions has
        committed and move all rows to ColumnStore at once.

      6)

      • Don't truncate the table as part of copy. Only add a watermark in the
        MyISAM table what is copied. This allows us to do copying of data while
        other transactions are doing inserts.
      • Truncate can be done when there is no inserts since last copy or
        when the cached files has reached some size limit.

      7)

      • Add a timer that flushes committed transactions from cache once in a while
        (once per hour?)
      • Add logic to know when inserts are idle and flush in idle time.
      • FLUSH TABLE table_name on a cached table should wait until current
        transactions has ended and flush all data to ColumnStore.
      • Option to flush data to ColumnStore on server shutdown

      Attachments

        Issue Links

          Activity

            People

              drrtuy Roman
              monty Michael Widenius
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.