Details
- Type: New Feature
- Status: Stalled
- Priority: Major
- Resolution: Unresolved
- 1.5.3
- 2020-7
Description
Background:
ColumnStore is very slow at inserting single rows but fast at inserting multiple rows. This is especially a problem in a master-slave setup where ColumnStore is used on the slave for some of the tables from the master.
What we are trying to solve:
- Speed up inserts to ColumnStore so that ColumnStore on a slave can keep up
 with replication.
- The solution will also speed up single-row and few-row inserts.
Requirements:
- The cache should be totally invisible to the normal user.
- Cache usage should be crash safe. In no case will we lose any committed
 data.
- The cache should be able to hold data from thousands of independent insert
 transactions, as the master may have done each insert in a separate
 transaction. The transactions can be intermixed with other transactions
 on other tables.
Assumptions for ColumnStore usage:
- ColumnStore is most often used on a slave
- Lots of inserts, few updates and deletes.
- Occasional selects (much fewer than inserts)
Note that the above requirements require us to store the inserted data on a
persistent medium at commit time.
Considerations:
- In this draft we are suggesting to use MyISAM because it is trivial to
 have watermarks in MyISAM's dynamic row format that tell us what the
 last committed transaction is, even after a crash. This is not possible
 with CSV. We can also not use MEMORY tables, as they would not allow us
 to meet the requirements.
- We may choose to use Aria instead of MyISAM. However, even if Aria were
 used, we would not use crash-safe tables, as the overhead of storing
 undo records for inserts is notable and not needed.
- The MyISAM table will not use any indexes, which makes inserts very fast.
 We need two syncs per commit (one for committing the data and one for
 committing the watermark). All row data will be cached by the file
 system; there is no internal cache.
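The two-sync commit protocol above can be sketched in miniature. This is a toy model, not the actual implementation: it uses a sidecar watermark file, whereas the real design keeps the watermark inside MyISAM's dynamic row format, and all file names and the on-disk layout are illustrative.

```python
import os
import struct

class InsertCache:
    """Toy insert cache: rows are appended to a data file; a separate
    watermark file records the byte offset of the last committed
    transaction.  Two fsyncs per commit, as described in the draft."""

    def __init__(self, path):
        self.data_path = path + ".dat"
        self.mark_path = path + ".mark"
        self.data = open(self.data_path, "ab")
        # Crash recovery: anything past the watermark was never committed,
        # so it can trivially be discarded.
        self.data.truncate(self._read_mark())
        self.data.seek(0, os.SEEK_END)

    def _read_mark(self):
        try:
            with open(self.mark_path, "rb") as f:
                return struct.unpack("<Q", f.read(8))[0]
        except FileNotFoundError:
            return 0

    def write_row(self, row: bytes):
        # Length-prefixed append; no indexes, so inserts stay cheap.
        self.data.write(struct.pack("<I", len(row)) + row)

    def commit(self):
        self.data.flush()
        os.fsync(self.data.fileno())            # sync 1: the row data
        with open(self.mark_path, "wb") as f:   # sync 2: the watermark
            f.write(struct.pack("<Q", self.data.tell()))
            f.flush()
            os.fsync(f.fileno())
```

Constructing `InsertCache` on the same path after a "crash" discards any rows written after the last watermark, which is exactly the recovery behaviour the requirements call for.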
Solution:
- A new table handler, COLUMNSTORE_CACHE, that inherits from
 ColumnStore and overrides the following methods:
 - create:
   - Creates the table in MyISAM and then an identical table in ColumnStore.
 - ha_write_row:
   - Inserts the row into MyISAM.
 - ha_delete_row and ha_update_row:
   - Flush committed rows in the cache as a separate transaction.
   - Copy all uncommitted rows to ColumnStore.
   - Run the rest of the transaction as a normal transaction (slow but safe).
 - external_lock:
   - In case of a read lock, flush the insert cache to ColumnStore up to
     the last commit.
   - Call columnstore::external_lock.
 - close
 - drop
 - rename
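As a toy illustration of the delegation described above: writes go to the fast cache, while reads, deletes and updates first force the cached rows into the slow columnar store. The real handler would be C++ code inheriting from the ColumnStore handler class; the Python class and method names below are hypothetical stand-ins.

```python
class FakeColumnStore:
    """Stand-in for the columnar engine: single-row operations are assumed
    expensive, bulk_insert cheap per row."""
    def __init__(self):
        self.rows = []
    def create(self, name):
        self.name = name
    def bulk_insert(self, rows):
        self.rows.extend(rows)
    def delete_row(self, row):
        self.rows.remove(row)

class CacheHandler:
    """Toy model of COLUMNSTORE_CACHE, following the method list above."""

    def __init__(self, columnstore):
        self.columnstore = columnstore
        self.cache = []            # committed rows waiting to be flushed

    def create(self, name):
        # The real engine creates the MyISAM cache table *and* an
        # identical ColumnStore table.
        self.columnstore.create(name)

    def write_row(self, row):
        self.cache.append(row)     # cheap append instead of a slow insert

    def flush_cache(self):
        if self.cache:
            self.columnstore.bulk_insert(self.cache)  # one bulk load
            self.cache.clear()

    def delete_row(self, row):
        self.flush_cache()         # cache must be drained before mutating
        self.columnstore.delete_row(row)

    def read_lock(self):
        self.flush_cache()         # readers must see all committed inserts
```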
Stages:
1)
- Inserts will be limited to one transaction at a time. This can be made
 crash safe in MyISAM. In case of a crash, all uncommitted rows can
 trivially be discarded.
- This will not be 100% crash safe in the case where we crash exactly after
 copying the data but before we have had time to truncate the cache
 (a very small window).
- In case of a flush, wait until the currently running transaction using
 ColumnStore has ended and then flush everything at once and truncate the
 cache.
- The assumption is that there are no updates or deletes in a transaction
 that starts with an insert (the most common case).
2)
- Add handling of deletes and updates in a transaction that also uses
 inserts, according to 'Solution'.
3)
- Add support for batched inserts without cpimport to ColumnStore. This will
 speed up flushing of the insert cache, LOAD DATA and INSERT SELECT.
4)
- Add 'XA'-style safety around 'copy to ColumnStore, commit and truncate
 MyISAM'. This will make the cache totally crash safe.
- Add an interface for the ColumnStore engine to ask for the current
 transaction id and, after a crash, to ask whether that id was committed.
- Add crash recovery to ensure that an interrupted copy was fully done
 and, if not, redo it.
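A sketch of that recovery idea under stated assumptions: the transaction id is persisted *before* the commit, so recovery can ask the engine whether the copy completed and redo it if not. The `was_committed` interface and the pending-copy log are hypothetical stand-ins for the proposed ColumnStore interface.

```python
class FakeEngine:
    """Minimal engine that can answer, after a 'crash', whether a given
    transaction id was committed."""
    def __init__(self):
        self.next_id = 0
        self.committed = set()
        self.rows = []
    def begin(self):
        self.next_id += 1
        return self.next_id
    def bulk_insert(self, txn_id, rows):
        self._staged = list(rows)
    def commit(self, txn_id):
        self.rows += self._staged
        self.committed.add(txn_id)
    def was_committed(self, txn_id):
        return txn_id in self.committed

class RecoverableCopy:
    """XA-style copy sketch: log the id, copy, commit, clear the log."""
    def __init__(self, engine, log):
        self.engine = engine
        self.log = log             # stands in for a small persistent record

    def copy(self, rows):
        txn = self.engine.begin()
        self.log["pending"] = txn  # remember the id before committing
        self.engine.bulk_insert(txn, rows)
        self.engine.commit(txn)
        del self.log["pending"]    # copy fully done; now safe to truncate

    def recover(self, rows):
        txn = self.log.get("pending")
        if txn is not None and not self.engine.was_committed(txn):
            self.copy(rows)        # interrupted before commit: redo the copy
        self.log.pop("pending", None)
```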
5)
- Allow multiple inserts at the same time.
- In case of a flush, block new inserts, wait until old transactions have
 committed, and move all rows to ColumnStore at once.
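This blocking flush can be sketched with a condition variable. A simplified model: the server would use its own locking primitives, and the copy itself is passed in as a callback.

```python
import threading

class FlushGate:
    """Stage-5 protocol in miniature: a flush blocks new inserts, waits
    for in-flight insert transactions to commit, then moves everything
    at once."""

    def __init__(self):
        self.cond = threading.Condition()
        self.active = 0            # insert transactions currently running
        self.flushing = False

    def begin_insert(self):
        with self.cond:
            while self.flushing:   # block new inserts during a flush
                self.cond.wait()
            self.active += 1

    def end_insert(self):
        with self.cond:
            self.active -= 1
            self.cond.notify_all()

    def flush(self, do_copy):
        with self.cond:
            self.flushing = True
            while self.active:     # wait for old transactions to commit
                self.cond.wait()
            do_copy()              # move all cached rows at once
            self.flushing = False
            self.cond.notify_all()
```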
6)
- Don't truncate the table as part of the copy. Instead, only advance a
 watermark in the MyISAM table marking what has been copied. This allows
 us to copy data while other transactions are doing inserts.
- Truncation can be done when there have been no inserts since the last
 copy or when the cache files have reached some size limit.
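The watermark-instead-of-truncate scheme, sketched with an in-memory stand-in for the MyISAM cache table (the real watermark would be a position in the cache file, not a row count):

```python
class WatermarkCache:
    """Stage-6 sketch: copying only advances a watermark; truncation
    happens later, once everything has been copied and the size limit
    is reached, so inserts can run concurrently with the copy."""

    def __init__(self):
        self.rows = []
        self.copied = 0       # watermark: rows[:copied] are in ColumnStore

    def insert(self, row):
        self.rows.append(row)

    def copy_to_columnstore(self, sink):
        sink.extend(self.rows[self.copied:])  # bulk-load only the new rows
        self.copied = len(self.rows)          # advance the watermark

    def maybe_truncate(self, size_limit):
        # Truncation is safe only once everything has been copied.
        if self.copied == len(self.rows) and len(self.rows) >= size_limit:
            self.rows.clear()
            self.copied = 0
```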
7)
- Add a timer that flushes committed transactions from the cache once in a
 while (once per hour?).
- Add logic to detect when inserts are idle and flush during idle time.
- FLUSH TABLE table_name on a cached table should wait until current
 transactions have ended, then flush all data to ColumnStore.
- Add an option to flush data to ColumnStore on server shutdown.