[MCOL-4769] Complete ColumnStore Insert Cache Part 1 Created: 2021-06-22  Updated: 2022-03-31  Resolved: 2022-03-04

Status: Closed
Project: MariaDB ColumnStore
Component/s: MDB Plugin
Affects Version/s: None
Fix Version/s: 6.2.1

Type: New Feature Priority: Critical
Reporter: Gregory Dorman (Inactive) Assignee: David Hall (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MCOL-3875 Columnstore write cache Stalled
PartOf
includes MCOL-4790 LDIF & cache_inserts - table lock stu... Closed
Problem/Incident
is caused by MCOL-4925 Insert Cache - warning messages when ... Closed
Relates
relates to MCOL-4805 can't start PM2/PM3 node after signal 11 Closed
relates to MCOL-4914 Complete ColumnStore Cache Part 2 Open
Sprint: 2021-9, 2021-10, 2021-11, 2021-12, 2021-13, 2021-14, 2021-15, 2021-16, 2021-17

 Description   

The objective of this task is to advance ColumnStore write cache - as defined in MCOL-3875.

First phase of it was limited to supporting HTAP replication only. This task is extending it into more general status: INSERTS and LDI.

Known problems to be solved under priority:

1. Once the feature is activated in configs, all previous columnstore tables can no longer be accessed. Once de-activated, they come back.
2. Once the feature is activated, and a table is created and populated, it can be dropped, but then cannot be re-created. (this is confirmed as not happening anymore, not a problem).
3. In case of multi-node CS cluster set up for DDL replication from primary to the rest, when the cache is flushed, it appears that the data replication occurs also (in addition to DDL). It is supposed to be suppressed in a normal CS cluster operation.
4. Under heavier concurrent bombardment of LDIs, some sessions go into "waiting for table lock" state, which in some cases clears after a few hours (unclear why, perhaps some sort of timeout), and some remain in that state forever. Waiting for the lock is understandable - cache is being locked for the duration of a flush. But even if flush takes a few seconds, why would some sessions lock out for hours, and some forever? (workaround - MCOL-4790, use columnstore_cache_use_import)

Basics of the algorithms (full description in 3875):

  • inserts are accumulated in cache, transactions serialized by cache locks
  • Rollbacks will roll back cache
  • Accumulated inserts are flushed into columnstore on various events (all non inserts, including SELECT - and settable size threshold defaulted to 500,000 rows in the cache).
    • we may want to consider relaxing the above rule and letting SELECTS run as "dirty reads"

Testing instructions

Part 1. Positive tests.

  • AUTOCOMMIT=ON, do a lot of individual INSERTs. With cache_insert and cpimport_cache_insert the speed should be orders of magnitude greater.
  • Do same inserts but this time within multiple transactions - have something like 3000-4000 inserts in each transaction.
  • Do lots of LDIs (also 3000-4000 rows each) in rapid loop (see test scripts in MCOL-4790 as an example). Should be a huge difference in speed.
  • Do above from multiple concurrent sessions - the speed improvements should be even greater.

Part 2. Adversarial tests.

  • While running AUTOCOMMIT=ON and inserts, also do occasional updates, deletes, and even selects. These reduce the effectiveness of the feature (cache has to be flushed every time these happen). The feature is not useful for workflows if they do not have a predominance of uninterrupted inserts or LDIs.
  • Put some updates, deletes or selects inside transactions which do inserts (say, one of those for each 500 inserts). compare the speed.
  • Run transactions with inserts from multiple sessions, but rollback some and not others. Make sure the end result data is correct.


 Comments   
Comment by Gagan Goel (Inactive) [ 2021-11-04 ]

For QA:

To test Point 1:
Start mariadb with columnstore_cache_inserts set to OFF. Then create a table, perform some INSERTs. After this, restart the server with columnstore_cache_inserts set to ON. Make sure the table created when the cache inserts were disabled is now accessible.

To test Point 3:
Launch a 3-node MCS cluster as normal, but make sure the cache is enabled on the primary and the replica nodes using the conf file. Create a table and perform some inserts into the primary. Then perform SELECTs on this table on the primary as well as the replica nodes. You should see the correct number of records in the table. Before the fix, if a single record is inserted, it will be duplicated on the replica nodes as well, and when the cache is flushed on the replicas, the record will be inserted into the CS table. So in the end, you would see 3 copies of the same record (provided a SELECT is executed on the primary as well as both the replicas) instead of 1. This is now fixed.

Point 4 is already tested and closed in MCOL-4790.

Comment by Daniel Lee (Inactive) [ 2021-11-15 ]

Build tests: 6.2.1-1 (#3379)

The following issues has been fixed.

1. Once the feature is activated in configs, all previous columnstore tables can no longer be accessed. Once de-activated, they come back.

Comment by Daniel Lee (Inactive) [ 2021-11-15 ]

Build tests: 6.2.1-1 (#3379)

Timing tests

Insert cache ON vs OFF (1,000 rows)

1. With insert cache on, batch insert and LDI are over 200x and 4x faster, respectively

Insert tests with insert cache ON (400,000)

1. When columnstore_cache_use_import=on
      insert is about 3% faster
      LDI is about 33% faster
 
2. Single session vs 10 concurrent sesions, 10 concurrent sessions is faster
      insert is about 10% faster
      LDI is about 30% faster
 
3. While insert test is being executed, executing a "select count(*)"query,
   which would cause the insert cache to be flushed, from another session 
   every second would have a serious performance impact.
 
   Single session
      insert is about 152% slower
      LDI is about 33% slower
   
   10 sessionss
      insert is about 147% slower
      LDI is aout 19% slower

LDI vs cpimport

When Use_cpimport_for_batch_insert=on, LDI is 24% slower when insert cache is on
When Use_cpimport_for_batch_insert=off, LDI is 350% faster when insert cache is on
 
When insert cache is on
columnstore_use_import_for_batchinsert=on would increase LDI performance by 227%
Having Use_cpimport_for_batch_insert also on would increase LDI performance by 234%
 
Whether insert cache is on or off, cpimport is still many times faster than LDI for immporting data.

Detail timing information can be found in the following link
https://docs.google.com/spreadsheets/d/1V475H96YJ0pcZ4BoGT1JWLpLfcH5a2g8XP-bqpY9byI/edit?usp=sharing

Others

It has been verified that transaction rollback is not supported.
 
With autocommit=off, while inserting 400,000 rows in a single session,
a "select count(*)" query from another session showed intermediate 
inserted rows.  It seems that rows are committed as soon as they are inserted,
just like autocommit is always on.  Is that the reason why transaction
rollback is not supported?  If yes, we need to note that in the release note.

Generated at Thu Feb 08 02:52:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.