MariaDB ColumnStore

MCOL-4769: Complete ColumnStore Insert Cache Part 1

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Fix Version: 6.2.1
    • Component: MDB Plugin
    • Sprints: 2021-9, 2021-10, 2021-11, 2021-12, 2021-13, 2021-14, 2021-15, 2021-16, 2021-17

    Description

      The objective of this task is to advance the ColumnStore write cache, as defined in MCOL-3875.

      The first phase was limited to supporting HTAP replication only. This task extends the cache to more general use: INSERTs and LDI (LOAD DATA INFILE).

      Known problems to be solved, in priority order:

      1. Once the feature is activated in the configs, all pre-existing ColumnStore tables can no longer be accessed. Once it is de-activated, they come back.
      2. Once the feature is activated and a table is created and populated, it can be dropped, but then cannot be re-created. (Confirmed as no longer happening; not a problem.)
      3. In a multi-node CS cluster set up for DDL replication from the primary to the other nodes, flushing the cache appears to replicate the data as well (in addition to the DDL). Data replication is supposed to be suppressed in normal CS cluster operation.
      4. Under heavier concurrent bombardment of LDIs, some sessions go into a "waiting for table lock" state. In some cases this clears after a few hours (unclear why; perhaps some sort of timeout), and in others it remains in that state forever. Waiting for the lock is understandable, since the cache is locked for the duration of a flush, but even if a flush takes a few seconds, why would some sessions be locked out for hours, and some forever? (Workaround: MCOL-4790, use columnstore_cache_use_import; a sketch follows this list.)
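
      A minimal sketch of the workaround named in item 4, assuming columnstore_cache_use_import (the variable referenced in MCOL-4790 and in the timing notes below) can be set as a global server variable; this is illustrative only:

        -- Assumption: columnstore_cache_use_import is settable at runtime
        -- (assumed semantics: flush the insert cache via cpimport rather than row-by-row DML).
        SET GLOBAL columnstore_cache_use_import = ON;

        -- Re-run the concurrent LDI workload and check that no session stays in
        -- the "waiting for table lock" state.
        SHOW PROCESSLIST;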

      Basics of the algorithm (full description in MCOL-3875):

      • Inserts are accumulated in the cache; transactions are serialized by cache locks.
      • Rollbacks will roll back the cache.
      • Accumulated inserts are flushed into ColumnStore on various events: any non-insert statement, including SELECT, and a settable size threshold that defaults to 500,000 rows in the cache (a sketch of the flush behaviour follows this list).
        • We may want to consider relaxing the above rule and letting SELECTs run as "dirty reads".
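
      A minimal sketch of the flush behaviour described above. The table is hypothetical, the server is assumed to have been started with columnstore_cache_inserts=ON (the variable name appears in the QA notes further down), and the flush itself is internal, not issued as a statement:

        -- Hypothetical test table; assumes columnstore_cache_inserts=ON at server startup.
        CREATE TABLE t1 (a INT) ENGINE=ColumnStore;

        INSERT INTO t1 VALUES (1), (2), (3);   -- rows accumulate in the insert cache
        INSERT INTO t1 VALUES (4), (5), (6);   -- still cached, not yet written to ColumnStore

        SELECT COUNT(*) FROM t1;               -- non-insert statement: the cache is flushed
                                               -- first, so this returns 6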

      Testing instructions

      Part 1. Positive tests.

      • With AUTOCOMMIT=ON, do a lot of individual INSERTs. With cache_insert and cpimport_cache_insert the speed should be orders of magnitude greater.
      • Do the same inserts, but this time within multiple transactions, with something like 3000-4000 inserts in each transaction.
      • Do lots of LDIs (also 3000-4000 rows each) in a rapid loop (see the test scripts in MCOL-4790 as an example). There should be a huge difference in speed.
      • Do the above from multiple concurrent sessions; the speed improvement should be even greater. (A sketch of these tests follows this list.)
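
      A minimal sketch of the positive tests, reusing the hypothetical table t1 from the sketch above; the data file path and row counts are illustrative:

        -- Individual inserts with autocommit on; repeat many times and compare
        -- wall-clock time with the cache enabled vs. disabled.
        SET AUTOCOMMIT = 1;
        INSERT INTO t1 VALUES (1);
        INSERT INTO t1 VALUES (2);

        -- Batched inserts inside an explicit transaction (~3000-4000 rows per transaction).
        START TRANSACTION;
        INSERT INTO t1 VALUES (10), (11), (12);   -- repeat until the batch is full
        COMMIT;

        -- LDI in a rapid loop (hypothetical file).
        LOAD DATA LOCAL INFILE '/tmp/rows.csv' INTO TABLE t1 FIELDS TERMINATED BY ',';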

      Part 2. Adversarial tests.

      • While running inserts with AUTOCOMMIT=ON, also do occasional updates, deletes, and even selects. These reduce the effectiveness of the feature, because the cache has to be flushed every time they happen. The feature is not useful for workloads that do not have a predominance of uninterrupted inserts or LDIs.
      • Put some updates, deletes, or selects inside transactions that do inserts (say, one of those for every 500 inserts), and compare the speed.
      • Run transactions with inserts from multiple sessions, but roll back some and not others. Make sure the end result data is correct. (A sketch of this rollback check follows the list.)
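
      A minimal sketch of the rollback check in the last bullet, reusing the hypothetical table t1; run the two blocks from different sessions:

        -- Session A: commits its rows.
        START TRANSACTION;
        INSERT INTO t1 VALUES (100), (101), (102);
        COMMIT;

        -- Session B: rolls its rows back; they must not appear in the table.
        START TRANSACTION;
        INSERT INTO t1 VALUES (200), (201), (202);
        ROLLBACK;

        -- Afterwards, only session A's rows should be present.
        SELECT COUNT(*) FROM t1 WHERE a BETWEEN 100 AND 299;   -- expected: 3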


          Activity

            tntnatbry Gagan Goel (Inactive) added a comment:

            For QA:

            To test Point 1:
            Start mariadb with columnstore_cache_inserts set to OFF, then create a table and perform some INSERTs. After this, restart the server with columnstore_cache_inserts set to ON. Make sure the table that was created while cache inserts were disabled is now accessible.
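
            A minimal sketch of this check, assuming columnstore_cache_inserts is toggled via the config file and a restart as described; the table is hypothetical:

              -- First run, with the option off in the config, e.g.:
              --   [mariadb]
              --   columnstore_cache_inserts = OFF
              CREATE TABLE t2 (a INT) ENGINE=ColumnStore;
              INSERT INTO t2 VALUES (1), (2), (3);

              -- Restart the server with columnstore_cache_inserts = ON, then:
              SELECT COUNT(*) FROM t2;   -- the table must still be accessible and return 3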

            To test Point 3:
            Launch a 3-node MCS cluster as normal, but make sure the cache is enabled on the primary and the replica nodes via the conf file. Create a table and perform some inserts on the primary, then perform SELECTs on this table on the primary as well as on the replica nodes. You should see the correct number of records in the table. Before the fix, if a single record was inserted, it would be duplicated into the cache on the replica nodes as well, and when the cache was flushed on the replicas, the record would be inserted into the CS table again. So in the end you would see 3 copies of the same record (provided a SELECT was executed on the primary as well as on both replicas) instead of 1. This is now fixed.
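
            A minimal sketch of this check against a 3-node cluster with the cache enabled on every node; the table is hypothetical:

              -- On the primary:
              CREATE TABLE t3 (a INT) ENGINE=ColumnStore;
              INSERT INTO t3 VALUES (1);

              -- On the primary and on each replica:
              SELECT COUNT(*) FROM t3;   -- must return 1 on every node;
                                         -- before the fix the row could end up tripled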

            Point 4 is already tested and closed in MCOL-4790.


            dleeyh Daniel Lee (Inactive) added a comment:

            Build tests: 6.2.1-1 (#3379)

            The following issue has been fixed:

            1. Once the feature is activated in configs, all previous columnstore tables can no longer be accessed. Once de-activated, they come back.

            dleeyh Daniel Lee (Inactive) added a comment (edited):

            Build tests: 6.2.1-1 (#3379)

            Timing tests

            Insert cache ON vs OFF (1,000 rows)

            1. With insert cache on, batch insert and LDI are over 200x and 4x faster, respectively
            

            Insert tests with insert cache ON (400,000 rows)

            1. When columnstore_cache_use_import=on
                  insert is about 3% faster
                  LDI is about 33% faster

            2. Single session vs 10 concurrent sessions: 10 concurrent sessions are faster
                  insert is about 10% faster
                  LDI is about 30% faster

            3. While the insert test is being executed, running a "select count(*)" query
               (which causes the insert cache to be flushed) from another session
               every second has a serious performance impact.

               Single session
                  insert is about 152% slower
                  LDI is about 33% slower

               10 sessions
                  insert is about 147% slower
                  LDI is about 19% slower
            

            LDI vs cpimport

            When Use_cpimport_for_batch_insert=on, LDI is 24% slower when the insert cache is on
            When Use_cpimport_for_batch_insert=off, LDI is 350% faster when the insert cache is on

            When the insert cache is on:
            columnstore_use_import_for_batchinsert=on increases LDI performance by 227%
            Having Use_cpimport_for_batch_insert also on increases LDI performance by 234%

            Whether the insert cache is on or off, cpimport is still many times faster than LDI for importing data. (A sketch of toggling this setting follows.)
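
            A minimal sketch of how one might toggle the setting compared above, assuming columnstore_use_import_for_batchinsert is the session variable behind these numbers; the timings themselves come from the linked spreadsheet:

              -- Assumed variable name; t1 and the file path are hypothetical.
              SET SESSION columnstore_use_import_for_batchinsert = ON;   -- LDI goes through cpimport
              LOAD DATA LOCAL INFILE '/tmp/rows.csv' INTO TABLE t1 FIELDS TERMINATED BY ',';

              SET SESSION columnstore_use_import_for_batchinsert = OFF;  -- plain DML path
              LOAD DATA LOCAL INFILE '/tmp/rows.csv' INTO TABLE t1 FIELDS TERMINATED BY ',';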
            

            Detailed timing information can be found at the following link:
            https://docs.google.com/spreadsheets/d/1V475H96YJ0pcZ4BoGT1JWLpLfcH5a2g8XP-bqpY9byI/edit?usp=sharing

            Others

            It has been verified that transaction rollback is not supported.
             
            With autocommit=off, while inserting 400,000 rows in a single session,
            a "select count(*)" query from another session showed the intermediate
            inserted rows. It seems that rows are committed as soon as they are inserted,
            as if autocommit were always on. Is that the reason why transaction
            rollback is not supported? If yes, we need to note that in the release notes.
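
            A minimal sketch of the observation above, using a hypothetical table; run the two blocks from separate sessions:

              -- Session A:
              SET autocommit = 0;
              INSERT INTO t1 VALUES (1), (2), (3);   -- no COMMIT issued yet

              -- Session B, while session A is still inserting:
              SELECT COUNT(*) FROM t1;   -- observed to include session A's uncommitted rows,
                                         -- i.e. rows behave as if committed on insert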
            


            People

              David.Hall David Hall (Inactive)
              gdorman Gregory Dorman (Inactive)
              Votes: 0
              Watchers: 6

