Details
- New Feature
- Status: Confirmed
- Major
- Resolution: Unresolved
- None
Description
MCS has a notion of an empty value for columnar segment/token files and dictionaries. An empty value corresponds to an empty record when the table data is viewed in row orientation. Empty records are the result of:
- DELETE DML statements.
- bulk insertion operations that use cpimport explicitly or implicitly, e.g. INSERT..SELECT uses cpimport internally to ingest the data.
When the number of empty records is significant, they waste disk space and CPU time.
The project will deliver functionality to:
- analyze the percentage of empty values/records in an extent, file, partition, or table.
- manually clean up empty values to reduce disk space usage.
- automatically clean up empty values using a background worker.
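The analysis part can be illustrated with a minimal sketch. It assumes per-extent counts of total and empty rows are already available from somewhere; `ExtentStats` and both function names are hypothetical and do not correspond to any existing MCS type or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtentStats:
    """Row counts for one extent (hypothetical structure, not an MCS type)."""
    partition: int
    total_rows: int
    empty_rows: int


def empty_percentage(extents):
    """Return the percentage of empty records across the given extents."""
    total = sum(e.total_rows for e in extents)
    if total == 0:
        return 0.0
    empty = sum(e.empty_rows for e in extents)
    return 100.0 * empty / total


def empty_percentage_by_partition(extents):
    """Group extents by partition and report the empty percentage of each."""
    grouped = {}
    for e in extents:
        grouped.setdefault(e.partition, []).append(e)
    return {p: empty_percentage(group) for p, group in grouped.items()}
```

The same aggregation applies at the file or table level by changing the grouping key. A threshold on this percentage could decide whether a partition is worth vacuuming.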
Here is the initial scenario that the automated background worker should follow to clean up empty records in a partition:
1. A vacuuming job is a triple `(mariadb client connection id that triggers vacuuming job, target partition, new partition)`.
2. Take the next partition to clean up and lock it for writes, so that SELECT queries can still use it, while UPDATE and DELETE must fail if they address the partition (extent type 1).
3. Create a new partition with a type (extent type 2) that is not visible to SELECT, UPDATE, DELETE, or cpimport, except for a special cpimport mode with a targeted partition load.
4. Save the vacuuming job on disk or in MDB.
5. Run `SELECT * from target_table WHERE idbpartition(any_col) = (target_partition);` and pipe its output stream into `cpimport target_schema target_table`. cpimport must have an additional parameter to send data into the new partition.
6. Atomically remove the target partition and enable the new partition, effectively swapping them.
7. There must be a SQL query against information_schema.columnstore_extents to find out whether there is a failed vacuuming job.
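Steps 1, 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not a proposed implementation: `VacuumJob`, `save_job`, `load_job` and `copy_pipeline` are hypothetical names, the job is stored as a JSON file, and the partition value is assumed to be a quoted string. The cpimport option that selects the destination partition does not exist yet, so it is passed in by the caller as `partition_args` rather than named here.

```python
import json
import os
import shlex
import tempfile
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VacuumJob:
    """The job triple plus the table it applies to (hypothetical layout)."""
    connection_id: int
    schema: str
    table: str
    target_partition: str
    new_partition: str


def save_job(job, path):
    """Persist the job: write a temp file, fsync it, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(job), f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_job(path):
    """Read back a job previously written by save_job."""
    with open(path) as f:
        return VacuumJob(**json.load(f))


def copy_pipeline(job, any_col, partition_args):
    """Build the shell pipeline for step 5 without running it.

    partition_args stands in for the not-yet-existing cpimport parameter
    that directs rows into the new partition.
    """
    # Assumption: the partition is compared as a quoted string.
    select = (f"SELECT * FROM {job.table} "
              f"WHERE idbpartition({any_col}) = '{job.target_partition}';")
    producer = ["mariadb", "-q", "-N", "-e", select, job.schema]
    # "\\t" is the two-character text \t, which cpimport reads as a tab delimiter.
    consumer = ["cpimport", "-s", "\\t", *partition_args, job.schema, job.table]
    return shlex.join(producer) + " | " + shlex.join(consumer)
```

Writing the job file through a rename means a crash leaves either the old file or the complete new one, never a partial record, which keeps the recovery path simple.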
Exception processing.
At any moment a manual vacuuming worker can crash or get killed. There must be automation to recover from such failures.
The simplest version is to restart work on the partition from the beginning, removing the results of the previous attempt. To do so, one has to store the target table, the target partition triple, and the new partition triple.
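The restart-from-the-beginning strategy can be sketched against an in-memory stand-in. `FakeStore` and `recover` are hypothetical and only model the order of operations; `None` marks an empty record. The check for an already completed swap is an added assumption that the ticket does not spell out.

```python
class FakeStore:
    """In-memory stand-in for a table's partitions (illustration only)."""

    def __init__(self, partitions):
        self.partitions = dict(partitions)  # name -> list of rows
        self.hidden = set()                 # partitions not visible to queries

    def drop(self, name):
        self.partitions.pop(name, None)
        self.hidden.discard(name)

    def create_hidden(self, name):
        self.partitions[name] = []
        self.hidden.add(name)

    def copy_non_empty(self, src, dst):
        # Mirrors SELECT piped into cpimport: empty records are not copied.
        self.partitions[dst].extend(
            r for r in self.partitions[src] if r is not None)

    def swap(self, old, new):
        # Remove the target partition and make the new one visible.
        del self.partitions[old]
        self.hidden.discard(new)


def recover(store, job):
    """Redo an interrupted job from scratch using the persisted job record."""
    target, new = job["target_partition"], job["new_partition"]
    if target not in store.partitions:
        return  # assumption: the swap already completed, nothing to redo
    store.drop(new)            # discard results of the previous attempt
    store.create_hidden(new)
    store.copy_non_empty(target, new)
    store.swap(target, new)
```

Because every step before the swap only touches the hidden partition, repeating the sequence after a crash does not affect what SELECT queries see.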
Issue Links
- includes
  - MCOL-6137 Research how EXTENTOUTOFSERVICE and EXTENTUNAVAILABLE extent status affects SELECT,UPDATE,DELETE (Open)
  - MCOL-6138 ExtentMap method to create a new partition for a table not visible to all operations except specific ones (Open)
  - MCOL-6139 Add new cpimport parameter to ingest data into specific partition (Open)
- is blocked by
  - MCOL-5969 Design w/ GSOC MCOL 4889 (Closed)
- relates to
  - MCOL-4887 Background worker to automate on-disk data housekeeping (Closed)