[MCOL-4889] Manual and automated vacuum cleaning for on-disk data empty records Created: 2021-10-05  Updated: 2022-04-06  Resolved: 2022-04-06

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc, writeengine
Affects Version/s: None
Fix Version/s: Icebox

Type: New Feature Priority: Major
Reporter: Roman Assignee: Todd Stoffel (Inactive)
Resolution: Won't Do Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-4887 Background worker to automate on-disk... Closed

 Description   

MCS has a notion of an empty value for columnar segment/token files and dictionaries. If the table data is viewed in row orientation, empty values correspond to empty records. Empty records are the result of:

  1. DELETE DML statements.
  2. Bulk insertion operations that use cpimport explicitly or implicitly, e.g. INSERT..SELECT uses cpimport internally to ingest the data.

When the number of empty records is significant, they waste disk space and CPU time.

The project will deliver functionality to:

  • analyze the percentage of empty values/records in an extent, file, partition, or table.
  • manually clean up empty values to reduce disk space usage.
  • automatically clean up empty values using a background worker.
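The analysis step boils down to counting empty values per extent and summing the counts up to the file, partition, or table level. The following is a minimal sketch under loudly stated assumptions: an extent is modeled as a plain `std::vector<int64_t>`, and `kEmptyValue`, `EmptyStats`, `analyzeExtent`, and `analyzeExtents` are hypothetical names, not ColumnStore APIs. The real empty marker depends on the column type and width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical empty marker for a 64-bit column; the real ColumnStore
// marker depends on the column data type and width.
constexpr int64_t kEmptyValue = std::numeric_limits<int64_t>::min() + 1;

// Empty/total value counts. Counts of several extents can be summed to
// obtain file, partition, or table level statistics.
struct EmptyStats {
    std::size_t empty = 0;
    std::size_t total = 0;

    EmptyStats& operator+=(const EmptyStats& other) {
        empty += other.empty;
        total += other.total;
        return *this;
    }

    // Percentage of empty values; 0 when nothing was scanned.
    double percent() const {
        assert(empty <= total);
        if (total == 0) return 0.0;
        return 100.0 * static_cast<double>(empty) / static_cast<double>(total);
    }
};

// Counts empty values in one extent, modeled here as a plain vector.
EmptyStats analyzeExtent(const std::vector<int64_t>& extent) {
    EmptyStats stats;
    stats.total = extent.size();
    for (int64_t value : extent) {
        if (value == kEmptyValue) ++stats.empty;
    }
    return stats;
}

// Rolls up the statistics of several extents, e.g. all extents of a partition.
EmptyStats analyzeExtents(const std::vector<std::vector<int64_t>>& extents) {
    EmptyStats sum;
    for (const auto& extent : extents) sum += analyzeExtent(extent);
    return sum;
}
```

Keeping raw counts rather than per-extent percentages lets higher levels be aggregated exactly: a partition's percentage is computed from summed counts, not by averaging extent percentages of different fill levels.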

Here is the initial scenario that the automated background worker should follow to clean up empty records in a partition:

  • take the next partition to clean up
    • take the next extent in the chosen partition; lock it and store its original HWM
      • open the segment file that holds the extent of the minimum-width column in the partition
      • take a pointer to the last used value in the extent
      • while !eof of the min width extent
        • find the next empty value in the extent
        • save its block in the Version Buffer if it is not saved yet
        • replace the empty value with the value at the pointer
        • update the pointer
        • reduce the HWM, dropping the empty values
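The scenario above is essentially a two-pointer compaction: holes near the start of the extent are filled with rows taken from its tail, and the HWM then shrinks past the vacated slots. The sketch below illustrates that idea under simplifying assumptions: an extent is a set of equal-length `std::vector<int64_t>` columns, the HWM is a plain count of used slots (the real HWM is block-based per segment file), locking is omitted, and the Version Buffer is reduced to a set of block ids. `compactExtent`, `kEmptyValue`, and all parameters are hypothetical names, not ColumnStore APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <set>
#include <vector>

// Hypothetical empty marker shared by all columns in this sketch; in
// ColumnStore the marker depends on each column's type and width.
constexpr int64_t kEmptyValue = std::numeric_limits<int64_t>::min() + 1;

// Compacts one extent in place and returns its new HWM.
// columns:   same-length value vectors of one extent, one per column
// hwm:       number of used slots before compaction (the original HWM)
// driver:    index of the minimum-width column used to detect empty rows
// blockSize: slots per block, used only to decide which blocks to version
// versioned: ids of blocks recorded in the (hypothetical) version buffer
std::size_t compactExtent(std::vector<std::vector<int64_t>>& columns,
                          std::size_t hwm, std::size_t driver,
                          std::size_t blockSize,
                          std::set<std::size_t>& versioned) {
    assert(driver < columns.size() && blockSize > 0);
    for (const auto& col : columns) assert(hwm <= col.size());

    const std::vector<int64_t>& key = columns[driver];
    // `last` is one past the last used (non-empty) slot, i.e. the pointer.
    std::size_t last = hwm;
    auto skipTrailingEmpty = [&] {
        while (last > 0 && key[last - 1] == kEmptyValue) --last;
    };
    skipTrailingEmpty();

    for (std::size_t i = 0; i < last; ++i) {
        if (key[i] != kEmptyValue) continue;
        const std::size_t src = last - 1;  // non-empty and src > i
        // Record both touched blocks; std::set ignores repeats ("if not yet").
        versioned.insert(i / blockSize);
        versioned.insert(src / blockSize);
        // Move the whole row so that all columns stay aligned.
        for (auto& col : columns) {
            col[i] = col[src];
            col[src] = kEmptyValue;
        }
        --last;
        skipTrailingEmpty();
    }
    return last;  // slots [0, last) are now all non-empty
}
```

Filling holes from the tail touches only the blocks that hold a hole or a moved row and lets the HWM drop, but it does not preserve row order. With `blockSize = 2`, compacting `{10, E, 30, E, 50, E}` moves `50` into slot 1, versions blocks 0 and 2, and returns a new HWM of 3.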

See the FSM diagram for details.

TBD: exception processing.


Generated at Thu Feb 08 02:53:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.