Manual and automated vacuum cleaning of empty records in on-disk data




      MCS has a notion of an empty value for columnar segment/token files and dictionaries. An empty value corresponds to an empty record if the table data is viewed in row orientation. Empty records result from:

      1. DELETE DML statements.
      2. Bulk insertion operations that use cpimport explicitly or implicitly, e.g. INSERT..SELECT uses cpimport internally to ingest the data.

      When the number of empty records is significant, they waste disk space and CPU time.

      The project will deliver functionality to:

      • analyze the percentage of empty values/records in an extent, file, partition, or table.
      • manually clean up empty values to reduce disk space usage.
      • automatically clean up empty values using a background worker.
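
The analysis step can be illustrated with a minimal sketch. It is not MCS code: an extent is modelled as a plain Python list, a partition as a list of extents, and `EMPTY`, `empty_ratio` and `partition_empty_ratio` are hypothetical names chosen for this example. Real MCS columns use type-specific empty values rather than `None`.

```python
# Hypothetical stand-in for a column's empty value; MCS uses
# type-specific magic values, not None.
EMPTY = None


def empty_ratio(values):
    """Return the fraction of slots in one extent that hold EMPTY."""
    values = list(values)
    if not values:
        return 0.0
    return sum(v is EMPTY for v in values) / len(values)


def partition_empty_ratio(extents):
    """Return the fraction of EMPTY slots across all extents of a partition."""
    total = sum(len(extent) for extent in extents)
    if total == 0:
        return 0.0
    empties = sum(sum(v is EMPTY for v in extent) for extent in extents)
    return empties / total
```

The partition-level ratio is weighted by extent size instead of averaging per-extent ratios, so a nearly empty small extent does not dominate the result. The same aggregation would presumably apply at the file and table levels.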

      Here is the initial scenario that the automated background worker should follow to clean up empty records in a partition:

      • take the next partition to clean up
        • take the next extent in the chosen partition; lock it and store its original HWM;
          • open the segment file that holds the extent of the narrowest column (min(col width)) in the partition
          • take a pointer to the last used value in the extent
          • while not at the end of the min-width extent
            • find the next empty value in the extent
            • save the block in the Version Buffer if it is not saved yet
            • replace the empty value with the value at the pointer
            • update the pointer
            • reduce the HWM, dropping the empty values
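
The inner loop of the scenario above can be sketched as a two-pointer compaction over a single extent. This is an illustration under stated assumptions, not the MCS implementation: the extent is a Python list, the HWM is modelled as a count of used slots rather than a block number, the Version Buffer is a dict from block index to the block's original contents, and `EMPTY`, `BLOCK_SIZE` and `compact_extent` are hypothetical names. Locking and multi-column handling are omitted.

```python
EMPTY = None     # hypothetical stand-in for the column's empty value
BLOCK_SIZE = 4   # hypothetical number of values per block


def compact_extent(values, hwm, version_buffer):
    """Compact values[:hwm] in place and return the reduced HWM.

    hwm is modelled as the number of slots in use, not a block number.
    version_buffer maps a block index to a copy of that block taken
    before its first modification.
    """
    def save_block(slot):
        # Save the block that contains `slot` unless it is already saved.
        block = slot // BLOCK_SIZE
        if block not in version_buffer:
            start = block * BLOCK_SIZE
            version_buffer[block] = values[start:start + BLOCK_SIZE]

    head = 0          # scans forward for empty slots
    tail = hwm - 1    # pointer to the last used value
    while True:
        # Skip trailing empties so that tail points at a real value.
        while tail >= 0 and values[tail] is EMPTY:
            tail -= 1
        # Find the next empty slot in front of tail.
        while head < tail and values[head] is not EMPTY:
            head += 1
        if head >= tail:
            break
        save_block(head)
        save_block(tail)
        values[head] = values[tail]   # fill the hole with the last value
        values[tail] = EMPTY          # the old slot is now droppable
    return tail + 1                   # new HWM: slots beyond it are empty
```

For example, compacting `[1, None, 3, None, 5, 6, None, None]` with an HWM of 8 leaves the used values in the first four slots and returns an HWM of 4. Filling holes from the tail changes the physical order of values, so a real implementation would presumably have to apply the same moves to every column of the extent to keep rows aligned.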

      See the FSM diagram for details.

      TBD: exception processing.


