  1. MariaDB ColumnStore
  2. MCOL-4889

Manual and automated vacuum cleaning for on-disk data empty records

Details

    Description

      MCS has a notion of an empty value for columnar segment/token files and dictionaries. An empty value corresponds to an empty record when the table data is viewed in row orientation. Empty records result from:

      1. DELETE DML statements.
      2. bulk insertion operations that use cpimport explicitly or implicitly, e.g. INSERT..SELECT uses cpimport internally to ingest the data.

      When the number of empty records is significant, they waste disk space and CPU time.

      The project will deliver functionality to:

      • analyze the percentage of empty values/records in an extent, file, partition, or table.
      • manually clean up empty values to reduce disk space usage.
      • automatically clean up empty values using a background worker.
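
The analysis goal can be illustrated with a minimal Python sketch. It assumes that per-extent counts of total and empty row slots are already available; the `ExtentStats` record, its field names, and the sample values are hypothetical and do not describe an existing MCS structure or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtentStats:
    # Hypothetical per-extent counters, not an actual ColumnStore structure.
    table: str
    partition: int
    segment_file: str
    extent_id: int
    total_rows: int  # row slots allocated in the extent
    empty_rows: int  # slots that hold the empty value


def empty_percentage(extents, key):
    """Return {group: percentage of empty records}, grouped by key(extent)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [empty, total]
    for e in extents:
        bucket = totals[key(e)]
        bucket[0] += e.empty_rows
        bucket[1] += e.total_rows
    return {
        group: (100.0 * empty / total if total else 0.0)
        for group, (empty, total) in totals.items()
    }


# Made-up sample data for illustration only.
extents = [
    ExtentStats("t1", 0, "FILE000.cdf", 1, 1000, 250),
    ExtentStats("t1", 0, "FILE000.cdf", 2, 1000, 750),
    ExtentStats("t1", 1, "FILE001.cdf", 3, 2000, 0),
]

by_extent = empty_percentage(extents, key=lambda e: e.extent_id)
by_partition = empty_percentage(extents, key=lambda e: (e.table, e.partition))
by_table = empty_percentage(extents, key=lambda e: e.table)
```

The same function covers every granularity in the list by changing the grouping key, which is one possible way to keep the extent, file, partition, and table reports consistent with each other.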

      Here is the initial scenario that the automated background worker should follow to clean up empty records in a partition:
      1. A vacuuming job is a triple `(mariadb client connection id that triggers vacuuming job, target partition, new partition)`.
      2. Take the next partition to clean up and lock it for writes (extent type 1): SELECT queries can still use it, but UPDATE and DELETE must fail if they address the partition.
      3. Create a new partition of a type (extent type 2) that is not visible to SELECT, UPDATE, DELETE, or cpimport, except for a special cpimport mode that loads into a targeted partition.
      4. Save the vacuuming job on disk or in MDB.
      5. Run `SELECT * from target_table WHERE idbpartition(any_col) = (target_partition);` and pipe its output stream into `cpimport target_schema target_table`. cpimport must take an additional parameter that directs the data into a specified partition (here, the new partition).
      6. Atomically remove the target partition and enable the new partition, effectively swapping them.
      7. There must be a SQL query against information_schema.columnstore_extents that reveals whether there is a failed vacuuming job.
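
The order of the steps above can be modeled with a small in-memory Python sketch. Everything in it is an assumption made for illustration: the `Catalog`, `VacuumJob`, and `PartitionState` names are hypothetical, `None` stands in for an empty record, and a list comprehension replaces the real SELECT-into-cpimport pipeline. It shows only the sequencing of state changes, not how MCS would implement them.

```python
from dataclasses import dataclass, field
from enum import Enum


class PartitionState(Enum):
    NORMAL = "normal"
    READ_ONLY = "extent type 1"  # readable, but locked for writes
    HIDDEN = "extent type 2"     # invisible to regular queries and loads


@dataclass
class VacuumJob:
    connection_id: int
    target_partition: str
    new_partition: str


@dataclass
class Catalog:
    # partition name -> (state, rows); None marks an empty record
    partitions: dict = field(default_factory=dict)
    # persisted jobs; anything left here after a crash is a failed job
    jobs: list = field(default_factory=list)


def vacuum(catalog, connection_id, target, new):
    job = VacuumJob(connection_id, target, new)                    # step 1
    _, rows = catalog.partitions[target]
    catalog.partitions[target] = (PartitionState.READ_ONLY, rows)  # step 2
    catalog.partitions[new] = (PartitionState.HIDDEN, [])          # step 3
    catalog.jobs.append(job)                                       # step 4
    live = [r for r in rows if r is not None]                      # step 5
    catalog.partitions[new] = (PartitionState.HIDDEN, live)
    # Step 6: in this single-threaded model the swap is trivially atomic;
    # a real implementation needs a transactional metadata update.
    del catalog.partitions[target]
    catalog.partitions[new] = (PartitionState.NORMAL, live)
    catalog.jobs.remove(job)
    return job


def failed_jobs(catalog):
    # Step 7: jobs that were saved but never completed.
    return list(catalog.jobs)


catalog = Catalog(partitions={"p1": (PartitionState.NORMAL, [1, None, 2, None])})
vacuum(catalog, connection_id=42, target="p1", new="p1_new")
```

After the call, `catalog.partitions` holds only `p1_new` in the `NORMAL` state with the two live rows, and `failed_jobs(catalog)` is empty.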

      Exception handling.
      At any moment a manual vacuuming worker can crash or be killed. There must be automation to recover from such failures.
      The simplest approach is to restart work on the partition from the beginning, removing the results of the previous attempt. To do so, one has to store the target table, the target partition triple, and the new partition triple.
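
A minimal Python sketch of this restart-from-scratch recovery follows. It assumes the job record is kept as a JSON file; the file layout, the key names, and the `drop_partition` and `run_vacuum` callbacks are hypothetical placeholders for whatever MCS would actually provide.

```python
import json
import os
import tempfile


def save_job(path, job):
    """Persist the job record; write to a temp file, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(job, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # readers see either the old record or the new one


def load_job(path):
    """Return the saved job record, or None if no job is pending."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def recover(path, drop_partition, run_vacuum):
    """Redo an interrupted job from the beginning; return True if one existed."""
    job = load_job(path)
    if job is None:
        return False
    drop_partition(job["new_partition"])  # discard the previous attempt
    run_vacuum(job)                       # restart from scratch
    os.remove(path)                       # the job is no longer pending
    return True
```

A worker would call `save_job` before it starts copying data and `recover` on startup. The sample record below uses made-up values:

```python
job = {
    "target_table": "target_schema.target_table",
    "target_partition": "0.0.1",
    "new_partition": "1.0.1",
}
```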

            People

              drrtuy Roman