MariaDB ColumnStore / MCOL-4889

Manual and automated vacuum cleaning of empty records in on-disk data

Details

    Description

      MCS has a notion of an empty value for columnar segment/token files and dictionaries. If the table data is viewed in row orientation, empty values correspond to empty records. Empty records are the result of:

      1. DELETE DML statements.
      2. Bulk insertion operations that use cpimport explicitly or implicitly; e.g., INSERT..SELECT uses cpimport internally to ingest the data.
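      As a minimal illustration of how a columnar empty value maps to an empty record, the sketch below models each column as a Python list and uses a hypothetical `EMPTY` sentinel in place of the on-disk empty marker. It covers only the DELETE case: a deleted row leaves `EMPTY` at the same position in every column, so scanning a single column is enough to find empty records. The names `delete_rows` and `empty_records` are illustrative, not part of MCS.

```python
from typing import Dict, List

# Hypothetical sentinel for the on-disk "empty" marker; kept distinct from
# None so that it is not confused with a NULL value.
EMPTY = object()


def delete_rows(columns: Dict[str, List[object]], row_ids: List[int]) -> None:
    """Mark the given row positions as empty in every column of the table."""
    for values in columns.values():
        for rid in row_ids:
            values[rid] = EMPTY


def empty_records(columns: Dict[str, List[object]]) -> List[int]:
    """Return row positions holding EMPTY; all columns agree, so check one."""
    first = next(iter(columns.values()))
    return [i for i, v in enumerate(first) if v is EMPTY]


table = {"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]}
delete_rows(table, [1, 3])
print(empty_records(table))  # [1, 3]
```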

      When the number of empty records is significant, they waste disk space and CPU time.

      The project will deliver functionality that allows users to:

      • analyze the percentage of empty values/records in an extent, file, partition, or table.
      • manually clean up empty values to reduce disk space usage.
      • automatically clean up empty values using a background worker.
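      The analysis goal can be sketched as a roll-up of (empty, total) counts from extents to segment files, partitions, and the table. This is a minimal sketch under stated assumptions: it tracks one column only, which is enough to count empty records because every column marks the same row positions, and `Extent`, `SegmentFile`, `Partition`, `Table`, and `empty_pct` are hypothetical in-memory models, not MCS types or APIs.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

EMPTY = object()  # hypothetical stand-in for the on-disk empty marker

Counts = Tuple[int, int]  # (empty values, total values)


def _sum(counts: Iterable[Counts]) -> Counts:
    empty = total = 0
    for e, t in counts:
        empty += e
        total += t
    return empty, total


@dataclass
class Extent:
    values: List[object]

    def counts(self) -> Counts:
        return sum(1 for v in self.values if v is EMPTY), len(self.values)


@dataclass
class SegmentFile:
    extents: List[Extent] = field(default_factory=list)

    def counts(self) -> Counts:
        return _sum(e.counts() for e in self.extents)


@dataclass
class Partition:
    files: List[SegmentFile] = field(default_factory=list)

    def counts(self) -> Counts:
        return _sum(f.counts() for f in self.files)


@dataclass
class Table:
    partitions: List[Partition] = field(default_factory=list)

    def counts(self) -> Counts:
        return _sum(p.counts() for p in self.partitions)


def empty_pct(node) -> float:
    """Percentage of empty values at any level: extent, file, partition, table."""
    empty, total = node.counts()
    return 100.0 * empty / total if total else 0.0


ext_a = Extent([1, EMPTY, 3, EMPTY])
ext_b = Extent([5, 6, 7, 8])
tbl = Table([Partition([SegmentFile([ext_a]), SegmentFile([ext_b])])])
print(empty_pct(ext_a), empty_pct(tbl))  # 50.0 25.0
```

      Rolling up raw counts rather than averaging the children's percentages keeps the result correct when extents hold different numbers of values.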

      Here is the initial scenario that the automated background worker should follow to clean up empty records in a partition:

      • take the next partition to clean up
        • take the next extent in the chosen partition; lock it and store its original HWM;
          • open the segment file that contains the extent of the column with the minimum width in the partition
          • take a pointer to the last used value in the extent
          • while the end of the minimum-width extent has not been reached
            • find the next empty value in the extent
            • save the block in the Version Buffer if it has not been saved yet
            • replace the empty value with the value at the pointer
            • update the pointer
            • reduce the HWM, dropping empty values
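      A minimal sketch of the per-extent loop, assuming each column's extent is an in-memory list, a hypothetical `EMPTY` marker, a block size passed as a parameter (8192 bytes by default here), and a dict standing in for the Version Buffer. Locking and real file I/O are omitted, and `compact_extent` and `last_block` are illustrative names. It scans the narrowest column, fills each empty slot with the value at the tail pointer in every column, and reduces the HWM once after the scan rather than inside the loop.

```python
from typing import Dict, List, Tuple

EMPTY = object()  # hypothetical stand-in for the on-disk empty marker

VersionBuffer = Dict[Tuple[str, int], List[object]]


def last_block(num_values: int, per_block: int) -> int:
    """HWM as the index of the last block holding data (-1 if none)."""
    return (num_values - 1) // per_block if num_values else -1


def compact_extent(
    columns: Dict[str, List[object]],
    widths: Dict[str, int],
    block_bytes: int = 8192,
) -> Tuple[Dict[str, int], Dict[str, int], VersionBuffer]:
    """Compact one extent in place; return (original HWM, new HWM, saved blocks)."""
    per_block = {c: block_bytes // w for c, w in widths.items()}
    orig_hwm = {c: last_block(len(v), per_block[c]) for c, v in columns.items()}
    narrow = columns[min(widths, key=widths.get)]  # fewest blocks to scan
    saved: VersionBuffer = {}

    back = len(narrow) - 1  # pointer to the last used value
    while back >= 0 and narrow[back] is EMPTY:
        back -= 1

    front = 0
    while front < back:
        if narrow[front] is EMPTY:  # found the next empty value
            for c, values in columns.items():
                blk = front // per_block[c]
                if (c, blk) not in saved:  # save the block only once
                    start = blk * per_block[c]
                    saved[(c, blk)] = values[start:start + per_block[c]]
                values[front] = values[back]  # replace empty with the pointer
            back -= 1  # update the pointer, skipping empty values
            while back > front and narrow[back] is EMPTY:
                back -= 1
        front += 1

    used = back + 1
    for values in columns.values():
        del values[used:]  # reduce HWM by dropping the tail
    new_hwm = {c: last_block(used, per_block[c]) for c in columns}
    return orig_hwm, new_hwm, saved


cols = {"id": [10, EMPTY, 30, EMPTY, 50], "flag": [1, EMPTY, 3, EMPTY, 5]}
orig, new_hwm, saved = compact_extent(cols, {"id": 4, "flag": 1}, block_bytes=8)
print(cols["id"], orig, new_hwm)  # [10, 50, 30] {'id': 2, 'flag': 0} {'id': 1, 'flag': 0}
```

      Filling holes from the tail moves each surviving row at most once, but it does not preserve row order within the extent.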

      See the FSM diagram for details.
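      The FSM diagram itself is not included on this page. Purely as a hypothetical reading of the scenario above, not a reproduction of that diagram, the worker's control flow can be sketched as a state machine. The state names and the `run_worker` driver are illustrative assumptions, and the failure path is left out because exception processing is still TBD.

```python
from enum import Enum, auto
from typing import Dict, List


class State(Enum):
    PICK_PARTITION = auto()
    PICK_EXTENT = auto()   # lock the extent and store its original HWM
    OPEN_SEGMENT = auto()  # open the min-width column's segment file
    COMPACT = auto()       # one step per empty value filled from the tail
    REDUCE_HWM = auto()    # drop the empty tail
    IDLE = auto()          # no partition left to clean


def run_worker(partitions: Dict[str, List[int]]) -> List[str]:
    """Drive the hypothetical FSM and return the names of the visited states.

    `partitions` maps a partition name to the number of empty values in each
    of its extents; this input exists only to drive the transitions.
    """
    pending = list(partitions.values())
    extents: List[int] = []
    holes = 0
    state = State.PICK_PARTITION
    trace: List[str] = []
    while True:
        trace.append(state.name)
        if state is State.IDLE:
            return trace
        if state is State.PICK_PARTITION:
            if pending:
                extents = list(pending.pop(0))
                state = State.PICK_EXTENT
            else:
                state = State.IDLE
        elif state is State.PICK_EXTENT:
            if extents:
                holes = extents.pop(0)
                state = State.OPEN_SEGMENT
            else:
                state = State.PICK_PARTITION
        elif state is State.OPEN_SEGMENT:
            state = State.COMPACT
        elif state is State.COMPACT:
            if holes:
                holes -= 1  # stay in COMPACT until the scan reaches EOF
            else:
                state = State.REDUCE_HWM
        elif state is State.REDUCE_HWM:
            state = State.PICK_EXTENT


print(run_worker({"p0": [2]}))
```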

      TBD: exception processing.

          Activity

            drrtuy Roman created issue -
            drrtuy Roman made changes -
            Field Original Value New Value
            drrtuy Roman made changes -
            Description

            drrtuy Roman made changes -
            Description

            drrtuy Roman made changes -
            Description

            drrtuy Roman made changes -
            Description

            toddstoffel Todd Stoffel (Inactive) made changes -
            Rank Ranked higher
            toddstoffel Todd Stoffel (Inactive) made changes -
            Rank Ranked higher
            toddstoffel Todd Stoffel (Inactive) made changes -
            Assignee Todd Stoffel [ toddstoffel ]
            toddstoffel Todd Stoffel (Inactive) made changes -
            Fix Version/s Icebox [ 22302 ]
            drrtuy Roman made changes -
            Resolution Won't Do [ 10201 ]
            Status Open [ 1 ] Closed [ 6 ]
            drrtuy Roman made changes -
            Assignee Todd Stoffel [ toddstoffel ] Roman [ drrtuy ]
            Resolution Won't Do [ 10201 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            drrtuy Roman made changes -
            Status Stalled [ 10000 ] Confirmed [ 10101 ]
            drrtuy Roman made changes -
            Labels gsoc24
            drrtuy Roman made changes -
            Labels gsoc24 gsoc24 gsoc25
            allen.herrera Allen Herrera made changes -

            People

              Assignee: drrtuy Roman
              Reporter: drrtuy Roman
              Votes: 0
              Watchers: 1
