Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-5021

Implement an auxiliary (hidden) column to improve DELETE performance.

Details

    • New Feature
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • None
    • 22.08.1
    • None
    • None
    • 2021-17

    Description

      This task is to implement an auxiliary (AUX) column in ColumnStore to improve the performance of DELETE operations.

      This will be accomplished by implementing a hidden 1-byte boolean column with 0 indicating the column value is disabled (deleted) and 1 indicating the column value is enabled (active). Idea is to improve the performance of DELETE operation by simply toggling the AUX column value from 1 to 0, instead of the current implementation which writes empty magic values for all columns in the table for all the impacted rows. Empty magics are the current method in ColumnStore to detect if a given row is deleted or not. So for a wide table, the current method would lead to significant performance slowdown when performing DELETEs.

      Points to note:
      -New tables created in the CS release with this feature are subject to DELETE optimization.
      -Existing table w/o the aux column is not affected (no DELETE optimization). MCOL-5122 is created as an enhancement of this ticket to allow a user to create the AUX column for an existing table which was created with an older version of ColumnStore without the AUX column feature.

      Design Document for the feature:

      https://docs.google.com/document/d/1OAWYxlsfgMwPoeDtY6wZ9JcoCubFyodQ63cCZ9NjLCI/edit?usp=sharing

      Attachments

        Issue Links

          Activity

            Rather than writing to use uint_t value as 1, treat the aux as a bitmap and use one of the bits. This allows for easier expansion in the future if needed.

            David.Hall David Hall (Inactive) added a comment - Rather than writing to use uint_t value as 1, treat the aux as a bitmap and use one of the bits. This allows for easier expansion in the future if needed.

            toddstoffel Do we document a generic upgrade script that users have to run after they upgrade binary ?

            alexey.vorovich alexey vorovich (Inactive) added a comment - toddstoffel Do we document a generic upgrade script that users have to run after they upgrade binary ?

            toddstoffel We discussed two options
            1. document the following command that customer must to run during upgrade
            ALTER TABLE calpontsys.systable ADD COLUMN (auxcolumnoid INT NOT NULL DEFAULT 0);
            2. if we already have a script that must be run during upgrades , then we can add the command above to it. From your response it seems that we have no such script. If you confirm , then option 1 is way to go

            alexey.vorovich alexey vorovich (Inactive) added a comment - toddstoffel We discussed two options 1. document the following command that customer must to run during upgrade ALTER TABLE calpontsys.systable ADD COLUMN (auxcolumnoid INT NOT NULL DEFAULT 0); 2. if we already have a script that must be run during upgrades , then we can add the command above to it. From your response it seems that we have no such script. If you confirm , then option 1 is way to go

            FastDelete = false => min/max are updated and state is valid
            FastDelete = true => min/max are not set and state is invalid
            Before this feature, expect min/max to be updated and state to be valid (assuming it was valid before the delete). This is a from MCOL-2044 in 6.1.1
            Prior to 6.1.1, expect min/max to be not set and state = invalid.
            The purpose of this flag is to allow the user to choose which behavior they would like.

            David.Hall David Hall (Inactive) added a comment - FastDelete = false => min/max are updated and state is valid FastDelete = true => min/max are not set and state is invalid Before this feature, expect min/max to be updated and state to be valid (assuming it was valid before the delete). This is a from MCOL-2044 in 6.1.1 Prior to 6.1.1, expect min/max to be not set and state = invalid. The purpose of this flag is to allow the user to choose which behavior they would like.

            A minor update:

            To enable the FastDelete option, the .xml file should have a value 'y' or 'Y', instead of 'true':

                    <WriteEngine>
                            <BulkRoot>/var/lib/columnstore/data/bulk</BulkRoot>
                            <BulkRollbackDir>/var/lib/columnstore/data1/systemFiles/bulkRollback</BulkRollbackDir>
                            <MaxFileSystemDiskUsagePct>98</MaxFileSystemDiskUsagePct>
                            <CompressedPaddingBlocks>1</CompressedPaddingBlocks> <!-- Number of blocks used to pad compressed chunks -->
                            <FastDelete>y</FastDelete>
                    </WriteEngine>
            

            tntnatbry Gagan Goel (Inactive) added a comment - A minor update: To enable the FastDelete option, the .xml file should have a value 'y' or 'Y', instead of 'true': < WriteEngine > < BulkRoot >/var/lib/columnstore/data/bulk</ BulkRoot > < BulkRollbackDir >/var/lib/columnstore/data1/systemFiles/bulkRollback</ BulkRollbackDir > < MaxFileSystemDiskUsagePct >98</ MaxFileSystemDiskUsagePct > < CompressedPaddingBlocks >1</ CompressedPaddingBlocks > <!-- Number of blocks used to pad compressed chunks --> < FastDelete >y</ FastDelete > </ WriteEngine >

            Build verified: 22.08-1 (#5312)

            Feature implemented. Verified:

            system catalog
            delete functionality
            Performance tests
            MTR test suites

            Closing this ticket. New tickets will be opened if issues are later.

            dleeyh Daniel Lee (Inactive) added a comment - Build verified: 22.08-1 (#5312) Feature implemented. Verified: system catalog delete functionality Performance tests MTR test suites Closing this ticket. New tickets will be opened if issues are later.

            Below are the final numbers on DELETE performance tests:

            GCP VM:

            16vcpus, 64GB memory, 128GB HDD
            Dataset = 1,000,000 rows
            Results are averaged over 10 runs.
            OS: Ubuntu20.

            Local Machine:

            8vcpus, 24GB memory, 500GB SSD
            Dataset = 1,000,000 rows
            Results are averaged over 10 runs.
            OS: Ubuntu20

            tntnatbry Gagan Goel (Inactive) added a comment - Below are the final numbers on DELETE performance tests: GCP VM: 16vcpus, 64GB memory, 128GB HDD Dataset = 1,000,000 rows Results are averaged over 10 runs. OS: Ubuntu20. Local Machine: 8vcpus, 24GB memory, 500GB SSD Dataset = 1,000,000 rows Results are averaged over 10 runs. OS: Ubuntu20

            jacob.moorman GeoffMontee

            this is a significant performance improvement . You do make a reference to this in release notes.
            Should you also mention the actual numbers? 2x to 30x ..

            alexey.vorovich alexey vorovich (Inactive) added a comment - jacob.moorman GeoffMontee this is a significant performance improvement . You do make a reference to this in release notes. Should you also mention the actual numbers? 2x to 30x ..

            People

              tntnatbry Gagan Goel (Inactive)
              tntnatbry Gagan Goel (Inactive)
              Roman Roman
              Daniel Lee Daniel Lee (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.