[MCOL-5021] Implement an auxiliary (hidden) column to improve DELETE performance. Created: 2022-03-21  Updated: 2023-09-06  Resolved: 2022-08-17

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 22.08.1

Type: New Feature Priority: Major
Reporter: Gagan Goel (Inactive) Assignee: Gagan Goel (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: PNG File DELETE_Performance_Test_GCP.png     PNG File DELETE_Performance_Test_Local_Machine.png     File mcol5021-dataset.tgz     File mcol5021-ddls.tgz    
Issue Links:
Problem/Incident
causes MCOL-4775 DMLproc is unable to complete a rollb... Closed
Relates
relates to MCOL-3574 EMPTY column for persistant storage Closed
relates to MCOL-5122 Enable the AUX column (DELETE optimiz... Open
Sprint: 2021-17
Assigned for Review: Roman Roman
Assigned for Testing: Daniel Lee Daniel Lee (Inactive)

 Description   

This task is to implement an auxiliary (AUX) column in ColumnStore to improve the performance of DELETE operations.

This will be accomplished by implementing a hidden 1-byte boolean column with 0 indicating the column value is disabled (deleted) and 1 indicating the column value is enabled (active). Idea is to improve the performance of DELETE operation by simply toggling the AUX column value from 1 to 0, instead of the current implementation which writes empty magic values for all columns in the table for all the impacted rows. Empty magics are the current method in ColumnStore to detect if a given row is deleted or not. So for a wide table, the current method would lead to significant performance slowdown when performing DELETEs.

Points to note:
-New tables created in the CS release with this feature are subject to DELETE optimization.
-Existing table w/o the aux column is not affected (no DELETE optimization). MCOL-5122 is created as an enhancement of this ticket to allow a user to create the AUX column for an existing table which was created with an older version of ColumnStore without the AUX column feature.

Design Document for the feature:

https://docs.google.com/document/d/1OAWYxlsfgMwPoeDtY6wZ9JcoCubFyodQ63cCZ9NjLCI/edit?usp=sharing



 Comments   
Comment by David Hall (Inactive) [ 2022-03-25 ]

Rather than writing to use uint_t value as 1, treat the aux as a bitmap and use one of the bits. This allows for easier expansion in the future if needed.

Comment by alexey vorovich (Inactive) [ 2022-06-07 ]

toddstoffel Do we document a generic upgrade script that users have to run after they upgrade binary ?

Comment by alexey vorovich (Inactive) [ 2022-06-07 ]

toddstoffel We discussed two options
1. document the following command that customer must to run during upgrade
ALTER TABLE calpontsys.systable ADD COLUMN (auxcolumnoid INT NOT NULL DEFAULT 0);
2. if we already have a script that must be run during upgrades , then we can add the command above to it. From your response it seems that we have no such script. If you confirm , then option 1 is way to go

Comment by David Hall (Inactive) [ 2022-06-17 ]

FastDelete = false => min/max are updated and state is valid
FastDelete = true => min/max are not set and state is invalid
Before this feature, expect min/max to be updated and state to be valid (assuming it was valid before the delete). This is a from MCOL-2044 in 6.1.1
Prior to 6.1.1, expect min/max to be not set and state = invalid.
The purpose of this flag is to allow the user to choose which behavior they would like.

Comment by Gagan Goel (Inactive) [ 2022-08-04 ]

A minor update:

To enable the FastDelete option, the .xml file should have a value 'y' or 'Y', instead of 'true':

        <WriteEngine>
                <BulkRoot>/var/lib/columnstore/data/bulk</BulkRoot>
                <BulkRollbackDir>/var/lib/columnstore/data1/systemFiles/bulkRollback</BulkRollbackDir>
                <MaxFileSystemDiskUsagePct>98</MaxFileSystemDiskUsagePct>
                <CompressedPaddingBlocks>1</CompressedPaddingBlocks> <!-- Number of blocks used to pad compressed chunks -->
                <FastDelete>y</FastDelete>
        </WriteEngine>

Comment by Daniel Lee (Inactive) [ 2022-08-17 ]

Build verified: 22.08-1 (#5312)

Feature implemented. Verified:

system catalog
delete functionality
Performance tests
MTR test suites

Closing this ticket. New tickets will be opened if issues are later.

Comment by Gagan Goel (Inactive) [ 2022-08-23 ]

Below are the final numbers on DELETE performance tests:

GCP VM:

16vcpus, 64GB memory, 128GB HDD
Dataset = 1,000,000 rows
Results are averaged over 10 runs.
OS: Ubuntu20.

Local Machine:

8vcpus, 24GB memory, 500GB SSD
Dataset = 1,000,000 rows
Results are averaged over 10 runs.
OS: Ubuntu20

Comment by alexey vorovich (Inactive) [ 2022-09-15 ]

jacob.moorman GeoffMontee

this is a significant performance improvement . You do make a reference to this in release notes.
Should you also mention the actual numbers? 2x to 30x ..

Generated at Thu Feb 08 02:54:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.