[MCOL-5041] Table data bloat EMPTY records removal tool Created: 2022-04-06  Updated: 2022-10-14

Status: Open
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.2.5, 5.6.5, 6.3.1
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: andreas eschbacher Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: optimize_table
Environment:

Red Hat Enterprise Linux Server release 7.6
Mariadb 10.3 CS 1.2.5



 Description   

Hello,

Is it somehow possible to maintain bloated tables?
One of our servers needed 1.8 TB for data; after we reinstalled MariaDB and ran a fresh ETL, the data size is about 350 GB.
So we assume that the database became bloated over time.
mysqlcheck --optimize does not work with the ColumnStore engine.
ty in advance.
br Andreas



 Comments   
Comment by Roman [ 2022-06-05 ]

Greetings aeae81.
In our columnar format, EMPTY records are represented by a dedicated value (which depends on the underlying data type). When a record is deleted, EMPTY markers replace the actual values to indicate that the record has been removed. However, these slots are not reused by cpimport or INSERT..SELECT, only by INSERT, so in most cases they just waste disk space.
No, we don't have a bloat removal tool yet, but we have such a tool in mind. The only way at the moment is to re-ingest the data.
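
To illustrate the behaviour described above, here is a minimal Python sketch of a single column in which a delete overwrites the slot with a marker instead of shrinking the storage. It is a toy model only: the sentinel value, the class and method names, and the "first free slot" reuse policy are assumptions made for illustration, not ColumnStore internals.

```python
# Toy model of one column. EMPTY is a hypothetical sentinel; per the comment
# above, the real marker value depends on the column's data type.
EMPTY = -(2**31)


class ToyColumn:
    def __init__(self):
        self.slots = []

    def bulk_append(self, values):
        """Model cpimport / INSERT..SELECT: always append, never reuse slots."""
        self.slots.extend(values)

    def delete(self, index):
        """Model DELETE: overwrite the value in place with the EMPTY marker."""
        self.slots[index] = EMPTY

    def insert(self, value):
        """Model a plain INSERT: reuse an EMPTY slot if one exists
        (picking the first one is an assumption of this sketch)."""
        for i, current in enumerate(self.slots):
            if current == EMPTY:
                self.slots[i] = value
                return
        self.slots.append(value)

    def live_rows(self):
        """Count slots that still hold real data."""
        return sum(1 for v in self.slots if v != EMPTY)


col = ToyColumn()
col.bulk_append([10, 20, 30, 40])
col.delete(1)
col.delete(2)
col.bulk_append([50, 60])          # appended at the end; EMPTY slots remain
slots_after_bulk = len(col.slots)  # more slots on "disk" than live rows
live_after_bulk = col.live_rows()
col.insert(70)                     # fills an EMPTY slot instead of growing
```

In this model, a workload that deletes rows and then reloads data only through bulk paths keeps growing the slot list, which is the bloat pattern reported in this issue.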

Comment by andreas eschbacher [ 2022-06-10 ]

would https://jira.mariadb.org/browse/MCOL-5021 solve the issue with bloated tables?

Comment by David Hall (Inactive) [ 2022-10-13 ]

Re: MCOL-5021. Unfortunately, no. That change speeds up DELETE operations, but it doesn't address the file bloat issue.

There is a statement to remove entire partitions. But that is tricky as you would have to know that all data in the partition is no longer needed. Generally, if you are deleting random parts of the table, this is very difficult to ascertain.

Comment by andreas eschbacher [ 2022-10-14 ]

Hi David,

We found an acceptable solution:
we created a script that finds bloated tables and recreates them using INSERT INTO ... SELECT.

br andreas
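
For readers with the same problem, the approach described in this comment might look roughly like the following Python sketch. It is not the script mentioned above: the size figures, the threshold, the `_rebuild` and `_old` suffixes, and the exact SQL sequence are all assumptions, and each statement should be checked against the ColumnStore version in use before it is run. The sketch only builds SQL strings; it does not connect to a server.

```python
def find_bloated(tables, threshold=2.0):
    """Return names of tables whose on-disk size is at least `threshold`
    times the estimated size of their live data.

    `tables` maps a table name to (disk_bytes, estimated_live_bytes).
    How those two numbers are measured is left open in this sketch.
    """
    return sorted(
        name
        for name, (disk, live) in tables.items()
        if live > 0 and disk / live >= threshold
    )


def rebuild_statements(schema, table):
    """Build an assumed SQL sequence that copies a table into a fresh one
    via INSERT..SELECT, swaps the names, and drops the bloated original."""
    old = f"`{schema}`.`{table}`"
    new = f"`{schema}`.`{table}_rebuild`"    # hypothetical naming scheme
    backup = f"`{schema}`.`{table}_old`"     # hypothetical naming scheme
    return [
        f"CREATE TABLE {new} LIKE {old}",
        f"INSERT INTO {new} SELECT * FROM {old}",
        f"RENAME TABLE {old} TO {backup}",
        f"RENAME TABLE {new} TO {old}",
        f"DROP TABLE {backup}",
    ]


# Hypothetical sizes in bytes, purely for illustration.
sizes = {
    "sales": (900_000_000_000, 150_000_000_000),
    "customers": (12_000_000_000, 11_000_000_000),
}
bloated = find_bloated(sizes)
plan = [stmt for name in bloated for stmt in rebuild_statements("dwh", name)]
```

The original table is renamed rather than dropped first, so a failed copy leaves the data untouched; the price is that both copies occupy disk space until the final DROP.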

Generated at Thu Feb 08 02:54:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.