[MCOL-987] LZ4 compression for on-disk columnar data Created: 2017-10-25  Updated: 2022-03-29  Resolved: 2021-07-07

Status: Closed
Project: MariaDB ColumnStore
Component/s: writeengine
Affects Version/s: 6.1.1
Fix Version/s: 6.1.1

Type: New Feature Priority: Major
Reporter: Andrew Hutchings (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 4
Labels: None

Issue Links:
Relates
Sub-Tasks:
Key        Summary                                    Type      Status  Assignee
MCOL-4654  LZ4 compression support for `Compress...   Sub-Task  Closed  Denis Khalikov
Epic Link: ColumnStore Compression Improvements
Sprint: 2021-5, 2021-6, 2021-7, 2021-8, 2021-9

 Description   

According to various comparisons, LZ4 might offer the following compared with Snappy:

  • better compression ratio
  • better decompression speed
  • almost the same compression speed

MCS uses Snappy by default for both columnar files and the CompressedInetStreamSocket (TCP socket) implementation.

The chunk size is an important parameter that defines how much data is compressed in one go before being stored in the compressed columnar file. It is currently set to 4MB, which might be less appropriate for LZ4, so different compressed chunk sizes should be compared.
In the end, MCS must offer another compression method, controlled via the session variable columnstore_compression_type. There will be no separate knob to control the compression used by CompressedInetStreamSocket.
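
A rough way to approach the chunk-size comparison is sketched below in Python. This is not ColumnStore code: the helper names compression_ratio and format_ratio are hypothetical, the input is synthetic, and zlib from the standard library stands in for the codec because neither Snappy nor LZ4 ships with Python. Any callable that maps bytes to bytes can be passed in its place.

```python
import zlib
from typing import Callable


def compression_ratio(
    data: bytes,
    chunk_size: int,
    compress: Callable[[bytes], bytes] = zlib.compress,  # stand-in codec (assumption)
) -> float:
    """Compress `data` in independent chunks; return uncompressed/compressed size."""
    if not data:
        raise ValueError("data must not be empty")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    compressed_total = 0
    # Each chunk is compressed on its own, so the codec cannot reuse
    # history from earlier chunks.
    for offset in range(0, len(data), chunk_size):
        compressed_total += len(compress(data[offset:offset + chunk_size]))
    return len(data) / compressed_total


def format_ratio(ratio: float) -> str:
    """Render a ratio in the 'N.NNNN:1' style."""
    return f"{ratio:.4f}:1"


if __name__ == "__main__":
    # Synthetic column: 1000 distinct 4-byte integers repeated (about 8 MB).
    pattern = b"".join(i.to_bytes(4, "little") for i in range(1000))
    sample = pattern * 2048
    for size_mb in (1, 2, 4, 8):
        ratio = compression_ratio(sample, size_mb * 1024 * 1024)
        print(f"{size_mb} MB chunks: {format_ratio(ratio)}")
```

Real column files would have to replace the synthetic sample for the numbers to mean anything, and compression and decompression time would need to be measured alongside the ratio.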

If LZ4 performs as well as expected (faster decompression, better compression, compression speed parity), it will become our default.



 Comments   
Comment by Costin Stefan [ 2019-03-12 ]

Is there any chance this feature will be implemented in the near future?

We are evaluating MariaDB ColumnStore engine and found the compression ratio of the snappy algorithm a "no go".
For our data, the compression ratio is less than two (1.5 to 1.3).

Thank you in advance,

Costin

Comment by Denis Khalikov [ 2021-03-31 ]

The first pull request has been added for review: https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1837
It modifies the current compression interface so that new compression algorithms can be added.
The next one adds LZ4 itself.

Comment by Denis Khalikov [ 2021-04-06 ]

Final version on review: https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1842
Also added tests to compare the compression ratio of `snappy` vs `lz4` for different input data.

On top of this patch there is a commit, https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1842/commits/5d3d766e4bc55cc018632178cb4cf1256f47842d, which sets LZ4 as the default so that the entire test suite runs under LZ4 compression. This commit should be removed before merging.

Comment by Roman [ 2021-07-07 ]

4QA JFYI the output of call columnstore_info.compression_ratio() has changed.

Comment by Daniel Lee (Inactive) [ 2021-07-07 ]

Build verified: 6.6.1 (#2742)

MariaDB [mytest]> call columnstore_info.compression_ratio();
+--------------------+-------------------+
| compression_method | compression_ratio |
+--------------------+-------------------+
| Snappy             | 2.2256:1          |
| LZ4                | 0.5288:1          |
+--------------------+-------------------+

Also tested the default value, as well as setting a new value, for the columnstore_compression_type variable.
