Details
-
New Feature
-
Status: Closed
-
Major
-
Resolution: Fixed
-
6.1.1
-
None
-
2021-5, 2021-6, 2021-7, 2021-8, 2021-9
Description
According to various comparisons, e.g. https://doordash.engineering/2019/01/02/speeding-up-redis-with-compression/, LZ4 might have:
- a better compression ratio
- better decompression speed
- almost the same compression speed
compared with Snappy. MCS uses Snappy by default for both columnar files and the CompressedInetStreamSocket (TCP socket) implementation.
The chunk size is an important parameter that defines how much data is compressed in one go before it is stored in the compressed columnar file. It is currently set to 4MB, which might be less appropriate for LZ4, so different compressed chunk sizes should be compared.
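A minimal sketch of how such a chunk size comparison could be set up. It is not the MCS implementation: the function names (split_chunks, measure, compare_chunk_sizes) are hypothetical, and zlib from the Python standard library stands in for the codec only so that the example runs without extra packages.

```python
import time
import zlib
from typing import Callable, Iterable, List

Codec = Callable[[bytes], bytes]


def split_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def measure(data: bytes, chunk_size: int, compress: Codec, decompress: Codec) -> dict:
    """Compress and decompress data chunk by chunk and report ratio and timings."""
    if not data:
        raise ValueError("data must not be empty")
    chunks = split_chunks(data, chunk_size)

    start = time.perf_counter()
    packed = [compress(chunk) for chunk in chunks]
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = [decompress(blob) for blob in packed]
    decompress_s = time.perf_counter() - start

    # Guard against a codec pair that does not round-trip.
    if b"".join(unpacked) != data:
        raise RuntimeError("round trip mismatch")

    packed_size = sum(len(blob) for blob in packed)
    return {
        "chunk_size": chunk_size,
        "ratio": len(data) / packed_size,  # higher means better compression
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


def compare_chunk_sizes(
    data: bytes,
    chunk_sizes: Iterable[int],
    compress: Codec = zlib.compress,      # stand-in codec, not LZ4 or Snappy
    decompress: Codec = zlib.decompress,
) -> List[dict]:
    """Run measure() once per candidate chunk size."""
    return [measure(data, size, compress, decompress) for size in chunk_sizes]


if __name__ == "__main__":
    # Synthetic 4 MiB sample; real column data will behave differently.
    sample = b"".join(i.to_bytes(8, "little") for i in range(1 << 19))
    for row in compare_chunk_sizes(sample, [1 << 20, 2 << 20, 4 << 20]):
        print(row)
```

To compare the codecs discussed here, one could pass lz4.frame.compress / lz4.frame.decompress (python-lz4) or snappy.compress / snappy.decompress (python-snappy) as the compress and decompress arguments, assuming those packages are installed. Timings from a single run are noisy, so repeated runs on representative column data would be needed before drawing conclusions.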
In the end MCS must have another compression method that is controlled via the session variable columnstore_compression_type. There will be no separate knob to control compression used by CompressedInetStreamSocket.
If LZ4 performs as well as expected (faster decompression, better compression, compression speed parity), it will become our default.
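The three expectations above can be written down as an explicit check. This is only a sketch: CodecResult, meets_default_criteria, and the 10% parity tolerance are assumptions made for illustration, since the ticket does not define how close "parity" has to be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecResult:
    """Benchmark summary for one codec (hypothetical structure)."""
    ratio: float            # uncompressed size / compressed size, higher is better
    compress_mb_s: float    # compression throughput, higher is better
    decompress_mb_s: float  # decompression throughput, higher is better


def meets_default_criteria(
    candidate: CodecResult,
    baseline: CodecResult,
    parity_tolerance: float = 0.10,  # assumed: "parity" means within 10% of baseline
) -> bool:
    """Return True if candidate satisfies all three expectations against baseline."""
    better_ratio = candidate.ratio > baseline.ratio
    faster_decompress = candidate.decompress_mb_s > baseline.decompress_mb_s
    compress_parity = (
        candidate.compress_mb_s >= baseline.compress_mb_s * (1.0 - parity_tolerance)
    )
    return better_ratio and faster_decompress and compress_parity
```

Here candidate would hold the LZ4 measurements and baseline the Snappy ones, both taken on the same data and chunk size.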
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Assignee | Todd Stoffel [ toddstoffel ] |
Fix Version/s | N/A [ 22302 ] |
Rank | Ranked higher |
Fix Version/s | 1.7 [ 23713 ] | |
Fix Version/s | N/A [ 22302 ] |
Fix Version/s | 1.5 [ 22800 ] | |
Fix Version/s | 1.7 [ 23713 ] |
Rank | Ranked higher |
Epic Link |
|
Rank | Ranked higher |
Fix Version/s | 1.6 [ 23712 ] | |
Fix Version/s | 1.5 [ 22800 ] |
Fix Version/s | Icebox [ 22302 ] | |
Fix Version/s | 1.6 [ 23712 ] |
Rank | Ranked higher |
Rank | Ranked higher |
Rank | Ranked higher |
Rank | Ranked higher |
Rank | Ranked higher |
Rank | Ranked lower |
Rank | Ranked lower |
Component/s | writeengine [ 13510 ] |
Affects Version/s | 6.1.1 [ 25600 ] |
Fix Version/s | 6.1.1 [ 25600 ] | |
Fix Version/s | Icebox [ 22302 ] |
Assignee | Todd Stoffel [ toddstoffel ] | Roman [ drrtuy ] |
Summary | New additional compression algorithms for ColumnStore | LZ4 compression for on-disk columnar data |
Description |
We should look into adding additional compression algorithms to ColumnStore for additional use cases:
1. LZ4 - should be faster than Snappy with a better compression ratio 2. Zstd (not to be confused with Zlib) - slower than Snappy (but still faster than most disks), much better compression ratio. Its dictionary mode should probably be used. If LZ4 performs as well as expected it could possibly become our default. |
According to different comparisions, e.g. [here|https://doordash.engineering/2019/01/02/speeding-up-redis-with-compression/] LZ4 might have:
* better compression rate * better decompression speed * almost the same compression speed compared with Snappy. MCS uses Snappy by default for both columnar files and CompressedISS(TCP socket) implementation. In the end MCS must have another compression method that is controlled via https://jira.mariadb.org/browse/MCOL-987https://jira.mariadb.org/browse/MCOL-987 2. Zstd (not to be confused with Zlib) - slower than Snappy (but still faster than most disks), much better compression ratio. Its dictionary mode should probably be used. If LZ4 performs as well as expected it could possibly become our default. |
Description |
According to different comparisions, e.g. [here|https://doordash.engineering/2019/01/02/speeding-up-redis-with-compression/] LZ4 might have:
* better compression rate * better decompression speed * almost the same compression speed compared with Snappy. MCS uses Snappy by default for both columnar files and CompressedInetStreamSocket(TCP socket) implementation. There must be a comparison In the end MCS must have another compression method that is controlled via the session variable columnstore_compression_type. There will be no separate knob to control compression used by CompressedInetStreamSocket. 2. Zstd (not to be confused with Zlib) - slower than Snappy (but still faster than most disks), much better compression ratio. Its dictionary mode should probably be used. If LZ4 performs as well as expected it could possibly become our default. |
According to different comparisions, e.g. [here|https://doordash.engineering/2019/01/02/speeding-up-redis-with-compression/] LZ4 might have:
* better compression rate * better decompression speed * almost the same compression speed compared with Snappy. MCS uses Snappy by default for both columnar files and CompressedInetStreamSocket(TCP socket) implementation. The chunk size is an important parameter used to define how much worth of data is compressed in one go to store in the compressed columnar file. As of now it is set to 4MB that might be less apropriate for LZ4 so one should compare different compressed chunk size values. In the end MCS must have another compression method that is controlled via the session variable columnstore_compression_type. There will be no separate knob to control compression used by CompressedInetStreamSocket. If LZ4 performs as well as expected(faster decompression, better compression, compression speed parity) it will become our default. |
Assignee | Roman [ drrtuy ] | Denis Khalikov [ JIRAUSER48434 ] |
Sprint | 2021-5 [ 504 ] |
Rank | Ranked higher |
Rank | Ranked lower |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link | This issue relates to MENT-1167 [ MENT-1167 ] |
Sprint | 2021-5 [ 504 ] | 2021-5, 2021-6 [ 504, 509 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Assignee | Denis Khalikov [ JIRAUSER48434 ] | Roman [ drrtuy ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Roman [ drrtuy ] | Gregory Dorman [ gdorman ] |
Assignee | Gregory Dorman [ gdorman ] | Roman [ drrtuy ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Sprint | 2021-5, 2021-6 [ 504, 509 ] | 2021-5, 2021-6, 2021-7 [ 504, 509, 514 ] |
Sprint | 2021-5, 2021-6, 2021-7 [ 504, 509, 514 ] | 2021-5, 2021-6, 2021-7, 2021-8 [ 504, 509, 514, 521 ] |
Sprint | 2021-5, 2021-6, 2021-7, 2021-8 [ 504, 509, 514, 521 ] | 2021-5, 2021-6, 2021-7, 2021-8, 2021-9 [ 504, 509, 514, 521, 541 ] |
Status | In Review [ 10002 ] | In Testing [ 10301 ] |
Assignee | Roman [ drrtuy ] | Daniel Lee [ dleeyh ] |
Resolution | Fixed [ 1 ] | |
Status | In Testing [ 10301 ] | Closed [ 6 ] |