[MDEV-11371] Big column compressed - Jira

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 10.3.2
Component/s: Storage Engine - InnoDB
Labels:
- contribution
- foundation

Sprint:
10.3.1-2

Description

Storage engine independent support for column compression.

TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT,
VARCHAR and VARBINARY columns can be compressed.

New COMPRESSED column attribute added:
COMPRESSED[=<compression_method>]

The only supported method currently is zlib. It is not possible to create an index over a compressed column.
The CSV storage engine stores compressed field data uncompressed on disk.
The binary log stores compressed field data compressed on disk.

System variables added:
column_compression_threshold - Minimum column data length eligible for compression.
column_compression_zlib_level - zlib compression level (1 gives best speed, 9 gives best compression).
column_compression_zlib_strategy - The strategy parameter is used to tune the compression algorithm. Use the value DEFAULT_STRATEGY for normal data, FILTERED for data produced by a filter (or predictor), HUFFMAN_ONLY to force Huffman encoding only (no string match), or RLE to limit match distances to one (run-length encoding). Filtered data consists mostly of small values with a somewhat random distribution. In this case, the compression algorithm is tuned to compress them better. The effect of FILTERED is to force more Huffman coding and less string matching; it is somewhat intermediate between DEFAULT_STRATEGY and HUFFMAN_ONLY. RLE is designed to be almost as fast as HUFFMAN_ONLY, but give better compression for PNG image data. The strategy parameter only affects the compression ratio but not the correctness of the compressed output even if it is not set appropriately. FIXED prevents the use of dynamic Huffman codes, allowing for a simpler decoder for special applications.
column_compression_zlib_wrap - Generate zlib header and trailer and compute adler32 check value. It can be used with storage engines that don't provide data integrity verification to detect data corruption.
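
As a rough illustration of what the level, strategy and wrap settings correspond to in zlib itself, here is a minimal sketch using Python's zlib module. It is not the server's implementation: the helper names compress_column and decompress_column are hypothetical, and the default level of 6 is only an illustrative choice.

```python
import zlib

# Hypothetical helpers: they mirror the three zlib-related settings above,
# but they are not taken from the MariaDB source.
def compress_column(data: bytes, level: int = 6,
                    strategy: int = zlib.Z_DEFAULT_STRATEGY,
                    wrap: bool = False) -> bytes:
    """Deflate data; wrap=True adds the zlib header and Adler-32 trailer."""
    # Positive wbits selects the zlib wrapper, negative wbits a raw deflate stream.
    wbits = zlib.MAX_WBITS if wrap else -zlib.MAX_WBITS
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits,
                            zlib.DEF_MEM_LEVEL, strategy)
    return comp.compress(data) + comp.flush()


def decompress_column(blob: bytes, wrap: bool = False) -> bytes:
    """Inverse of compress_column; wrap must match the value used to compress."""
    wbits = zlib.MAX_WBITS if wrap else -zlib.MAX_WBITS
    return zlib.decompress(blob, wbits)


if __name__ == "__main__":
    sample = b"MariaDB column compression " * 50
    raw = compress_column(sample, level=9, strategy=zlib.Z_FILTERED)
    wrapped = compress_column(sample, level=9, strategy=zlib.Z_FILTERED, wrap=True)
    print(len(sample), len(raw), len(wrapped))
```

With the wrapper enabled, a mismatch between the stored Adler-32 value and the decompressed data makes decompression fail, which is the corruption check the wrap setting refers to. A raw stream carries no such checksum.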

Status variables added:
Column_compressions - incremented every time field data is compressed.
Column_decompressions - incremented every time field data is decompressed.

Issue Links

causes

MDEV-13857 Use the 10.2 libmariadb in 10.3

Closed

MDEV-24797 Column Compression - ERROR 1265 (01000): Data truncated for column

Closed

is blocked by

MDEV-13540 Server crashes in copy or Assertion `0' failed in virtual Field* Field_varstring_compressed::new_key_field

Closed

MDEV-13541 Server crashes in next_breadth_first_tab or Assertion `0' failed in Field_varstring_compressed::new_key_field

Closed

relates to

MDEV-13378 Mysqld crashes with COMPRESSED data types, when using partitions

Closed

MDEV-11381 AliSQL: [Feature] Issue#30 SUPPORT BIG COLUMN COMPRESS

Closed

MDEV-13359 Enable online ALTER TABLE for compressed columns

Closed

MDEV-13539 Latest revision of bb-10.3-svoj (big column compressed) does not compile

Closed

MDEV-13795 ALTER TABLE…DROP PRIMARY KEY, ADD PRIMARY KEY fails when VIRTUAL columns exist

Closed

Sub-Tasks

1.	Testing for MDEV-11371 (Big column compressed)		Closed	Alice Sherepa
2.	Document column compression introduced in 10.3.2		Closed	Ian Gilfillan

Activity

Sergey Vojtovich created issue - 2016-11-29 07:47

Sergey Vojtovich made changes - 2016-11-29 07:48

Description

Some big columns (BLOB/TEXT/VARCHAR/VARBINARY) waste a lot of space, so we introduce a "compressed" attribute in the column definition, usable when creating or altering a table.
When a column is defined as compressed, its data is compressed using zlib (other compression algorithms are not supported yet).
Compared with the compressed row format, this can give a better compression ratio, better performance and more flexibility.
For example:
CREATE TABLE tcompress (
C1 INT,
C2 BLOB COMPRESSED,
C3 TEXT COMPRESSED,
C4 TEXT) ENGINE = InnoDB;

We implement this "big column compression" feature in the following steps:

# Support the 'compressed' syntax and save the attribute in the .frm file.
# Store the 'compressed' attribute in the InnoDB layer by adding a DATA_IS_COMPRESSED flag to prtype.
# When needed, compress in row_mysql_store_col_in_innobase_format and decompress in row_sel_store_mysql_field.
# Use a compress header to control how to compress and decompress.
The compress header is 1 byte:
Bit 7: always 1, meaning compressed.
Bits 5-6: compression algorithm; always 0, meaning zlib. Other compression algorithms may be supported in the future.
Bits 0-3: number of bytes in "Record Original Length".
Record Original Length: 1-4 bytes.
We add a global system variable, 'field_compress_min_len', so that a column is compressed only if its data length exceeds 'field_compress_min_len'. The default is 128.
We also add 3 error codes:
ER_FIELD_TYPE_NOT_ALLOWED_AS_COMPRESSED_FIELD: only text/blob/varchar/varbinary columns may have the compressed attribute.
ER_FIELD_CAN_NOT_COMPRESSED_AND_INDEX: a column with the compressed attribute cannot be indexed.
ER_FIELD_CAN_NOT_COMPRESSED_IN_CURRENT_ENGINESS: column compression is supported in InnoDB only.
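
The one-byte compress header described in step 4 can be sketched as follows. This is only an illustration of the bit layout under stated assumptions, not the AliSQL or MariaDB code: the function names pack_field and unpack_field are hypothetical, the original length is assumed to be stored little-endian directly after the header, and bit 4, which the description does not mention, is left at 0.

```python
import zlib

ALGO_ZLIB = 0  # the only algorithm value the description defines


def pack_field(data: bytes) -> bytes:
    """Build header byte + original length (1-4 bytes) + zlib-compressed data."""
    n = len(data)
    # Smallest number of bytes that can hold the original length (at least 1).
    len_bytes = max(1, (n.bit_length() + 7) // 8)
    if len_bytes > 4:
        raise ValueError("original length must fit in 4 bytes")
    # Bit 7 = 1 (compressed), bits 5-6 = algorithm, bits 0-3 = length byte count.
    header = 0x80 | (ALGO_ZLIB << 5) | len_bytes
    # Assumption: little-endian length; the description does not give a byte order.
    return bytes([header]) + n.to_bytes(len_bytes, "little") + zlib.compress(data)


def unpack_field(blob: bytes) -> bytes:
    """Parse the header, decompress, and check the recorded original length."""
    header = blob[0]
    if not header & 0x80:
        raise ValueError("bit 7 is clear: field is not marked as compressed")
    if (header >> 5) & 0x03 != ALGO_ZLIB:
        raise ValueError("unknown compression algorithm")
    len_bytes = header & 0x0F
    orig_len = int.from_bytes(blob[1:1 + len_bytes], "little")
    data = zlib.decompress(blob[1 + len_bytes:])
    if len(data) != orig_len:
        raise ValueError("decompressed length does not match the header")
    return data


if __name__ == "__main__":
    packed = pack_field(b"a" * 300)
    print(hex(packed[0]), len(packed), unpack_field(packed) == b"a" * 300)
```

How values shorter than 'field_compress_min_len' are stored is not specified in the description, so the sketch covers only the compressed case.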

Sergei Golubchik made changes - 2016-11-29 11:53

Fix Version/s		10.3 [ 22126 ]
Fix Version/s	10.2 [ 14601 ]

Sergey Vojtovich made changes - 2016-11-29 13:38

Link

This issue relates to ~~MDEV-11381~~ [ ~~MDEV-11381~~ ]

Rasmus Johansson (Inactive) made changes - 2017-07-06 20:42

Sprint

10.3.1-2 [ 174 ]

Rasmus Johansson (Inactive) made changes - 2017-07-06 20:42

Rank

Ranked lower

Alice Sherepa made changes - 2017-07-24 13:40

Link

This issue relates to ~~MDEV-13378~~ [ ~~MDEV-13378~~ ]

Sergey Vojtovich made changes - 2017-07-25 11:58

Summary

Big column compressed(innodb)

Big column compressed

Sergey Vojtovich made changes - 2017-07-25 12:37

Description

Elena Stepanova made changes - 2017-08-15 20:55

Link

This issue relates to ~~MDEV-13539~~ [ ~~MDEV-13539~~ ]

Elena Stepanova made changes - 2017-08-15 22:01

Link

This issue relates to ~~MDEV-13540~~ [ ~~MDEV-13540~~ ]

Elena Stepanova made changes - 2017-08-15 22:31

Link

This issue is blocked by ~~MDEV-13541~~ [ ~~MDEV-13541~~ ]

Elena Stepanova made changes - 2017-08-15 22:32

Link

This issue relates to ~~MDEV-13540~~ [ ~~MDEV-13540~~ ]

Elena Stepanova made changes - 2017-08-15 22:32

Link

This issue is blocked by ~~MDEV-13540~~ [ ~~MDEV-13540~~ ]

Sergey Vojtovich added a comment - 2017-08-31 14:40

Pushed https://github.com/MariaDB/server/commit/fdc47792354c820aa4a8542d7c00d434424a63fb

Sergey Vojtovich made changes - 2017-08-31 14:40

Fix Version/s		10.3.2 [ 22533 ]
Fix Version/s	10.3 [ 22126 ]
Resolution		Fixed [ 1 ]
Status	Open [ 1 ]	Closed [ 6 ]

Marko Mäkelä made changes - 2017-09-01 13:33

Link

This issue relates to ~~MDEV-13359~~ [ ~~MDEV-13359~~ ]

Marko Mäkelä made changes - 2017-09-13 11:56

Link

This issue relates to ~~MDEV-13795~~ [ ~~MDEV-13795~~ ]

Marko Mäkelä made changes - 2017-09-20 15:18

Link

This issue causes ~~MDEV-13857~~ [ ~~MDEV-13857~~ ]

Marko Mäkelä made changes - 2021-08-17 09:29

Link

This issue causes ~~MDEV-24797~~ [ ~~MDEV-24797~~ ]

Sergei Golubchik made changes - 2021-12-06 21:23

Workflow

MariaDB v3 [ 78481 ]

MariaDB v4 [ 133015 ]

People

Assignee: Sergey Vojtovich

Reporter: Sergey Vojtovich

Votes: 0

Watchers: 6

Dates

Due: 2016-12-22

Created: 2016-11-29 07:47

Updated: 2021-08-17 09:29

Resolved: 2017-08-31 14:40
