[MDEV-15592] Column COMPRESSED should select a 'high order' datatype Created: 2018-03-17  Updated: 2018-05-29  Resolved: 2018-05-11

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.3
Fix Version/s: 10.3.7

Type: Bug Priority: Major
Reporter: roberto spadim Assignee: Sergey Vojtovich
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MDEV-15762 VARCHAR(0) COMPRESSED crashes the server Closed
is blocked by MDEV-15763 VARCHAR(1) COMPRESSED crashes the server Closed
Relates
relates to MDEV-16134 Wrong I_S.COLUMNS.CHARACTER_XXX_LENGT... Closed
Sprint: 10.3.6-1

 Description   

I'm using a Azure VM (ubuntu), mariadb ubuntu package
Server version: 10.3.5-MariaDB-10.3.5+maria~xenial mariadb.org binary distribution

MariaDB [test]> show variables like 'column_compre%';
+----------------------------------+------------------+
| Variable_name                    | Value            |
+----------------------------------+------------------+
| column_compression_threshold     | 100              |
| column_compression_zlib_level    | 6                |
| column_compression_zlib_strategy | DEFAULT_STRATEGY |
| column_compression_zlib_wrap     | OFF              |
+----------------------------------+------------------+
4 rows in set (0.025 sec)
 
CREATE TABLE `table_without_compression` (
  `id` int(11) NOT NULL DEFAULT 0,
  `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
  `data` blob NOT NULL,
  PRIMARY KEY (`id`,`gmtime`)
)  ENGINE=InnoDB DEFAULT CHARSET=latin1;
#Query OK, 0 rows affected (0.385 sec)
 
CREATE TABLE `table_compressed` (
  `id` int(11) NOT NULL DEFAULT 0,
  `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
  `data` blob COMPRESSED NOT NULL DEFAULT '',
  PRIMARY KEY (`id`,`gmtime`)
)  ENGINE=InnoDB DEFAULT CHARSET=latin1;*
#Query OK, 0 rows affected (0.385 sec)
 
INSERT INTO table_without_compression VALUES (1,0,repeat('a',65535));
#Query OK, 1 row affected (0.135 sec)
 
INSERT INTO  table_without_compression VALUES (1,1,repeat('a',65536));
#ERROR 1406 (22001): Data too long for column 'data' at row 1   <- JUST TO CHECK IF WE GOT MAX SIZE OF BLOB COLUMN, maybe that's the problem, internally compression consider the from/to column instead of a bigger column (long blob for example when using blob/longblob, i will test with TEXT/LONG TEXT and VARCHAR too)
 
INSERT INTO table_compressed SELECT * FROM table_without_compression;
#ERROR 1406 (22001): Data too long for column 'data' at row 1 <- THAT'S THE ERROR


Same to TEXT

CREATE TABLE `table_compressed2` (
       `id` int(11) NOT NULL DEFAULT 0,
       `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
       `data` text COMPRESSED NOT NULL DEFAULT '',
       PRIMARY KEY (`id`,`gmtime`)
     )  ENGINE=InnoDB DEFAULT CHARSET=latin1;
#Query OK, 0 rows affected (0.231 sec)
 
CREATE TABLE `table_without_compression2` (
       `id` int(11) NOT NULL DEFAULT 0,
       `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
       `data` text NOT NULL,
       PRIMARY KEY (`id`,`gmtime`)
     )  ENGINE=InnoDB DEFAULT CHARSET=latin1;
#Query OK, 0 rows affected (0.253 sec)
 
INSERT INTO table_without_compression2 VALUES (1,0,repeat('a',65535));
#Query OK, 1 row affected (0.061 sec)
 
INSERT INTO table_without_compression2 VALUES (1,0,repeat('a',65536));
#ERROR 1406 (22001): Data too long for column 'data' at row 1  <- JUST TO CHECK IF WE GOT MAX SIZE OF BLOB COLUMN
 
INSERT INTO table_compressed2 SELECT * FROM table_without_compression2;
#ERROR 1406 (22001): Data too long for column 'data' at row 1  <- JUST TO CHECK IF WE GOT MAX SIZE OF BLOB COLUMN


mixing data types:

INSERT INTO table_compressed SELECT * FROM table_without_compression2;
#ERROR 1406 (22001): Data too long for column 'data' at row 1
INSERT INTO table_compressed2 SELECT * FROM table_without_compression;
#ERROR 1406 (22001): Data too long for column 'data' at row 1


creating now a LONGBLOB / LONGTEXT

CREATE TABLE table_compressed_long_blob (
     `id` int(11) NOT NULL DEFAULT 0,
     `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
     `data` LONGBLOB COMPRESSED NOT NULL,
     PRIMARY KEY (`id`,`gmtime`)
     )  ENGINE=InnoDB DEFAULT CHARSET=latin1;
#Query OK, 0 rows affected (0.141 sec)
 
CREATE TABLE table_compressed_long_text (
     `id` int(11) NOT NULL DEFAULT 0,
     `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
     `data` LONGTEXT COMPRESSED NOT NULL,
     PRIMARY KEY (`id`,`gmtime`)
     )  ENGINE=InnoDB DEFAULT CHARSET=latin1;
#Query OK, 0 rows affected (0.161 sec)
 
INSERT INTO table_compressed_long_blob SELECT * FROM table_without_compression2;
#Query OK, 1 row affected (0.045 sec)   <- NICE!
#Records: 1  Duplicates: 0  Warnings: 0
 
INSERT INTO table_compressed_long_text SELECT * FROM table_without_compression2;
#Query OK, 1 row affected (0.009 sec)  <- NICE!
#Records: 1  Duplicates: 0  Warnings: 0

Possible solutions:
1) Report at Documentation about data sizes,
2) Change code to use a bigger (high order) column type <- preffered, since we have users with "INSERT compressed_table SELECT * FROM same_column_type_structure" queries


problem found:

MariaDB [test]> truncate table_without_compression;
Query OK, 0 rows affected (0.131 sec)
 
MariaDB [test]> INSERT INTO table_without_compression VALUES (1,0,repeat('a',65534));
Query OK, 1 row affected (0.039 sec)
 
MariaDB [test]> INSERT INTO table_compressed SELECT * FROM table_without_compression;
Query OK, 1 row affected (0.044 sec)
Records: 1  Duplicates: 0  Warnings: 0

at compression source code, we have LENGTH-1 as field size

https://github.com/MariaDB/server/commit/fdc47792354c820aa4a8542d7c00d434424a63fb#diff-06426300d2671c77dd6763db8df79306R3241

maybe solution is here (i don't know the full source):
https://github.com/MariaDB/server/commit/fdc47792354c820aa4a8542d7c00d434424a63fb#diff-06426300d2671c77dd6763db8df79306R3257

instead of:
+ return n < 256 ? 1 : n < 65536 ? 2 : n < 16777216 ? 3 : 4;

use:
+ return n < 256-1 ? 1 : n < 65536-1 ? 2 : n < 16777216-1 ? 3 : 4;



 Comments   
Comment by roberto spadim [ 2018-03-17 ]

comment included
https://mariadb.com/kb/en/library/storage-engine-independent-column-compression/

the problem:
VARCHAR(255) have length 255
VARCHAR(255) COMPRESSED have length 255 (but use 2 bytes)


disconsider, tested:

CREATE TABLE `v255` (
`l` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `v255c` (
`l` varchar(255) /!100301 COMPRESSED/ DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

MariaDB [test]> insert into v255 values (repeat('a',255));
Query OK, 1 row affected (0.019 sec)
MariaDB [test]> insert into v255c values (repeat('a',255));
Query OK, 1 row affected (0.055 sec)


that's just a problem about documentation and field limit at BLOB types

Comment by roberto spadim [ 2018-03-17 ]

i think just documentation is ok, we could close this issue (no code changes)

Comment by Sergey Vojtovich [ 2018-03-19 ]

If we simplify this even further, we'll end up with something like:
CREATE TABLE t1(a BLOB COMPRESSED);
INSERT INTO t1 VALUES(REPEAT('a', 65535));

We reserver 1 byte for compressed blobs to be able to store compression
metadata.

Unless I miss something, this is only required when we need to store data
uncompressed, which is edge case. In normal case we could probably remove
this restriction.

So this is pretty valid bug report. If there was some other restriction, we
should definitely document it.

Comment by roberto spadim [ 2018-03-19 ]

yeap, the problem is the edge case but with some particular cases maybe just documentation resolve this issue

I got the problem copying a old table (uncompressed) using SHOW CREATE TABLE xxxx, rewriting some columns with COMPRESSED, and trying to INSERT INTO yyy SELECT * FROM xxx, I didn`t checked that the LENGTH of some fields could be reduced (that wasn`t documented)

I included some comments at documentation, maybe a review and it could be ok:

https://mariadb.com/kb/en/library/storage-engine-independent-column-compression/

"Field Length Compatibility
When using COMPRESSED attribute, check that FIELD LENGTH is reduced by 1, for example: BLOB have 65535 length, and BLOB COMPRESSED have 65535-1 length, more info check MDEV-15592.

Using BLOB, with max length = 65535, and inserting into BLOB COMPRESSED with max length=65534 generate an error (ERROR 1406 (22001): Data too long for column 'xxx' at row 1), increase the COMPRESSED LENGTH to +1 changing from BLOB COMPRESSED to LONGBLOB COMPRESSED to solve this problem"

Comment by Sergey Vojtovich [ 2018-04-03 ]

Found MDEV-15762 and MDEV-15763 while analysing this, fixed those first.

Comment by Sergey Vojtovich [ 2018-04-05 ]

serg, please review patch attached to this issue. This and all other column compression patches are also available via bb-10.3-svoj. Chronologically this one should be reviewed after MDEV-15763 and MDEV-15762.

Comment by Sergey Vojtovich [ 2018-05-03 ]

bar, please review updated patch b4853b2b0ed39c4a26b50c56b9e6a1578235824c.

Comment by Alexander Barkov [ 2018-05-11 ]

The patch
https://github.com/MariaDB/server/commit/6889cfc88c4c83fcb60fba7a95125a8d85d9047f
is OK to push.

Generated at Thu Feb 08 08:22:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.