Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15592

Column COMPRESSED should select a 'high order' datatype

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.3(EOL)
    • 10.3.7
    • Server
    • None
    • 10.3.6-1

    Description

      I'm using a Azure VM (ubuntu), mariadb ubuntu package
      Server version: 10.3.5-MariaDB-10.3.5+maria~xenial mariadb.org binary distribution

      MariaDB [test]> show variables like 'column_compre%';
      +----------------------------------+------------------+
      | Variable_name                    | Value            |
      +----------------------------------+------------------+
      | column_compression_threshold     | 100              |
      | column_compression_zlib_level    | 6                |
      | column_compression_zlib_strategy | DEFAULT_STRATEGY |
      | column_compression_zlib_wrap     | OFF              |
      +----------------------------------+------------------+
      4 rows in set (0.025 sec)
       
      CREATE TABLE `table_without_compression` (
        `id` int(11) NOT NULL DEFAULT 0,
        `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
        `data` blob NOT NULL,
        PRIMARY KEY (`id`,`gmtime`)
      )  ENGINE=InnoDB DEFAULT CHARSET=latin1;
      #Query OK, 0 rows affected (0.385 sec)
       
      CREATE TABLE `table_compressed` (
        `id` int(11) NOT NULL DEFAULT 0,
        `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
        `data` blob COMPRESSED NOT NULL DEFAULT '',
        PRIMARY KEY (`id`,`gmtime`)
      )  ENGINE=InnoDB DEFAULT CHARSET=latin1;*
      #Query OK, 0 rows affected (0.385 sec)
       
      INSERT INTO table_without_compression VALUES (1,0,repeat('a',65535));
      #Query OK, 1 row affected (0.135 sec)
       
      INSERT INTO  table_without_compression VALUES (1,1,repeat('a',65536));
      #ERROR 1406 (22001): Data too long for column 'data' at row 1   <- JUST TO CHECK IF WE GOT MAX SIZE OF BLOB COLUMN, maybe that's the problem, internally compression consider the from/to column instead of a bigger column (long blob for example when using blob/longblob, i will test with TEXT/LONG TEXT and VARCHAR too)
       
      INSERT INTO table_compressed SELECT * FROM table_without_compression;
      #ERROR 1406 (22001): Data too long for column 'data' at row 1 <- THAT'S THE ERROR
      

      —
      Same to TEXT

      CREATE TABLE `table_compressed2` (
             `id` int(11) NOT NULL DEFAULT 0,
             `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
             `data` text COMPRESSED NOT NULL DEFAULT '',
             PRIMARY KEY (`id`,`gmtime`)
           )  ENGINE=InnoDB DEFAULT CHARSET=latin1;
      #Query OK, 0 rows affected (0.231 sec)
       
      CREATE TABLE `table_without_compression2` (
             `id` int(11) NOT NULL DEFAULT 0,
             `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
             `data` text NOT NULL,
             PRIMARY KEY (`id`,`gmtime`)
           )  ENGINE=InnoDB DEFAULT CHARSET=latin1;
      #Query OK, 0 rows affected (0.253 sec)
       
      INSERT INTO table_without_compression2 VALUES (1,0,repeat('a',65535));
      #Query OK, 1 row affected (0.061 sec)
       
      INSERT INTO table_without_compression2 VALUES (1,0,repeat('a',65536));
      #ERROR 1406 (22001): Data too long for column 'data' at row 1  <- JUST TO CHECK IF WE GOT MAX SIZE OF BLOB COLUMN
       
      INSERT INTO table_compressed2 SELECT * FROM table_without_compression2;
      #ERROR 1406 (22001): Data too long for column 'data' at row 1  <- JUST TO CHECK IF WE GOT MAX SIZE OF BLOB COLUMN
      

      —
      mixing data types:

      INSERT INTO table_compressed SELECT * FROM table_without_compression2;
      #ERROR 1406 (22001): Data too long for column 'data' at row 1
      INSERT INTO table_compressed2 SELECT * FROM table_without_compression;
      #ERROR 1406 (22001): Data too long for column 'data' at row 1
      

      —
      creating now a LONGBLOB / LONGTEXT

      CREATE TABLE table_compressed_long_blob (
           `id` int(11) NOT NULL DEFAULT 0,
           `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
           `data` LONGBLOB COMPRESSED NOT NULL,
           PRIMARY KEY (`id`,`gmtime`)
           )  ENGINE=InnoDB DEFAULT CHARSET=latin1;
      #Query OK, 0 rows affected (0.141 sec)
       
      CREATE TABLE table_compressed_long_text (
           `id` int(11) NOT NULL DEFAULT 0,
           `gmtime` decimal(22,7) NOT NULL DEFAULT 0.0000000,
           `data` LONGTEXT COMPRESSED NOT NULL,
           PRIMARY KEY (`id`,`gmtime`)
           )  ENGINE=InnoDB DEFAULT CHARSET=latin1;
      #Query OK, 0 rows affected (0.161 sec)
       
      INSERT INTO table_compressed_long_blob SELECT * FROM table_without_compression2;
      #Query OK, 1 row affected (0.045 sec)   <- NICE!
      #Records: 1  Duplicates: 0  Warnings: 0
       
      INSERT INTO table_compressed_long_text SELECT * FROM table_without_compression2;
      #Query OK, 1 row affected (0.009 sec)  <- NICE!
      #Records: 1  Duplicates: 0  Warnings: 0
      

      Possible solutions:
      1) Report at Documentation about data sizes,
      2) Change code to use a bigger (high order) column type <- preffered, since we have users with "INSERT compressed_table SELECT * FROM same_column_type_structure" queries

      —
      problem found:

      MariaDB [test]> truncate table_without_compression;
      Query OK, 0 rows affected (0.131 sec)
       
      MariaDB [test]> INSERT INTO table_without_compression VALUES (1,0,repeat('a',65534));
      Query OK, 1 row affected (0.039 sec)
       
      MariaDB [test]> INSERT INTO table_compressed SELECT * FROM table_without_compression;
      Query OK, 1 row affected (0.044 sec)
      Records: 1  Duplicates: 0  Warnings: 0
      

      at compression source code, we have LENGTH-1 as field size

      https://github.com/MariaDB/server/commit/fdc47792354c820aa4a8542d7c00d434424a63fb#diff-06426300d2671c77dd6763db8df79306R3241

      maybe solution is here (i don't know the full source):
      https://github.com/MariaDB/server/commit/fdc47792354c820aa4a8542d7c00d434424a63fb#diff-06426300d2671c77dd6763db8df79306R3257

      instead of:
      + return n < 256 ? 1 : n < 65536 ? 2 : n < 16777216 ? 3 : 4;

      use:
      + return n < 256-1 ? 1 : n < 65536-1 ? 2 : n < 16777216-1 ? 3 : 4;

      Attachments

        Issue Links

          Activity

            rspadim roberto spadim added a comment - - edited

            comment included
            https://mariadb.com/kb/en/library/storage-engine-independent-column-compression/

            the problem:
            VARCHAR(255) have length 255
            VARCHAR(255) COMPRESSED have length 255 (but use 2 bytes)

            —
            disconsider, tested:

            CREATE TABLE `v255` (
            `l` varchar(255) DEFAULT NULL
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
            CREATE TABLE `v255c` (
            `l` varchar(255) /!100301 COMPRESSED/ DEFAULT NULL
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

            MariaDB [test]> insert into v255 values (repeat('a',255));
            Query OK, 1 row affected (0.019 sec)
            MariaDB [test]> insert into v255c values (repeat('a',255));
            Query OK, 1 row affected (0.055 sec)


            that's just a problem about documentation and field limit at BLOB types

            rspadim roberto spadim added a comment - - edited comment included https://mariadb.com/kb/en/library/storage-engine-independent-column-compression/ the problem: VARCHAR(255) have length 255 VARCHAR(255) COMPRESSED have length 255 (but use 2 bytes) — disconsider, tested: CREATE TABLE `v255` ( `l` varchar(255) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1; CREATE TABLE `v255c` ( `l` varchar(255) / !100301 COMPRESSED / DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1; MariaDB [test] > insert into v255 values (repeat('a',255)); Query OK, 1 row affected (0.019 sec) MariaDB [test] > insert into v255c values (repeat('a',255)); Query OK, 1 row affected (0.055 sec) that's just a problem about documentation and field limit at BLOB types

            i think just documentation is ok, we could close this issue (no code changes)

            rspadim roberto spadim added a comment - i think just documentation is ok, we could close this issue (no code changes)

            If we simplify this even further, we'll end up with something like:
            CREATE TABLE t1(a BLOB COMPRESSED);
            INSERT INTO t1 VALUES(REPEAT('a', 65535));

            We reserver 1 byte for compressed blobs to be able to store compression
            metadata.

            Unless I miss something, this is only required when we need to store data
            uncompressed, which is edge case. In normal case we could probably remove
            this restriction.

            So this is pretty valid bug report. If there was some other restriction, we
            should definitely document it.

            svoj Sergey Vojtovich added a comment - If we simplify this even further, we'll end up with something like: CREATE TABLE t1(a BLOB COMPRESSED); INSERT INTO t1 VALUES(REPEAT('a', 65535)); We reserver 1 byte for compressed blobs to be able to store compression metadata. Unless I miss something, this is only required when we need to store data uncompressed, which is edge case. In normal case we could probably remove this restriction. So this is pretty valid bug report. If there was some other restriction, we should definitely document it.

            yeap, the problem is the edge case but with some particular cases maybe just documentation resolve this issue

            I got the problem copying a old table (uncompressed) using SHOW CREATE TABLE xxxx, rewriting some columns with COMPRESSED, and trying to INSERT INTO yyy SELECT * FROM xxx, I didn`t checked that the LENGTH of some fields could be reduced (that wasn`t documented)

            I included some comments at documentation, maybe a review and it could be ok:

            https://mariadb.com/kb/en/library/storage-engine-independent-column-compression/
            —
            "Field Length Compatibility
            When using COMPRESSED attribute, check that FIELD LENGTH is reduced by 1, for example: BLOB have 65535 length, and BLOB COMPRESSED have 65535-1 length, more info check MDEV-15592.

            Using BLOB, with max length = 65535, and inserting into BLOB COMPRESSED with max length=65534 generate an error (ERROR 1406 (22001): Data too long for column 'xxx' at row 1), increase the COMPRESSED LENGTH to +1 changing from BLOB COMPRESSED to LONGBLOB COMPRESSED to solve this problem"

            rspadim roberto spadim added a comment - yeap, the problem is the edge case but with some particular cases maybe just documentation resolve this issue I got the problem copying a old table (uncompressed) using SHOW CREATE TABLE xxxx, rewriting some columns with COMPRESSED, and trying to INSERT INTO yyy SELECT * FROM xxx, I didn`t checked that the LENGTH of some fields could be reduced (that wasn`t documented) I included some comments at documentation, maybe a review and it could be ok: https://mariadb.com/kb/en/library/storage-engine-independent-column-compression/ — "Field Length Compatibility When using COMPRESSED attribute, check that FIELD LENGTH is reduced by 1, for example: BLOB have 65535 length, and BLOB COMPRESSED have 65535-1 length, more info check MDEV-15592 . Using BLOB, with max length = 65535, and inserting into BLOB COMPRESSED with max length=65534 generate an error (ERROR 1406 (22001): Data too long for column 'xxx' at row 1), increase the COMPRESSED LENGTH to +1 changing from BLOB COMPRESSED to LONGBLOB COMPRESSED to solve this problem"

            Found MDEV-15762 and MDEV-15763 while analysing this, fixed those first.

            svoj Sergey Vojtovich added a comment - Found MDEV-15762 and MDEV-15763 while analysing this, fixed those first.

            serg, please review patch attached to this issue. This and all other column compression patches are also available via bb-10.3-svoj. Chronologically this one should be reviewed after MDEV-15763 and MDEV-15762.

            svoj Sergey Vojtovich added a comment - serg , please review patch attached to this issue. This and all other column compression patches are also available via bb-10.3-svoj. Chronologically this one should be reviewed after MDEV-15763 and MDEV-15762 .

            bar, please review updated patch b4853b2b0ed39c4a26b50c56b9e6a1578235824c.

            svoj Sergey Vojtovich added a comment - bar , please review updated patch b4853b2b0ed39c4a26b50c56b9e6a1578235824c.
            bar Alexander Barkov added a comment - The patch https://github.com/MariaDB/server/commit/6889cfc88c4c83fcb60fba7a95125a8d85d9047f is OK to push.

            People

              svoj Sergey Vojtovich
              rspadim roberto spadim
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.