Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31724

Compressed varchar values lost on joins when sorting on columns from joined table(s)

Details

    Description

      When (inner or outer) joining a table with compressed varchar column(s) to another table and sorting on column(s) of the joined table, compressed values are replaced with NULLs in returned result set. We heuristically assume that minimum varchar length to compress is 100, values shorter than that are not compressed and returned correctly.

      A minimal reproducible test case (reproduced in 10.11.4 and 10.6.14, probably affects previous releases as well):

      Server version: 10.11.4-MariaDB MariaDB Server
       
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [db]> CREATE TABLE t1 (
          ->   id  int(10) unsigned not null,
          ->   txt varchar(5000) COMPRESSED NOT NULL DEFAULT '',
          ->   PRIMARY KEY (id)
          -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_bin;
       
      Query OK, 0 rows affected (0.018 sec)
       
      MariaDB [db]> CREATE TABLE t2 (
          ->   id int(10) unsigned not null,
          ->   n1 bigint(20) NOT NULL,
          ->   n2 bigint(20) NOT NULL,
          ->   n3 bigint(20) NOT NULL,
          ->   PRIMARY KEY (id)
          -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_bin;
      Query OK, 0 rows affected (0.009 sec)
       
      MariaDB [db]> INSERT INTO t1 VALUES
          -> (1, 'short string < 100 chars'),
          -> (2, 'long string =  99 chars                                                                            '),
          -> (3, 'long string = 100 chars                                                                            !'),
          -> (4, 'long string = 101 chars                                                                             !');
      Query OK, 4 rows affected (0.001 sec)
      Records: 4  Duplicates: 0  Warnings: 0
       
      MariaDB [db]> INSERT INTO t2 VALUES
          -> (1, 24, 1, 1),
          -> (2, 99, 2, 2),
          -> (3, 100, 3, 3),
          -> (4, 101, 4, 4)
          -> ;
      Query OK, 4 rows affected (0.001 sec)
      Records: 4  Duplicates: 0  Warnings: 0
       
      MariaDB [db]> SELECT txt, v.* FROM t1 LEFT JOIN t2 v ON t1.id = v.id;
      +-------------------------------------------------------------------------------------------------------+------+------+------+------+
      | txt                                                                                                   | id   | n1   | n2   | n3   |
      +-------------------------------------------------------------------------------------------------------+------+------+------+------+
      | short string < 100 chars                                                                              |    1 |   24 |    1 |    1 |
      | long string =  99 chars                                                                               |    2 |   99 |    2 |    2 |
      | long string = 100 chars                                                                            !  |    3 |  100 |    3 |    3 |
      | long string = 101 chars                                                                             ! |    4 |  101 |    4 |    4 |
      +-------------------------------------------------------------------------------------------------------+------+------+------+------+
      4 rows in set (0.000 sec)
       
      MariaDB [db]> SELECT txt, v.* FROM t1 LEFT JOIN t2 v ON t1.id = v.id ORDER BY v.n1;
      +-----------------------------------------------------------------------------------------------------+------+------+------+------+
      | txt                                                                                                 | id   | n1   | n2   | n3   |
      +-----------------------------------------------------------------------------------------------------+------+------+------+------+
      | short string < 100 chars                                                                            |    1 |   24 |    1 |    1 |
      | long string =  99 chars                                                                             |    2 |   99 |    2 |    2 |
      |                                                                                                     |    3 |  100 |    3 |    3 |
      |                                                                                                     |    4 |  101 |    4 |    4 |
      +-----------------------------------------------------------------------------------------------------+------+------+------+------+
      4 rows in set (0.001 sec)
      MariaDB [db]> SELECT txt, v.* FROM t1 JOIN t2 v ON t1.id = v.id;
      +-------------------------------------------------------------------------------------------------------+----+-----+----+----+
      | txt                                                                                                   | id | n1  | n2 | n3 |
      +-------------------------------------------------------------------------------------------------------+----+-----+----+----+
      | short string < 100 chars                                                                              |  1 |  24 |  1 |  1 |
      | long string =  99 chars                                                                               |  2 |  99 |  2 |  2 |
      | long string = 100 chars                                                                            !  |  3 | 100 |  3 |  3 |
      | long string = 101 chars                                                                             ! |  4 | 101 |  4 |  4 |
      +-------------------------------------------------------------------------------------------------------+----+-----+----+----+
      4 rows in set (0.002 sec)
       
      MariaDB [db]> SELECT txt, v.* FROM t1 JOIN t2 v ON t1.id = v.id ORDER BY v.n1;
      +-----------------------------------------------------------------------------------------------------+----+-----+----+----+
      | txt                                                                                                 | id | n1  | n2 | n3 |
      +-----------------------------------------------------------------------------------------------------+----+-----+----+----+
      | short string < 100 chars                                                                            |  1 |  24 |  1 |  1 |
      | long string =  99 chars                                                                             |  2 |  99 |  2 |  2 |
      |                                                                                                     |  3 | 100 |  3 |  3 |
      |                                                                                                     |  4 | 101 |  4 |  4 |
      +-----------------------------------------------------------------------------------------------------+----+-----+----+----+
      4 rows in set (0.001 sec)
      

      DDL and queries to reproduce:

      CREATE TABLE t1 (
        id  int(10) unsigned not null,
        txt varchar(5000) COMPRESSED NOT NULL DEFAULT '',
        PRIMARY KEY (id)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_bin;
       
      CREATE TABLE t2 (
        id int(10) unsigned not null,
        n1 bigint(20) NOT NULL,
        n2 bigint(20) NOT NULL,
        n3 bigint(20) NOT NULL,
        PRIMARY KEY (id)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_bin;
       
       
      INSERT INTO t1 VALUES 
      (1, 'short string < 100 chars'), 
      (2, 'long string =  99 chars                                                                            '),
      (3, 'long string = 100 chars                                                                            !'),
      (4, 'long string = 101 chars                                                                             !');
       
      INSERT INTO t2 VALUES
      (1, 24, 1, 1),
      (2, 99, 2, 2),
      (3, 100, 3, 3),
      (4, 101, 4, 4)
      ;
       
      SELECT txt, v.* FROM t1 LEFT JOIN t2 v ON t1.id = v.id;
      SELECT txt, v.* FROM t1 LEFT JOIN t2 v ON t1.id = v.id ORDER BY v.n1;
      SELECT txt, v.* FROM t1 JOIN t2 v ON t1.id = v.id;
      SELECT txt, v.* FROM t1 JOIN t2 v ON t1.id = v.id ORDER BY v.n1;
      

      Attachments

        Issue Links

          Activity

            danblack Daniel Black added a comment -

            Thanks for the test case. Much appreciated.

            Looks highly related (and probably duplicate) of MDEV-24797 - do_varstring2_mb works out the string length to be 0.

            danblack Daniel Black added a comment - Thanks for the test case. Much appreciated. Looks highly related (and probably duplicate) of MDEV-24797 - do_varstring2_mb works out the string length to be 0.
            bar Alexander Barkov added a comment - - edited

            Repeatable with this script with one table with InnoDB and MyISAM:

            CREATE OR REPLACE TABLE t1 (
              id INT NOT NULL PRIMARY KEY,
              txt varchar(5000) COMPRESSED NOT NULL DEFAULT ''
            ) CHARSET=utf8mb3;
             
            INSERT INTO t1 VALUES 
            (1, REPEAT('a', 10)),
            (2, REPEAT('b', 99)),
            (3, REPEAT('c', 100)),
            (4, REPEAT('d', 121));
             
            SELECT txt, sysdate(6) FROM t1 ORDER BY 2;
            

            +-----------------------------------------------------------------------------------------------------+----------------------------+
            | txt                                                                                                 | sysdate(6)                 |
            +-----------------------------------------------------------------------------------------------------+----------------------------+
            | aaaaaaaaaa                                                                                          | 2023-07-21 11:09:49.668463 |
            | bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb | 2023-07-21 11:09:49.668547 |
            |                                                                                                     | 2023-07-21 11:09:49.668598 |
            |                                                                                                     | 2023-07-21 11:09:49.668643 |
            +-----------------------------------------------------------------------------------------------------+----------------------------+
            

            bar Alexander Barkov added a comment - - edited Repeatable with this script with one table with InnoDB and MyISAM: CREATE OR REPLACE TABLE t1 ( id INT NOT NULL PRIMARY KEY , txt varchar (5000) COMPRESSED NOT NULL DEFAULT '' ) CHARSET=utf8mb3;   INSERT INTO t1 VALUES (1, REPEAT( 'a' , 10)), (2, REPEAT( 'b' , 99)), (3, REPEAT( 'c' , 100)), (4, REPEAT( 'd' , 121));   SELECT txt, sysdate(6) FROM t1 ORDER BY 2; +-----------------------------------------------------------------------------------------------------+----------------------------+ | txt | sysdate(6) | +-----------------------------------------------------------------------------------------------------+----------------------------+ | aaaaaaaaaa | 2023-07-21 11:09:49.668463 | | bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb | 2023-07-21 11:09:49.668547 | | | 2023-07-21 11:09:49.668598 | | | 2023-07-21 11:09:49.668643 | +-----------------------------------------------------------------------------------------------------+----------------------------+
            bar Alexander Barkov added a comment - - edited

            The problem happens because do_varstring* calculate prefixes of the data, but does not take into account that the data is compressed.

            The problem is also repeatable in the following scripts (with InnoDB and MyISAM):

            do_varstring1:

            CREATE OR REPLACE TABLE t1 (a VARCHAR(254) COMPRESSED CHARACTER SET latin1);
            INSERT INTO t1 VALUES (REPEAT('a',254));
            ALTER IGNORE TABLE t1 MODIFY a VARCHAR(4) COMPRESSED CHARACTER SET latin1;
            SELECT * FROM t1;
            

            ERROR 1259 (HY000): ZLIB: Input data corrupted
            

            do_varstring2

            DROP TABLE IF EXISTS t1;
            DROP FUNCTION IF EXISTS f1;
             
            DELIMITER $$
            CREATE FUNCTION f1(imax INT, jmax INT) RETURNS TEXT
            BEGIN
              DECLARE res TEXT DEFAULT 'x';
              FOR i IN 0..imax
              DO
                FOR j IN 0..jmax
                DO
                  SET res=CONCAT(res, ' ', i, ' ', j);
                END FOR;
              END FOR;
              RETURN res;
            END;
            $$
            DELIMITER ;
             
            CREATE TABLE t1 (a VARCHAR(32000) COMPRESSED CHARACTER SET latin1);
            INSERT INTO t1 VALUES (f1(31,31));
            SELECT LENGTH(a) FROM t1;
            ALTER IGNORE TABLE t1 MODIFY a VARCHAR(256) COMPRESSED CHARACTER SET latin1;
            SELECT * FROM t1;
            

            ERROR 1259 (HY000): ZLIB: Input data corrupted
            

            do_varstring1_mb:

            CREATE OR REPLACE TABLE t1 (a VARCHAR(80) COMPRESSED CHARACTER SET utf8);
            INSERT INTO t1 VALUES (REPEAT('a',80));
            ALTER IGNORE TABLE t1 MODIFY a VARCHAR(1) COMPRESSED CHARACTER SET utf8;
            SELECT * FROM t1;
            

            +------+
            | a    |
            +------+
            |      |
            +------+
            

            do_varstring2_mb

            CREATE OR REPLACE TABLE t1 (a VARCHAR(5000) COMPRESSED CHARACTER SET utf8);
            INSERT INTO t1 VALUES (repeat('a0b1c2d3e4f5g6h7i8j9k9l8m7n6o5p4q3a2s1t0uvwxyz',100));
            ALTER IGNORE TABLE t1 MODIFY a VARCHAR(256) COMPRESSED CHARACTER SET utf8;
            SELECT * FROM t1;
            

            +------+
            | a    |
            +------+
            |      |
            +------+
            

            bar Alexander Barkov added a comment - - edited The problem happens because do_varstring* calculate prefixes of the data, but does not take into account that the data is compressed. The problem is also repeatable in the following scripts (with InnoDB and MyISAM): do_varstring1: CREATE OR REPLACE TABLE t1 (a VARCHAR (254) COMPRESSED CHARACTER SET latin1); INSERT INTO t1 VALUES (REPEAT( 'a' ,254)); ALTER IGNORE TABLE t1 MODIFY a VARCHAR (4) COMPRESSED CHARACTER SET latin1; SELECT * FROM t1; ERROR 1259 (HY000): ZLIB: Input data corrupted do_varstring2 DROP TABLE IF EXISTS t1; DROP FUNCTION IF EXISTS f1;   DELIMITER $$ CREATE FUNCTION f1(imax INT , jmax INT ) RETURNS TEXT BEGIN DECLARE res TEXT DEFAULT 'x' ; FOR i IN 0..imax DO FOR j IN 0..jmax DO SET res=CONCAT(res, ' ' , i, ' ' , j); END FOR ; END FOR ; RETURN res; END ; $$ DELIMITER ;   CREATE TABLE t1 (a VARCHAR (32000) COMPRESSED CHARACTER SET latin1); INSERT INTO t1 VALUES (f1(31,31)); SELECT LENGTH(a) FROM t1; ALTER IGNORE TABLE t1 MODIFY a VARCHAR (256) COMPRESSED CHARACTER SET latin1; SELECT * FROM t1; ERROR 1259 (HY000): ZLIB: Input data corrupted do_varstring1_mb: CREATE OR REPLACE TABLE t1 (a VARCHAR (80) COMPRESSED CHARACTER SET utf8); INSERT INTO t1 VALUES (REPEAT( 'a' ,80)); ALTER IGNORE TABLE t1 MODIFY a VARCHAR (1) COMPRESSED CHARACTER SET utf8; SELECT * FROM t1; +------+ | a | +------+ | | +------+ do_varstring2_mb CREATE OR REPLACE TABLE t1 (a VARCHAR (5000) COMPRESSED CHARACTER SET utf8); INSERT INTO t1 VALUES (repeat( 'a0b1c2d3e4f5g6h7i8j9k9l8m7n6o5p4q3a2s1t0uvwxyz' ,100)); ALTER IGNORE TABLE t1 MODIFY a VARCHAR (256) COMPRESSED CHARACTER SET utf8; SELECT * FROM t1; +------+ | a | +------+ | | +------+
            bar Alexander Barkov added a comment - sanja , please review a fix: https://github.com/MariaDB/server/commit/cca573500141aa2588dfc9ec718bdc0d0631c00b Thanks
            danblack Daniel Black added a comment -

            FYI tested MDEV-24797 on the above commit and success:

            MariaDB [test]> drop table t1;
            Query OK, 0 rows affected (0.001 sec)
             
            MariaDB [test]> CREATE TABLE t1 (a VARCHAR(500) COMPRESSED CHARACTER SET utf8mb3);
            Query OK, 0 rows affected (0.001 sec)
             
            MariaDB [test]> 
            MariaDB [test]> INSERT INTO t1 SET a=REPEAT('x',127);
            Query OK, 1 row affected (0.001 sec)
             
            MariaDB [test]> 
            MariaDB [test]> ALTER TABLE t1 FORCE;
            Query OK, 0 rows affected (0.003 sec)
            Records: 0  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> select length(a),a from t1;
            +-----------+---------------------------------------------------------------------------------------------------------------------------------+
            | length(a) | a                                                                                                                               |
            +-----------+---------------------------------------------------------------------------------------------------------------------------------+
            |       127 | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
            +-----------+---------------------------------------------------------------------------------------------------------------------------------+
            1 row in set (0.001 sec)
            

            Previously warnings were generated. Up to you to make it as a duplicate or add more test cases.

            danblack Daniel Black added a comment - FYI tested MDEV-24797 on the above commit and success: MariaDB [test]> drop table t1; Query OK, 0 rows affected (0.001 sec)   MariaDB [test]> CREATE TABLE t1 (a VARCHAR(500) COMPRESSED CHARACTER SET utf8mb3); Query OK, 0 rows affected (0.001 sec)   MariaDB [test]> MariaDB [test]> INSERT INTO t1 SET a=REPEAT('x',127); Query OK, 1 row affected (0.001 sec)   MariaDB [test]> MariaDB [test]> ALTER TABLE t1 FORCE; Query OK, 0 rows affected (0.003 sec) Records: 0 Duplicates: 0 Warnings: 0   MariaDB [test]> select length(a),a from t1; +-----------+---------------------------------------------------------------------------------------------------------------------------------+ | length(a) | a | +-----------+---------------------------------------------------------------------------------------------------------------------------------+ | 127 | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | +-----------+---------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.001 sec) Previously warnings were generated. Up to you to make it as a duplicate or add more test cases.

            The commit is OK, not OK is using varstring1 varstring2 as something selfdescriptive or well known. Please add appropriate description to them

            sanja Oleksandr Byelkin added a comment - The commit is OK, not OK is using varstring1 varstring2 as something selfdescriptive or well known. Please add appropriate description to them

            sanja thanks for the review. I've updated the comments in the commit and inside the code not to use varstring1 and varstring2. Pushed.

            bar Alexander Barkov added a comment - sanja thanks for the review. I've updated the comments in the commit and inside the code not to use varstring1 and varstring2. Pushed.

            People

              bar Alexander Barkov
              vladzakh Vladimir "Bob" Zakharychev
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.