[MDEV-31724] Compressed varchar values lost on joins when sorting on columns from joined table(s) Created: 2023-07-17  Updated: 2023-08-15  Resolved: 2023-08-15

Status: Closed
Project: MariaDB Server
Component/s: Character Sets, Data types, Server
Affects Version/s: 10.4.30, 10.6.14, 10.11.4
Fix Version/s: 10.8.8, 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.2, 11.2.1

Type: Bug Priority: Critical
Reporter: Vladimir "Bob" Zakharychev Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Attachments: File MDEV-31724.test    
Issue Links:
Duplicate
is duplicated by MDEV-24797 Column Compression - ERROR 1265 (0100... Closed

 Description   

When (inner or outer) joining a table with compressed varchar column(s) to another table and sorting on column(s) of the joined table, compressed values are replaced with NULLs in returned result set. We heuristically assume that minimum varchar length to compress is 100, values shorter than that are not compressed and returned correctly.

A minimal reproducible test case (reproduced in 10.11.4 and 10.6.14, probably affects previous releases as well):

Server version: 10.11.4-MariaDB MariaDB Server
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [db]> CREATE TABLE t1 (
    ->   id  int(10) unsigned not null,
    ->   txt varchar(5000) COMPRESSED NOT NULL DEFAULT '',
    ->   PRIMARY KEY (id)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_bin;
 
Query OK, 0 rows affected (0.018 sec)
 
MariaDB [db]> CREATE TABLE t2 (
    ->   id int(10) unsigned not null,
    ->   n1 bigint(20) NOT NULL,
    ->   n2 bigint(20) NOT NULL,
    ->   n3 bigint(20) NOT NULL,
    ->   PRIMARY KEY (id)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_bin;
Query OK, 0 rows affected (0.009 sec)
 
MariaDB [db]> INSERT INTO t1 VALUES
    -> (1, 'short string < 100 chars'),
    -> (2, 'long string =  99 chars                                                                            '),
    -> (3, 'long string = 100 chars                                                                            !'),
    -> (4, 'long string = 101 chars                                                                             !');
Query OK, 4 rows affected (0.001 sec)
Records: 4  Duplicates: 0  Warnings: 0
 
MariaDB [db]> INSERT INTO t2 VALUES
    -> (1, 24, 1, 1),
    -> (2, 99, 2, 2),
    -> (3, 100, 3, 3),
    -> (4, 101, 4, 4)
    -> ;
Query OK, 4 rows affected (0.001 sec)
Records: 4  Duplicates: 0  Warnings: 0
 
MariaDB [db]> SELECT txt, v.* FROM t1 LEFT JOIN t2 v ON t1.id = v.id;
+-------------------------------------------------------------------------------------------------------+------+------+------+------+
| txt                                                                                                   | id   | n1   | n2   | n3   |
+-------------------------------------------------------------------------------------------------------+------+------+------+------+
| short string < 100 chars                                                                              |    1 |   24 |    1 |    1 |
| long string =  99 chars                                                                               |    2 |   99 |    2 |    2 |
| long string = 100 chars                                                                            !  |    3 |  100 |    3 |    3 |
| long string = 101 chars                                                                             ! |    4 |  101 |    4 |    4 |
+-------------------------------------------------------------------------------------------------------+------+------+------+------+
4 rows in set (0.000 sec)
 
MariaDB [db]> SELECT txt, v.* FROM t1 LEFT JOIN t2 v ON t1.id = v.id ORDER BY v.n1;
+-----------------------------------------------------------------------------------------------------+------+------+------+------+
| txt                                                                                                 | id   | n1   | n2   | n3   |
+-----------------------------------------------------------------------------------------------------+------+------+------+------+
| short string < 100 chars                                                                            |    1 |   24 |    1 |    1 |
| long string =  99 chars                                                                             |    2 |   99 |    2 |    2 |
|                                                                                                     |    3 |  100 |    3 |    3 |
|                                                                                                     |    4 |  101 |    4 |    4 |
+-----------------------------------------------------------------------------------------------------+------+------+------+------+
4 rows in set (0.001 sec)
MariaDB [db]> SELECT txt, v.* FROM t1 JOIN t2 v ON t1.id = v.id;
+-------------------------------------------------------------------------------------------------------+----+-----+----+----+
| txt                                                                                                   | id | n1  | n2 | n3 |
+-------------------------------------------------------------------------------------------------------+----+-----+----+----+
| short string < 100 chars                                                                              |  1 |  24 |  1 |  1 |
| long string =  99 chars                                                                               |  2 |  99 |  2 |  2 |
| long string = 100 chars                                                                            !  |  3 | 100 |  3 |  3 |
| long string = 101 chars                                                                             ! |  4 | 101 |  4 |  4 |
+-------------------------------------------------------------------------------------------------------+----+-----+----+----+
4 rows in set (0.002 sec)
 
MariaDB [db]> SELECT txt, v.* FROM t1 JOIN t2 v ON t1.id = v.id ORDER BY v.n1;
+-----------------------------------------------------------------------------------------------------+----+-----+----+----+
| txt                                                                                                 | id | n1  | n2 | n3 |
+-----------------------------------------------------------------------------------------------------+----+-----+----+----+
| short string < 100 chars                                                                            |  1 |  24 |  1 |  1 |
| long string =  99 chars                                                                             |  2 |  99 |  2 |  2 |
|                                                                                                     |  3 | 100 |  3 |  3 |
|                                                                                                     |  4 | 101 |  4 |  4 |
+-----------------------------------------------------------------------------------------------------+----+-----+----+----+
4 rows in set (0.001 sec)

DDL and queries to reproduce:

CREATE TABLE t1 (
  id  int(10) unsigned not null,
  txt varchar(5000) COMPRESSED NOT NULL DEFAULT '',
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_bin;
 
CREATE TABLE t2 (
  id int(10) unsigned not null,
  n1 bigint(20) NOT NULL,
  n2 bigint(20) NOT NULL,
  n3 bigint(20) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_bin;
 
 
INSERT INTO t1 VALUES 
(1, 'short string < 100 chars'), 
(2, 'long string =  99 chars                                                                            '),
(3, 'long string = 100 chars                                                                            !'),
(4, 'long string = 101 chars                                                                             !');
 
INSERT INTO t2 VALUES
(1, 24, 1, 1),
(2, 99, 2, 2),
(3, 100, 3, 3),
(4, 101, 4, 4)
;
 
SELECT txt, v.* FROM t1 LEFT JOIN t2 v ON t1.id = v.id;
SELECT txt, v.* FROM t1 LEFT JOIN t2 v ON t1.id = v.id ORDER BY v.n1;
SELECT txt, v.* FROM t1 JOIN t2 v ON t1.id = v.id;
SELECT txt, v.* FROM t1 JOIN t2 v ON t1.id = v.id ORDER BY v.n1;



 Comments   
Comment by Daniel Black [ 2023-07-17 ]

Thanks for the test case. Much appreciated.

Looks highly related (and probably duplicate) of MDEV-24797 - do_varstring2_mb works out the string length to be 0.

Comment by Alexander Barkov [ 2023-07-21 ]

Repeatable with this script with one table with InnoDB and MyISAM:

CREATE OR REPLACE TABLE t1 (
  id INT NOT NULL PRIMARY KEY,
  txt varchar(5000) COMPRESSED NOT NULL DEFAULT ''
) CHARSET=utf8mb3;
 
INSERT INTO t1 VALUES 
(1, REPEAT('a', 10)),
(2, REPEAT('b', 99)),
(3, REPEAT('c', 100)),
(4, REPEAT('d', 121));
 
SELECT txt, sysdate(6) FROM t1 ORDER BY 2;

+-----------------------------------------------------------------------------------------------------+----------------------------+
| txt                                                                                                 | sysdate(6)                 |
+-----------------------------------------------------------------------------------------------------+----------------------------+
| aaaaaaaaaa                                                                                          | 2023-07-21 11:09:49.668463 |
| bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb | 2023-07-21 11:09:49.668547 |
|                                                                                                     | 2023-07-21 11:09:49.668598 |
|                                                                                                     | 2023-07-21 11:09:49.668643 |
+-----------------------------------------------------------------------------------------------------+----------------------------+

Comment by Alexander Barkov [ 2023-07-21 ]

The problem happens because do_varstring* calculate prefixes of the data, but does not take into account that the data is compressed.

The problem is also repeatable in the following scripts (with InnoDB and MyISAM):

do_varstring1:

CREATE OR REPLACE TABLE t1 (a VARCHAR(254) COMPRESSED CHARACTER SET latin1);
INSERT INTO t1 VALUES (REPEAT('a',254));
ALTER IGNORE TABLE t1 MODIFY a VARCHAR(4) COMPRESSED CHARACTER SET latin1;
SELECT * FROM t1;

ERROR 1259 (HY000): ZLIB: Input data corrupted

do_varstring2

DROP TABLE IF EXISTS t1;
DROP FUNCTION IF EXISTS f1;
 
DELIMITER $$
CREATE FUNCTION f1(imax INT, jmax INT) RETURNS TEXT
BEGIN
  DECLARE res TEXT DEFAULT 'x';
  FOR i IN 0..imax
  DO
    FOR j IN 0..jmax
    DO
      SET res=CONCAT(res, ' ', i, ' ', j);
    END FOR;
  END FOR;
  RETURN res;
END;
$$
DELIMITER ;
 
CREATE TABLE t1 (a VARCHAR(32000) COMPRESSED CHARACTER SET latin1);
INSERT INTO t1 VALUES (f1(31,31));
SELECT LENGTH(a) FROM t1;
ALTER IGNORE TABLE t1 MODIFY a VARCHAR(256) COMPRESSED CHARACTER SET latin1;
SELECT * FROM t1;

ERROR 1259 (HY000): ZLIB: Input data corrupted

do_varstring1_mb:

CREATE OR REPLACE TABLE t1 (a VARCHAR(80) COMPRESSED CHARACTER SET utf8);
INSERT INTO t1 VALUES (REPEAT('a',80));
ALTER IGNORE TABLE t1 MODIFY a VARCHAR(1) COMPRESSED CHARACTER SET utf8;
SELECT * FROM t1;

+------+
| a    |
+------+
|      |
+------+

do_varstring2_mb

CREATE OR REPLACE TABLE t1 (a VARCHAR(5000) COMPRESSED CHARACTER SET utf8);
INSERT INTO t1 VALUES (repeat('a0b1c2d3e4f5g6h7i8j9k9l8m7n6o5p4q3a2s1t0uvwxyz',100));
ALTER IGNORE TABLE t1 MODIFY a VARCHAR(256) COMPRESSED CHARACTER SET utf8;
SELECT * FROM t1;

+------+
| a    |
+------+
|      |
+------+

Comment by Alexander Barkov [ 2023-07-21 ]

sanja, please review a fix:
https://github.com/MariaDB/server/commit/cca573500141aa2588dfc9ec718bdc0d0631c00b
Thanks

Comment by Daniel Black [ 2023-07-27 ]

FYI tested MDEV-24797 on the above commit and success:

MariaDB [test]> drop table t1;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [test]> CREATE TABLE t1 (a VARCHAR(500) COMPRESSED CHARACTER SET utf8mb3);
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [test]> 
MariaDB [test]> INSERT INTO t1 SET a=REPEAT('x',127);
Query OK, 1 row affected (0.001 sec)
 
MariaDB [test]> 
MariaDB [test]> ALTER TABLE t1 FORCE;
Query OK, 0 rows affected (0.003 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select length(a),a from t1;
+-----------+---------------------------------------------------------------------------------------------------------------------------------+
| length(a) | a                                                                                                                               |
+-----------+---------------------------------------------------------------------------------------------------------------------------------+
|       127 | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
+-----------+---------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.001 sec)

Previously warnings were generated. Up to you to make it as a duplicate or add more test cases.

Comment by Oleksandr Byelkin [ 2023-08-14 ]

The commit is OK, not OK is using varstring1 varstring2 as something selfdescriptive or well known. Please add appropriate description to them

Comment by Alexander Barkov [ 2023-08-15 ]

sanja thanks for the review. I've updated the comments in the commit and inside the code not to use varstring1 and varstring2. Pushed.

Generated at Thu Feb 08 10:25:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.