[MCOL-4539] WHERE short_char_column='literal' ignores the collation on a huge table Created: 2021-02-15  Updated: 2021-02-19  Resolved: 2021-02-19

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 5.5.1, 5.6.1
Fix Version/s: 5.5.2

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-495 Make string comparison not case sensi... Closed
relates to MCOL-4064 Make JOIN collation aware Closed
relates to MCOL-4065 DISTINCT is case sensitive Closed
relates to MCOL-4388 Equality does not respect the NOPAD c... Closed
relates to MCOL-4417 Non-equality comparison operators do ... Closed
relates to MCOL-4428 WHERE char6col <='STR' OR char6col IS... Closed
relates to MCOL-4498 LIKE is not collation aware Closed
relates to MCOL-4534 MariaDB collation library: improve co... Stalled
Sprint: 2021-3

 Description   

Preparing a dump file with data:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c CHAR(4)) ENGINE=MyISAM;
DELIMITER $$
FOR i IN 0..90000 DO
  INSERT INTO t1 VALUES ('aaaa');
  INSERT INTO t1 VALUES ('bbbb');
  INSERT INTO t1 VALUES ('AAAA');
  INSERT INTO t1 VALUES ('BBBB');
END FOR;
$$
DELIMITER ;
SELECT c INTO OUTFILE 't1.txt' FROM t1;

Creating and populating a ColumnStore table:

DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (c CHAR(4) CHARACTER SET latin1 COLLATE latin1_bin) ENGINE=ColumnStore;
LOAD DATA INFILE 't1.txt' INTO TABLE t2;

Querying the ColumnStore table:

SELECT * FROM t2 WHERE c='aaaa';

+------+
| c    |
+------+
| aaaa |
| aaaa |
| aaaa |
| aaaa |
| aaaa |
| aaaa |
| aaaa |
| aaaa |
| aaaa |
| aaaa |
........
| aaaa |
| AAAA |
| aaaa |
| AAAA |
+------+
147746 rows in set (0.054 sec)

Notice, the leading rows in the result set consist only of the expected values, but the trailing rows contains values with capital letter 'AAAA', which is wrong.

A DISTINCT query also returns a wrong result:

SELECT DISTINCT c FROM t2 WHERE c='aaaa';

+------+
| c    |
+------+
| aaaa |
| AAAA |
+------+

This script with INSERT..SELECT also returns a wrong result:

DROP TABLE IF EXISTS t3;
CREATE TABLE t3 LIKE t2;
INSERT INTO t3 SELECT * FROM t2 WHERE c='aaaa';
SELECT c, COUNT(*) FROM t3 GROUP BY c;

+------+----------+
| c    | COUNT(*) |
+------+----------+
| AAAA |    57745 |
| aaaa |    90001 |
+------+----------+



 Comments   
Comment by David Hall (Inactive) [ 2021-02-17 ]

This work was folded into the patch for MCOL-4527.

Comment by Daniel Lee (Inactive) [ 2021-02-19 ]

Build verified: 5.5.2-1 (Drone builds 1681)

Repeated the test case in the bug description. Only 'aaaa' results returned.

Generated at Thu Feb 08 02:51:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.