[MCOL-4388] Equality does not respect the NOPAD collation attribute Created: 2020-11-12  Updated: 2021-02-15  Resolved: 2020-12-04

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 5.4.3
Fix Version/s: 5.5.1

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: datatype

Issue Links:
Blocks
is blocked by MCOL-4422 Remove mariadb.h and my_sys.h depende... Closed
Relates
relates to MCOL-1246 Trailing whitespace in CHAR/VARCHAR b... Closed
relates to MCOL-4064 Make JOIN collation aware Closed
relates to MCOL-4417 Non-equality comparison operators do ... Closed
relates to MCOL-4498 LIKE is not collation aware Closed
relates to MCOL-495 Make string comparison not case sensi... Closed
relates to MCOL-4065 DISTINCT is case sensitive Closed
relates to MCOL-4539 WHERE short_char_column='literal' ign... Closed

 Description   

I create a table with a VARCHAR column using latin1_swedish_nopad_ci as a collation. I expect strings that have different number of trailing spaces to be compared as not equal. But in fact ColumnStore ignores the NOPAD collation attribute in the equality operator and compares 'a' and 'a ' as equal.

This script demonstrates the problem:

DROP TABLE IF EXISTS t5;
CREATE TABLE t5 (a VARCHAR(10) CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci) ENGINE=COLUMNSTORE;
INSERT INTO t5 VALUES ('a'),('a ');
SELECT HEX(a) FROM t5 WHERE a='a';

+--------+
| HEX(a) |
+--------+
| 61     |
| 6120   |
+--------+

Notice, it returned two rows. Only the row with the hex value 61 is expected.

SELECT HEX(a) FROM t5 WHERE a='a ';

+--------+
| HEX(a) |
+--------+
| 61     |
| 6120   |
+--------+

Notice, it returned two rows. Only the row with the hex value 6120 is expected.

Note, if I run the same script with another engine, e.g. InnoDB, it returns a correct result:

DROP TABLE IF EXISTS t5;
CREATE TABLE t5 (a VARCHAR(10) CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci) ENGINE=INNODB;
INSERT INTO t5 VALUES ('a'),('a ');
SELECT HEX(a) FROM t5 WHERE a='a';

+--------+
| HEX(a) |
+--------+
| 61     |
+--------+

SELECT HEX(a) FROM t5 WHERE a='a ';

+--------+
| HEX(a) |
+--------+
| 6120   |
+--------+



 Comments   
Comment by Alexander Barkov [ 2020-11-12 ]

Note, comparison of two fields seems to work fine:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci) ENGINE=COLUMNSTORE;
INSERT INTO t1 VALUES ('a'),('a ');
SELECT hex(tt1.a), hex(tt2.a) FROM t1 tt1, t1 tt2 WHERE tt1.a=tt2.a;

+------------+------------+
| hex(tt1.a) | hex(tt2.a) |
+------------+------------+
| 61         | 61         |
| 6120       | 6120       |
+------------+------------+

Comment by Daniel Lee (Inactive) [ 2020-12-04 ]

Build verified: 5.5.1 (Drone 1265)

The query now returns the same result as innodb.

Generated at Thu Feb 08 02:49:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.