[MCOL-4065] DISTINCT is case sensitive Created: 2020-06-15  Updated: 2021-05-14  Resolved: 2021-02-04

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 1.5.1
Fix Version/s: 5.6.1

Type: Bug Priority: Critical
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: datatype

Issue Links:
Blocks
is blocked by MCOL-4422 Remove mariadb.h and my_sys.h depende... Closed
Issue split
split from MCOL-3536 Order by with UTF Closed
Relates
relates to MCOL-4388 Equality does not respect the NOPAD c... Closed
relates to MCOL-4417 Non-equality comparison operators do ... Closed
relates to MCOL-4498 LIKE is not collation aware Closed
relates to MCOL-4575 Hash table performance for collation ... Open
relates to MCOL-495 Make string comparison not case sensi... Closed
relates to MCOL-4064 Make JOIN collation aware Closed
relates to MCOL-4539 WHERE short_char_column='literal' ign... Closed
relates to MCOL-4721 CHAR(1) is not collation-aware for GR... Closed
Epic Link: ColumnStore Compatibility Improvements
Sprint: 2021-1, 2021-2

 Description   

DISTINCT, as in SELECT DISTINCT c1 from t1 is currently a binary compare in Columnstore. It needs to be utf and case insensitive (for collations that are case insensitve).

SELECT DISTINCT

This script demonstrates the problem:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(20) CHARACTER SET latin1) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('a'),('b'),('A'),('B');
SELECT DISTINCT a FROM t1;

+------+
| a    |
+------+
| a    |
| b    |
| A    |
| B    |
+------+

Notice, all four rows returned.

The expected result should consist of two rows only:

+------+
| a    |
+------+
| a    |
| b    |
+------+

SELECT COUNT(DISTINCT)

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(20) CHARACTER SET latin1) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('a'),('b'),('A'),('B');
SELECT COUNT(DISTINCT a) FROM t1;

+-------------------+
| COUNT(DISTINCT a) |
+-------------------+
|                 4 |
+-------------------+

Looks wrong. The expected result is 2.

SELECT..GROUP BY

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(20) CHARACTER SET latin1) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('a'),('b'),('A'),('B');
SELECT a FROM t1 GROUP BY a;

+------+
| a    |
+------+
| A    |
| a    |
| B    |
| b    |
+------+

Looks wrong. The expected result should contain only two rows.



 Comments   
Comment by Gregory Dorman (Inactive) [ 2020-07-01 ]

@david - are you still trying this for 1.5.3, or should we move this to 1.5.4?

Comment by Daniel Lee (Inactive) [ 2021-02-04 ]

Builds verified: 5.6.1, 6.1.1
Branches develop-5, develop, respectively
Drone #1610j, #1620, respectively

Verified mentioned test cases.

Generated at Thu Feb 08 02:47:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.