Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Version: 1.5.1
Sprint: 2021-1, 2021-2
Description
DISTINCT (as in SELECT DISTINCT c1 FROM t1) currently uses a binary comparison in ColumnStore. It needs to be collation-aware: it must handle UTF-8 correctly and be case-insensitive for case-insensitive collations.
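To make the difference concrete, here is a minimal Python sketch (not ColumnStore code) contrasting a binary comparison with a case-insensitive one. The helper `ci_key` is hypothetical: it uses `str.casefold()` as a stand-in for a real collation's weight table, whereas actual collations such as latin1_swedish_ci define their own rules beyond simple case folding.

```python
def binary_equal(x: str, y: str) -> bool:
    # Binary comparison: the strings must be identical character for character.
    return x == y


def ci_key(s: str) -> str:
    # Hypothetical collation key: case folding approximates a case-insensitive
    # collation; real collations map characters through per-character weights.
    return s.casefold()


def collation_equal(x: str, y: str) -> bool:
    # Collation-aware comparison: compare the collation keys, not the raw values.
    return ci_key(x) == ci_key(y)


print(binary_equal("a", "A"))     # False
print(collation_equal("a", "A"))  # True
```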
SELECT DISTINCT
This script demonstrates the problem:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(20) CHARACTER SET latin1) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('a'),('b'),('A'),('B');
SELECT DISTINCT a FROM t1;
+------+
| a    |
+------+
| a    |
| b    |
| A    |
| B    |
+------+
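Notice that all four rows are returned.
The column uses latin1 with its default collation, latin1_swedish_ci, which is case-insensitive, so 'a' equals 'A' and 'b' equals 'B'. The expected result therefore consists of only two rows: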
+------+
| a    |
+------+
| a    |
| b    |
+------+
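The expected behavior corresponds to a hash-based DISTINCT in which the hash set stores collation keys rather than raw values. The following is a minimal Python sketch under that assumption; `ci_key` is a hypothetical case-folding stand-in for the column's collation, and keeping the first value seen for each key is a choice made by this sketch, not a documented ColumnStore rule.

```python
from typing import Callable, Iterable, List


def ci_key(s: str) -> str:
    # Hypothetical stand-in for a case-insensitive collation key.
    return s.casefold()


def distinct(rows: Iterable[str], key: Callable[[str], str] = ci_key) -> List[str]:
    # Values whose keys are equal under the collation collapse into one row.
    seen = set()
    out = []
    for value in rows:
        k = key(value)
        if k not in seen:
            seen.add(k)
            out.append(value)  # keep the first value encountered for each key
    return out


rows = ["a", "b", "A", "B"]
print(distinct(rows))                   # ['a', 'b']
print(distinct(rows, key=lambda s: s))  # ['a', 'b', 'A', 'B'] (binary compare)
```

Passing the identity function as `key` reproduces the buggy binary behavior shown above.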
SELECT COUNT(DISTINCT)
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(20) CHARACTER SET latin1) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('a'),('b'),('A'),('B');
SELECT COUNT(DISTINCT a) FROM t1;
+-------------------+
| COUNT(DISTINCT a) |
+-------------------+
|                 4 |
+-------------------+
This is wrong: the expected result is 2.
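COUNT(DISTINCT a) should count distinct collation keys, not distinct byte strings. A minimal Python sketch of that rule, again assuming a hypothetical case-folding `ci_key` in place of the real collation:

```python
def ci_key(s: str) -> str:
    # Hypothetical stand-in for a case-insensitive collation key.
    return s.casefold()


def count_distinct(rows, key=ci_key) -> int:
    # Count the distinct keys produced by the collation.
    return len({key(v) for v in rows})


rows = ["a", "b", "A", "B"]
print(count_distinct(rows))                   # 2
print(count_distinct(rows, key=lambda s: s))  # 4, what a binary compare yields
```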
SELECT ... GROUP BY
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(20) CHARACTER SET latin1) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('a'),('b'),('A'),('B');
SELECT a FROM t1 GROUP BY a;
+------+
| a    |
+------+
| A    |
| a    |
| B    |
| b    |
+------+
This is wrong: the expected result contains only two rows, one for 'a'/'A' and one for 'b'/'B'.
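GROUP BY has the same requirement: rows are bucketed by collation key, so 'a' and 'A' land in one group. Below is a minimal Python sketch, assuming a hypothetical case-folding `ci_key`; which member represents each group in the output is a choice of this sketch (the first one seen), not something the issue specifies.

```python
from collections import defaultdict
from typing import Dict, List


def ci_key(s: str) -> str:
    # Hypothetical stand-in for a case-insensitive collation key.
    return s.casefold()


def group_by(rows: List[str]) -> Dict[str, List[str]]:
    # Bucket rows by collation key; dicts preserve insertion order.
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in rows:
        groups[ci_key(value)].append(value)
    return dict(groups)


groups = group_by(["a", "b", "A", "B"])
print(groups)                                  # {'a': ['a', 'A'], 'b': ['b', 'B']}
print([members[0] for members in groups.values()])  # ['a', 'b']
```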
Issue Links
- is blocked by
  - MCOL-4422 Remove mariadb.h and my_sys.h dependency from collation.h (Closed)
- relates to
  - MCOL-4388 Equality does not respect the NOPAD collation attribute (Closed)
  - MCOL-4417 Non-equality comparison operators do not work well with NOPAD collations (Closed)
  - MCOL-4498 LIKE is not collation aware (Closed)
  - MCOL-4575 Hash table performance for collation aware DISTINCT (Open)
  - MCOL-495 Make string comparison not case sensitive (Closed)
  - MCOL-4064 Make JOIN collation aware (Closed)
  - MCOL-4539 WHERE short_char_column='literal' ignores the collation on a huge table (Closed)
  - MCOL-4721 CHAR(1) is not collation-aware for GROUP/DISTINCT (Closed)
- split from
  - MCOL-3536 Order by with UTF (Closed)