[MCOL-4721] CHAR(1) is not collation-aware for GROUP/DISTINCT Created: 2021-05-14  Updated: 2021-05-19  Resolved: 2021-05-19

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 5.6.1, 6.1.1
Fix Version/s: 5.6.1, 6.1.1

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: incomplete_fix

Issue Links:
Relates
relates to MCOL-4065 DISTINCT is case sensitive Closed
relates to MCOL-4726 Wrong result of WHERE char1_col='A' Closed
Sprint: 2021-7, 2021-8

 Description   

It seems the fix for MCOL-4065 was incomplete.
A check in rowgroup/rowgroup.cpp makes CHAR(1) columns bypass the collation-aware routines, so their values are always compared case-sensitively:

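         // Note: the "colWidths[col] > 1" test excludes CHAR(1) columns,
         // so they never reach the collation-aware comparison below.
         if (UNLIKELY(getColType(col) == execplan::CalpontSystemCatalog::VARCHAR ||
                      (getColType(col) == execplan::CalpontSystemCatalog::CHAR  && (colWidths[col] > 1)) ||
                      getColType(col) == execplan::CalpontSystemCatalog::TEXT))
         {
             CHARSET_INFO* cs = getCharset(col);
             // ... collation-aware comparison using cs ...
         }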

This script demonstrates the problem:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(1) CHARACTER SET latin1) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('a'),('A');
SELECT a, COUNT(*) FROM t1 GROUP BY a;

+------+----------+
| a    | COUNT(*) |
+------+----------+
| a    |        1 |
| A    |        1 |
+------+----------+

SELECT DISTINCT a FROM t1;

+------+
| a    |
+------+
| a    |
| A    |
+------+

The above results are wrong: 'A' and 'a' are equal under latin1_swedish_ci, the default collation of latin1, so both queries should return a single row.

The expected results, as returned by the same script with ENGINE=MyISAM, are:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(1) CHARACTER SET latin1) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('a'),('A');
SELECT a, COUNT(*) FROM t1 GROUP BY a;

+------+----------+
| a    | COUNT(*) |
+------+----------+
| a    |        2 |
+------+----------+

SELECT DISTINCT a FROM t1;

+------+
| a    |
+------+
| a    |
+------+


Generated at Thu Feb 08 02:52:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.