Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
5.6.1, 6.1.1
2021-7, 2021-8
Description
It seems the fix for MCOL-4065 was incomplete.
A condition in rowgroup/rowgroup.cpp makes CHAR(1) bypass the collation-aware routines, so the comparison is always case-sensitive:
if (UNLIKELY(getColType(col) == execplan::CalpontSystemCatalog::VARCHAR ||
             (getColType(col) == execplan::CalpontSystemCatalog::CHAR && (colWidths[col] > 1)) ||
             getColType(col) == execplan::CalpontSystemCatalog::TEXT))
{
  CHARSET_INFO* cs = getCharset(col);
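To make the effect of the width test concrete, here is a minimal, self-contained sketch of the predicate. The `ColType` enum and both function names are hypothetical stand-ins, not ColumnStore code, and the second predicate is only one possible correction, not necessarily the fix that was committed.

```cpp
// Hypothetical stand-in for the ColumnStore column types; not the real enum.
enum class ColType { CHAR, VARCHAR, TEXT, INT };

// Mirrors the condition quoted above: CHAR is treated as collation-aware
// only when the column is wider than one byte.
constexpr bool usesCollationAsQuoted(ColType type, unsigned width)
{
  return type == ColType::VARCHAR ||
         (type == ColType::CHAR && width > 1) ||
         type == ColType::TEXT;
}

// One possible correction (an assumption): drop the width test so that
// every CHAR column takes the collation-aware path.
constexpr bool usesCollationAnyWidth(ColType type, unsigned /*width*/)
{
  return type == ColType::VARCHAR ||
         type == ColType::CHAR ||
         type == ColType::TEXT;
}

// Compile-time checks of the behaviour described in this report.
static_assert(!usesCollationAsQuoted(ColType::CHAR, 1), "CHAR(1) skips the collation path");
static_assert(usesCollationAsQuoted(ColType::CHAR, 2), "CHAR(2) takes the collation path");
static_assert(usesCollationAnyWidth(ColType::CHAR, 1), "CHAR(1) takes the collation path");
```

With the quoted condition, CHAR(1) is the only string type that falls through to the non-collation branch, which matches the symptom shown in the script.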
This script demonstrates the problem:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(1) CHARACTER SET latin1) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('a'),('A');
SELECT a, COUNT(*) FROM t1 GROUP BY a;
+------+----------+
| a    | COUNT(*) |
+------+----------+
| a    |        1 |
| A    |        1 |
+------+----------+
SELECT DISTINCT a FROM t1;
+------+
| a    |
+------+
| a    |
| A    |
+------+
The above results are wrong: 'A' and 'a' are equal values in latin1_swedish_ci. Both queries should return one row.
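The difference between the two behaviours can be sketched outside the server. The comparators below are illustrative assumptions: `AsciiCiLess` is a simplified, ASCII-only stand-in for latin1_swedish_ci (the real collation also handles accented characters), and `countDistinct` is a hypothetical helper, not part of ColumnStore.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Byte-wise comparison: 'a' and 'A' are different keys.
struct BinaryLess
{
  bool operator()(const std::string& x, const std::string& y) const
  {
    return x < y;
  }
};

// Simplified case-insensitive ordering (ASCII only), standing in for
// latin1_swedish_ci. Two strings are equivalent when neither is "less".
struct AsciiCiLess
{
  bool operator()(const std::string& x, const std::string& y) const
  {
    return std::lexicographical_compare(
        x.begin(), x.end(), y.begin(), y.end(),
        [](unsigned char l, unsigned char r) { return std::toupper(l) < std::toupper(r); });
  }
};

// Counts the groups that GROUP BY or DISTINCT would produce under the
// given ordering.
template <typename Less>
std::size_t countDistinct(const std::vector<std::string>& values)
{
  std::set<std::string, Less> groups(values.begin(), values.end());
  return groups.size();
}
```

Calling `countDistinct<BinaryLess>({"a", "A"})` yields two groups, which corresponds to the wrong ColumnStore output, while `countDistinct<AsciiCiLess>({"a", "A"})` yields one group, as a case-insensitive collation requires.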
The expected results, as returned by MyISAM, are:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(1) CHARACTER SET latin1) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('a'),('A');
SELECT a, COUNT(*) FROM t1 GROUP BY a;
+------+----------+
| a    | COUNT(*) |
+------+----------+
| a    |        2 |
+------+----------+
SELECT DISTINCT a FROM t1;
+------+
| a    |
+------+
| a    |
+------+