[MCOL-4100] functions LOCATE, INSTR, STRCMP and FIND_IN_SET are case sensitve Created: 2020-06-24  Updated: 2020-09-21  Resolved: 2020-08-27

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 1.5.2
Fix Version/s: 5.4.1

Type: Bug Priority: Major
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Issue split
split from MCOL-3536 Order by with UTF Closed
Epic Link: ColumnStore Compatibility Improvements

 Description   

The functions LOCATE, INSTR, STRCMP and FIND_IN_SET should follow the case sensitivity of the collation assigned to the columns involved. They do not.

The standard code is to use the collation of the item_func object, but for these functions, that is always latin1_bin. Don't know why. The correct collation can be found in item_func derived cmp_collation object, but unfortunately, that object is private.



 Comments   
Comment by David Hall (Inactive) [ 2020-06-24 ]

Added a call to Type_std_attributes::agg_arg_charsets_for_comparison for these functions. This is the same call used to populate cmp_collation. It returns the collation corrected for multiple columns using a predefined hierarchy if the columns have different collations.

Comment by Daniel Lee (Inactive) [ 2020-08-27 ]

Build verified: 1.5.4-1 (drone #479)

Case insensitive collation

MariaDB [mytest]> create table t1 (c1 varchar(50)) engine=columnstore default charset=utf8mb4 collate=utf8mb4_general_ci;
Query OK, 0 rows affected (0.182 sec)

MariaDB [mytest]> insert into t1 values ("this is a test");
Query OK, 1 row affected (0.202 sec)

MariaDB [mytest]> select c1 from t1;
----------------

c1

----------------

this is a test

----------------
1 row in set (0.110 sec)

MariaDB [mytest]> select c1, locate('test', c1) from t1;
----------------------------------+

c1 locate('test', c1)

----------------------------------+

this is a test 11

----------------------------------+
1 row in set (0.019 sec)

MariaDB [mytest]> select c1, locate('TEST', c1) from t1;
----------------------------------+

c1 locate('TEST', c1)

----------------------------------+

this is a test 11

----------------------------------+
1 row in set (0.014 sec)

MariaDB [mytest]> select c1, instr('TEST', c1) from t1;
---------------------------------+

c1 instr('TEST', c1)

---------------------------------+

this is a test 0

---------------------------------+
1 row in set (0.013 sec)

MariaDB [mytest]> select c1, instr(c1, 'test') from t1;
---------------------------------+

c1 instr(c1, 'test')

---------------------------------+

this is a test 11

---------------------------------+
1 row in set (0.040 sec)

MariaDB [mytest]> select c1, instr(c1, 'TEST') from t1;
---------------------------------+

c1 instr(c1, 'TEST')

---------------------------------+

this is a test 11

---------------------------------+
1 row in set (0.020 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'TEST') from t1;
----------------------------------+

c1 strcmp(c1, 'TEST')

----------------------------------+

this is a test 1

----------------------------------+
1 row in set (0.014 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'test') from t1;
----------------------------------+

c1 strcmp(c1, 'test')

----------------------------------+

this is a test 1

----------------------------------+
1 row in set (0.015 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'hello') from t1;
-----------------------------------+

c1 strcmp(c1, 'hello')

-----------------------------------+

this is a test 1

-----------------------------------+
1 row in set (0.039 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'this is a test') from t1;
--------------------------------------------+

c1 strcmp(c1, 'this is a test')

--------------------------------------------+

this is a test 0

--------------------------------------------+
1 row in set (0.016 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'THIS IS A TEST') from t1;
--------------------------------------------+

c1 strcmp(c1, 'THIS IS A TEST')

--------------------------------------------+

this is a test 0

--------------------------------------------+
1 row in set (0.024 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'shis is a test') from t1;
--------------------------------------------+

c1 strcmp(c1, 'shis is a test')

--------------------------------------------+

this is a test 1

--------------------------------------------+
1 row in set (0.014 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'uhis is a test') from t1;
--------------------------------------------+

c1 strcmp(c1, 'uhis is a test')

--------------------------------------------+

this is a test -1

--------------------------------------------+
1 row in set (0.014 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'Shis is a test') from t1;
--------------------------------------------+

c1 strcmp(c1, 'Shis is a test')

--------------------------------------------+

this is a test 1

--------------------------------------------+
1 row in set (0.016 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'Uhis is a test') from t1;
--------------------------------------------+

c1 strcmp(c1, 'Uhis is a test')

--------------------------------------------+

this is a test -1

--------------------------------------------+
1 row in set (0.016 sec)

MariaDB [mytest]> insert into t1 values ('test');
Query OK, 1 row affected (0.108 sec)

MariaDB [mytest]> select c1, find_in_set(c1, 'this,is,a,test') from t1;
-------------------------------------------------+

c1 find_in_set(c1, 'this,is,a,test')

-------------------------------------------------+

this is a test 0
test 4

-------------------------------------------------+
2 rows in set (0.020 sec)

MariaDB [mytest]> select c1, find_in_set(c1, 'THIS,IS,A,TEST') from t1;
-------------------------------------------------+

c1 find_in_set(c1, 'THIS,IS,A,TEST')

-------------------------------------------------+

this is a test 0
test 4

-------------------------------------------------+
2 rows in set (0.022 sec)

Case sensitive collation

MariaDB [mytest]> create table t1 (c1 varchar(50)) engine=columnstore default charset=cp1250 collate=cp1250_czech_cs;
Query OK, 0 rows affected (0.457 sec)

MariaDB [mytest]> insert into t1 values ("this is a test");
Query OK, 1 row affected (0.207 sec)

MariaDB [mytest]> select c1, locate('test', c1) from t1;
----------------------------------+

c1 locate('test', c1)

----------------------------------+

this is a test 11

----------------------------------+
1 row in set (0.073 sec)

MariaDB [mytest]> select c1, locate('TEST', c1) from t1;
----------------------------------+

c1 locate('TEST', c1)

----------------------------------+

this is a test 0

----------------------------------+
1 row in set (0.018 sec)

MariaDB [mytest]> select c1, instr(c1,'test') from t1;
--------------------------------+

c1 instr(c1,'test')

--------------------------------+

this is a test 11

--------------------------------+
1 row in set (0.018 sec)

MariaDB [mytest]> select c1, instr(c1,'TEST') from t1;
--------------------------------+

c1 instr(c1,'TEST')

--------------------------------+

this is a test 0

--------------------------------+
1 row in set (0.018 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'this is a test') from t1;
--------------------------------------------+

c1 strcmp(c1, 'this is a test')

--------------------------------------------+

this is a test 0

--------------------------------------------+
1 row in set (0.017 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'THIS IS A TEST') from t1;
--------------------------------------------+

c1 strcmp(c1, 'THIS IS A TEST')

--------------------------------------------+

this is a test 1

--------------------------------------------+
1 row in set (0.018 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'uhis is a test') from t1;
--------------------------------------------+

c1 strcmp(c1, 'uhis is a test')

--------------------------------------------+

this is a test -1

--------------------------------------------+
1 row in set (0.017 sec)

MariaDB [mytest]> select c1, strcmp(c1, 'shis is a test') from t1;
--------------------------------------------+

c1 strcmp(c1, 'shis is a test')

--------------------------------------------+

this is a test 1

--------------------------------------------+
1 row in set (0.017 sec)

MariaDB [mytest]> insert into t1 values ('test');
Query OK, 1 row affected (0.108 sec)

MariaDB [mytest]> select c1, find_in_set(c1, 'this,is,a,test') from t1;
-------------------------------------------------+

c1 find_in_set(c1, 'this,is,a,test')

-------------------------------------------------+

this is a test 0
test 4

-------------------------------------------------+
2 rows in set (0.020 sec)

MariaDB [mytest]> select c1, find_in_set(c1, 'THIS,IS,A,TEST') from t1;
-------------------------------------------------+

c1 find_in_set(c1, 'THIS,IS,A,TEST')

-------------------------------------------------+

this is a test 0
test 0

-------------------------------------------------+
2 rows in set (0.018 sec)

Generated at Thu Feb 08 02:47:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.