[MDEV-24584] Selecting INT column with COLLATE utf8mb4_general_ci throws an error Created: 2021-01-13  Updated: 2024-01-26  Resolved: 2021-11-08

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 5.5, 10.1, 10.5.4, 10.5.6, 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.5.14, 10.6.6, 10.7.2

Type: Bug Priority: Critical
Reporter: Rob Schwyzer Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None
Environment:

MariaDB Community 10.5 Docker Container


Issue Links:
Problem/Incident
causes MDEV-33318 ORDER BY COLLATE improperly applied t... In Testing

 Description   

Review below walk-through for how to reproduce the issue and the error messages involved.

Note it is not necessary for this to succeed- there should not be a reason to issue a COLLATE against an integer column like this. However, this behavior does not seem to be documented, so at a minimum if this is intended behavior, this should be more apparent.

If this can be made to work rather than error, this will ease how much code refactoring some users must perform when switching from MySQL/Oracle to MariaDB.

MariaDB [(none)]> SELECT VARIABLE_NAME, SESSION_VALUE
    -> FROM INFORMATION_SCHEMA.SYSTEM_VARIABLES WHERE
    -> VARIABLE_NAME LIKE 'character_set_c%' OR
    -> VARIABLE_NAME LIKE 'character_set_re%' OR
    -> VARIABLE_NAME LIKE 'collation_c%';
+--------------------------+-------------------+
| VARIABLE_NAME            | SESSION_VALUE     |
+--------------------------+-------------------+
| CHARACTER_SET_RESULTS    | latin1            |
| CHARACTER_SET_CONNECTION | latin1            |
| CHARACTER_SET_CLIENT     | latin1            |
| COLLATION_CONNECTION     | latin1_swedish_ci |
+--------------------------+-------------------+
4 rows in set (0.001 sec)
 
MariaDB [(none)]> select 1 collate utf8mb4_general_ci;
ERROR 1253 (42000): COLLATION 'utf8mb4_general_ci' is not valid for CHARACTER SET 'latin1'
MariaDB [(none)]> set names utf8mb4;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [(none)]> SELECT VARIABLE_NAME, SESSION_VALUE FROM INFORMATION_SCHEMA.SYSTEM_VARIABLES WHERE VARIABLE_NAME LIKE 'character_set_c%' OR VARIABLE_NAME LIKE 'character_set_re%' OR VARIABLE_NAME LIKE 'collation_c%';
+--------------------------+--------------------+
| VARIABLE_NAME            | SESSION_VALUE      |
+--------------------------+--------------------+
| CHARACTER_SET_RESULTS    | utf8mb4            |
| CHARACTER_SET_CONNECTION | utf8mb4            |
| CHARACTER_SET_CLIENT     | utf8mb4            |
| COLLATION_CONNECTION     | utf8mb4_general_ci |
+--------------------------+--------------------+
4 rows in set (0.001 sec)
 
MariaDB [(none)]> select 1 collate utf8mb4_general_ci;
ERROR 1253 (42000): COLLATION 'utf8mb4_general_ci' is not valid for CHARACTER SET 'latin1'
MariaDB [(none)]> select 'TEST' collate utf8mb4_general_ci;
+-----------------------------------+
| 'TEST' collate utf8mb4_general_ci |
+-----------------------------------+
| TEST                              |
+-----------------------------------+
1 row in set (0.000 sec)



 Comments   
Comment by Elena Stepanova [ 2021-01-14 ]

Reproducible on all of 5.5-10.6.

I suppose it is not intended behavior, as it's been fixed in MySQL 5.7+, while reproducible with MySQL 5.6 as of 5.6.43 at least. bar, could you please take a look and confirm (or not)?

SET NAMES utf8mb4;
select 'TEST' collate utf8mb4_general_ci;
select 1 collate utf8mb4_general_ci;

10.5 e4205fba

SET NAMES utf8mb4;
select 'TEST' collate utf8mb4_general_ci;
'TEST' collate utf8mb4_general_ci
TEST
select 1 collate utf8mb4_general_ci;
bug.t                                    [ fail ]
        Test ended at 2021-01-14 14:40:58
 
CURRENT_TEST: bug.t
mysqltest: At line 3: query 'select 1 collate utf8mb4_general_ci' failed: 1253: COLLATION 'utf8mb4_general_ci' is not valid for CHARACTER SET 'latin1'

MySQL 5.7.25

SET NAMES utf8mb4;
select 'TEST' collate utf8mb4_general_ci;
'TEST' collate utf8mb4_general_ci
TEST
select 1 collate utf8mb4_general_ci;
1 collate utf8mb4_general_ci
1

Comment by Alexander Barkov [ 2021-01-14 ]

According to the SQL standard, COLLATE is applicable to string data types only.
But MySQL/MariaDB style is to do automatic data type conversion.
So perhaps it's good that MySQL fixed it.

However, the fix is probably incomplete:

MySQL [test]> select hex(1 collate ucs2_general_ci);
+--------------------------------+
| hex(1 collate ucs2_general_ci) |
+--------------------------------+
| 31                             |
+--------------------------------+

Notice, 0x31 is not a valid UCS2 string.

Comment by Alexander Barkov [ 2021-11-03 ]

sanja, can you please review:

https://github.com/MariaDB/server/tree/bb-10.5-bar-MDEV-24584

?

Thanks.

Comment by Oleksandr Byelkin [ 2021-11-05 ]

The past variant is OK to push

Generated at Thu Feb 08 09:31:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.