Details
-
Task
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Duplicate
-
None
Description
In the code we use two things to compare identifiers:
- my_strcasecmp(system_charset_info, ...), or just system_charset_info->strcasecmp(..) when used directly without the macro. This comparison way is used when the two strings are known to be 0-terminated.
- system_charset_info->strnncoll(...). This way is used when we know the lengths of the operands and/or the strings are not necessarily 0-terminated.
There is a problem with this choice:
- strcasecmp() performs accent sensitive case insensitive comparison
- strnncoll() performs accent insensitive case insensitive comparison
This mismatch causes troubles (see examples below).
To avoid this mismatch, we need a dedicated collation for identifiers, whose strnncoll() compares in the accent sensitive case insensitive way, to have both strcasecmp() and strnncoll() compare in the same style.
A tentative name is utf8_general_as_ci, or even utf8_identifier_ci.
Under terms of this task, it may not even be necessary to expose this collation to the SQL level. But later, we should revise INFORMATION_SCHEMA to use this collation for certain columns.
The following scripts demonstrate the difference:
- Column names are accent sensitive (correct)
CREATE OR REPLACE TABLE t1 (a INT, ä INT);
SHOW CREATE TABLE t1;
+-------+--------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+--------------------------------------------------------------------------------------------------------------------+
| t1 | CREATE TABLE `t1` (
`a` int(11) DEFAULT NULL,
`ä` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+--------------------------------------------------------------------------------------------------------------------+
1 row in set (0.002 sec)
Notice, the two columns co-exist.
- Index names demonstrate both accent sensitive (correct) and accent insensitive (wrong) behavior
CREATE OR REPLACE TABLE t1 (a INT, KEY a(a), KEY ä(a));
SHOW WARNINGS;
+-------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------+
| Note | 1831 | Duplicate index `ä`. This is deprecated and will be disallowed in a future release |
+-------+------+-------------------------------------------------------------------------------------+
Notice, index names were compared in accent insensitive way to display the warning.
However, on ALTER, they are compared in accent sensitive way as expected:ALTER TABLE t1 DROP INDEX ä;
SHOW CREATE TABLE t1;
+-------+--------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+--------------------------------------------------------------------------------------------------------+
| t1 | CREATE TABLE `t1` (
`a` int(11) DEFAULT NULL,
KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+--------------------------------------------------------------------------------------------------------+
The second ALTER does not drop anything:
ALTER TABLE t1 DROP INDEX ä;
ERROR 1091 (42000): Can't DROP INDEX `ä`; check that it exists
- SP variables are accent insensitive (wrong)
DELIMITER $$
BEGIN NOT ATOMIC
DECLARE a INT;
DECLARE ä INT;
END;
$$
DELIMITER ;
ERROR 1331 (42000): Duplicate variable: ä
- ROW fields are accent sensitive (correct) on explicit declaration but accent insensitive (wrong) on anchored declaration
DELIMITER $$
BEGIN NOT ATOMIC
DECLARE rec ROW(a INT, ä INT) DEFAULT (1,2);
SELECT rec.a, rec.ä;
END;
$$
DELIMITER ;
+-------+--------+
| rec.a | rec.ä |
+-------+--------+
| 1 | 2 |
+-------+--------+
The above is correct.
CREATE OR REPLACE TABLE t1 (a INT, ä INT);
INSERT INTO t1 VALUES (1,2);
DELIMITER $$
BEGIN NOT ATOMIC
DECLARE rec ROW TYPE OF t1 DEFAULT (1,2);
SELECT rec.a, rec.ä;
END;
$$
DELIMITER ;
+-------+--------+
| rec.a | rec.ä |
+-------+--------+
| 1 | 1 |
+-------+--------+
The above is wrong. Notice, the expected value for rec.ä should be 2, like in the first example.
Attachments
Issue Links
- duplicates
-
MDEV-31340 Remove MY_COLLATION_HANDLER::strcasecmp()
- Closed
- is blocked by
-
MDEV-27009 Add UCA-14.0.0 collations
- Closed
- relates to
-
MDEV-31340 Remove MY_COLLATION_HANDLER::strcasecmp()
- Closed
-
MDEV-19123 Change default charset from latin1 to utf8mb4
- Closed