Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23465

Implement a collation for identifiers

Details

    Description

      In the code we use two things to compare identifiers:

      • my_strcasecmp(system_charset_info, ...), or just system_charset_info->strcasecmp(..) when used directly without the macro. This comparison way is used when the two strings are known to be 0-terminated.
      • system_charset_info->strnncoll(...). This way is used when we know the lengths of the operands and/or the strings are not necessarily 0-terminated.

      There is a problem with this choice:

      • strcasecmp() performs accent sensitive case insensitive comparison
      • strnncoll() performs accent insensitive case insensitive comparison

      This mismatch causes troubles (see examples below).

      To avoid this mismatch, we need a dedicated collation for identifiers, whose strnncoll() compares in the accent sensitive case insensitive way, to have both strcasecmp() and strnncoll() compare in the same style.

      A tentative name is utf8_general_as_ci, or even utf8_identifier_ci.
      Under terms of this task, it may not even be necessary to expose this collation to the SQL level. But later, we should revise INFORMATION_SCHEMA to use this collation for certain columns.

      The following scripts demonstrate the difference:

      • Column names are accent sensitive (correct)

        CREATE OR REPLACE TABLE t1 (a INT, ä INT);
        

        SHOW CREATE TABLE t1;
        +-------+--------------------------------------------------------------------------------------------------------------------+
        | Table | Create Table                                                                                                       |
        +-------+--------------------------------------------------------------------------------------------------------------------+
        | t1    | CREATE TABLE `t1` (
          `a` int(11) DEFAULT NULL,
          `ä` int(11) DEFAULT NULL
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1  |
        +-------+--------------------------------------------------------------------------------------------------------------------+
        1 row in set (0.002 sec)
        

        Notice, the two columns co-exist.

      • Index names demonstrate both accent sensitive (correct) and accent insensitive (wrong) behavior

        CREATE OR REPLACE TABLE t1 (a INT, KEY a(a), KEY ä(a));
        SHOW WARNINGS;
        

        +-------+------+-------------------------------------------------------------------------------------+
        | Level | Code | Message                                                                             |
        +-------+------+-------------------------------------------------------------------------------------+
        | Note  | 1831 | Duplicate index `ä`. This is deprecated and will be disallowed in a future release  |
        +-------+------+-------------------------------------------------------------------------------------+
        

        Notice, index names were compared in accent insensitive way to display the warning.
        However, on ALTER, they are compared in accent sensitive way as expected:

        ALTER TABLE t1 DROP INDEX ä;
        SHOW CREATE TABLE t1;
        

        +-------+--------------------------------------------------------------------------------------------------------+
        | Table | Create Table                                                                                           |
        +-------+--------------------------------------------------------------------------------------------------------+
        | t1    | CREATE TABLE `t1` (
          `a` int(11) DEFAULT NULL,
          KEY `a` (`a`)
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
        +-------+--------------------------------------------------------------------------------------------------------+
        

        The second ALTER does not drop anything:

        ALTER TABLE t1 DROP INDEX ä;
        

        ERROR 1091 (42000): Can't DROP INDEX `ä`; check that it exists
        

      • SP variables are accent insensitive (wrong)

        DELIMITER $$
        BEGIN NOT ATOMIC
          DECLARE a INT;
          DECLARE ä INT;
        END;
        $$
        DELIMITER ;
        

        ERROR 1331 (42000): Duplicate variable: ä
        

      • ROW fields are accent sensitive (correct) on explicit declaration but accent insensitive (wrong) on anchored declaration

        DELIMITER $$
        BEGIN NOT ATOMIC
          DECLARE rec ROW(a INT, ä INT) DEFAULT (1,2);
          SELECT rec.a, rec.ä;
        END;
        $$
        DELIMITER ;
        

        +-------+--------+
        | rec.a | rec.ä  |
        +-------+--------+
        |     1 |      2 |
        +-------+--------+
        

        The above is correct.

        CREATE OR REPLACE TABLE t1 (a INT, ä INT);
        INSERT INTO t1 VALUES (1,2);
        DELIMITER $$
        BEGIN NOT ATOMIC
          DECLARE rec ROW TYPE OF t1 DEFAULT (1,2);
          SELECT rec.a, rec.ä;
        END;
        $$
        DELIMITER ;
        

        +-------+--------+
        | rec.a | rec.ä  |
        +-------+--------+
        |     1 |      1 |
        +-------+--------+
        

        The above is wrong. Notice, the expected value for rec.ä should be 2, like in the first example.

      Attachments

        Issue Links

          Activity

            Transition Time In Source Status Execution Times
            Alexander Barkov made transition -
            Open Closed
            1426d 4h 42m 1

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.