Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23465

Implement a collation for identifiers

    XMLWordPrintable

Details

    Description

      In the code we use two things to compare identifiers:

      • my_strcasecmp(system_charset_info, ...), or just system_charset_info->strcasecmp(..) when used directly without the macro. This comparison way is used when the two strings are known to be 0-terminated.
      • system_charset_info->strnncoll(...). This way is used when we know the lengths of the operands and/or the strings are not necessarily 0-terminated.

      There is a problem with this choice:

      • strcasecmp() performs accent sensitive case insensitive comparison
      • strnncoll() performs accent insensitive case insensitive comparison

      This mismatch causes troubles (see examples below).

      To avoid this mismatch, we need a dedicated collation for identifiers, whose strnncoll() compares in the accent sensitive case insensitive way, to have both strcasecmp() and strnncoll() compare in the same style.

      A tentative name is utf8_general_as_ci, or even utf8_identifier_ci.
      Under terms of this task, it may not even be necessary to expose this collation to the SQL level. But later, we should revise INFORMATION_SCHEMA to use this collation for certain columns.

      The following scripts demonstrate the difference:

      • Column names are accent sensitive (correct)

        CREATE OR REPLACE TABLE t1 (a INT, ä INT);
        

        SHOW CREATE TABLE t1;
        +-------+--------------------------------------------------------------------------------------------------------------------+
        | Table | Create Table                                                                                                       |
        +-------+--------------------------------------------------------------------------------------------------------------------+
        | t1    | CREATE TABLE `t1` (
          `a` int(11) DEFAULT NULL,
          `ä` int(11) DEFAULT NULL
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1  |
        +-------+--------------------------------------------------------------------------------------------------------------------+
        1 row in set (0.002 sec)
        

        Notice, the two columns co-exist.

      • Index names demonstrate both accent sensitive (correct) and accent insensitive (wrong) behavior

        CREATE OR REPLACE TABLE t1 (a INT, KEY a(a), KEY ä(a));
        SHOW WARNINGS;
        

        +-------+------+-------------------------------------------------------------------------------------+
        | Level | Code | Message                                                                             |
        +-------+------+-------------------------------------------------------------------------------------+
        | Note  | 1831 | Duplicate index `ä`. This is deprecated and will be disallowed in a future release  |
        +-------+------+-------------------------------------------------------------------------------------+
        

        Notice, index names were compared in accent insensitive way to display the warning.
        However, on ALTER, they are compared in accent sensitive way as expected:

        ALTER TABLE t1 DROP INDEX ä;
        SHOW CREATE TABLE t1;
        

        +-------+--------------------------------------------------------------------------------------------------------+
        | Table | Create Table                                                                                           |
        +-------+--------------------------------------------------------------------------------------------------------+
        | t1    | CREATE TABLE `t1` (
          `a` int(11) DEFAULT NULL,
          KEY `a` (`a`)
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
        +-------+--------------------------------------------------------------------------------------------------------+
        

        The second ALTER does not drop anything:

        ALTER TABLE t1 DROP INDEX ä;
        

        ERROR 1091 (42000): Can't DROP INDEX `ä`; check that it exists
        

      • SP variables are accent insensitive (wrong)

        DELIMITER $$
        BEGIN NOT ATOMIC
          DECLARE a INT;
          DECLARE ä INT;
        END;
        $$
        DELIMITER ;
        

        ERROR 1331 (42000): Duplicate variable: ä
        

      • ROW fields are accent sensitive (correct) on explicit declaration but accent insensitive (wrong) on anchored declaration

        DELIMITER $$
        BEGIN NOT ATOMIC
          DECLARE rec ROW(a INT, ä INT) DEFAULT (1,2);
          SELECT rec.a, rec.ä;
        END;
        $$
        DELIMITER ;
        

        +-------+--------+
        | rec.a | rec.ä  |
        +-------+--------+
        |     1 |      2 |
        +-------+--------+
        

        The above is correct.

        CREATE OR REPLACE TABLE t1 (a INT, ä INT);
        INSERT INTO t1 VALUES (1,2);
        DELIMITER $$
        BEGIN NOT ATOMIC
          DECLARE rec ROW TYPE OF t1 DEFAULT (1,2);
          SELECT rec.a, rec.ä;
        END;
        $$
        DELIMITER ;
        

        +-------+--------+
        | rec.a | rec.ä  |
        +-------+--------+
        |     1 |      1 |
        +-------+--------+
        

        The above is wrong. Notice, the expected value for rec.ä should be 2, like in the first example.

      Attachments

        Issue Links

          Activity

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.