Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25041

CAST produces warnings and different results for formally invalid Unicode, while MySQL's accepts them

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Not a Bug
    • 10.4.13, 10.4.18, 10.5.6, 10.5.10
    • N/A
    • Character Sets
    • None

    Description

      MariaDB's implementation of CAST is different (more "strict") than MySQL's, and this creates additional compatibility problems.

      Consider two examples, MySQL 8.0.23 first:

      openxs@ao756:~/dbs/8.0$ bin/mysql -uroot --socket=/tmp/mysql8.sock
      Welcome to the MySQL monitor.  Commands end with ; or \g.
      Your MySQL connection id is 10
      Server version: 8.0.23 Source distribution
       
      Copyright (c) 2000, 2021, Oracle and/or its affiliates.
       
      Oracle is a registered trademark of Oracle Corporation and/or its
      affiliates. Other names may be trademarks of their respective
      owners.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      mysql> select cast(unhex('FF') as char(1) charset utf8mb4);
      +----------------------------------------------+
      | cast(unhex('FF') as char(1) charset utf8mb4) |
      +----------------------------------------------+
      | ▒                                             |
      +----------------------------------------------+
      1 row in set (0.00 sec)
       
      mysql> select hex(cast(unhex('FF') as char(1) charset utf8mb4));
      +---------------------------------------------------+
      | hex(cast(unhex('FF') as char(1) charset utf8mb4)) |
      +---------------------------------------------------+
      | FF                                                |
      +---------------------------------------------------+
      1 row in set (0.00 sec)
       
      mysql> select CAST(UNHEX('D800AC01') AS CHAR(3072) CHARACTER SET utf16);
      +-----------------------------------------------------------+
      | CAST(UNHEX('D800AC01') AS CHAR(3072) CHARACTER SET utf16) |
      +-----------------------------------------------------------+
      | ?¬                                                        |
      +-----------------------------------------------------------+
      1 row in set (0.00 sec)
       
      mysql> select hex(CAST(UNHEX('D800AC01') AS CHAR(3072) CHARACTER SET utf16));
      +----------------------------------------------------------------+
      | hex(CAST(UNHEX('D800AC01') AS CHAR(3072) CHARACTER SET utf16)) |
      +----------------------------------------------------------------+
      | D800AC01                                                       |
      +----------------------------------------------------------------+
      1 row in set (0.00 sec)
      

      In the former we see invalid UTF8 byte value, 0xFF, accepted. In the latter we see "unpaired surrogate" bytes in UTF16, both accepted silently. Now let's consider the same statements in MariaDB:

      openxs@ao756:~/dbs/maria10.5$ bin/mysql --socket=/tmp/mariadb.sock
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 3
      Server version: 10.5.10-MariaDB MariaDB Server
       
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [(none)]> select cast(unhex('FF') as char(1) charset utf8mb4);
      +----------------------------------------------+
      | cast(unhex('FF') as char(1) charset utf8mb4) |
      +----------------------------------------------+
      | ?                                            |
      +----------------------------------------------+
      1 row in set, 1 warning (0,034 sec)
       
      MariaDB [(none)]> show warnings\G
      *************************** 1. row ***************************
        Level: Warning
         Code: 1300
      Message: Invalid utf8mb4 character string: '\xFF'
      1 row in set (0,000 sec)
       
      MariaDB [(none)]> select CAST(UNHEX('D800AC01') AS CHAR(3072) CHARACTER SET utf16);
      +-----------------------------------------------------------+
      | CAST(UNHEX('D800AC01') AS CHAR(3072) CHARACTER SET utf16) |
      +-----------------------------------------------------------+
      | ?¬?                                                       |
      +-----------------------------------------------------------+
      1 row in set, 1 warning (0,000 sec)
       
      MariaDB [(none)]> show warnings\G
      *************************** 1. row ***************************
        Level: Warning
         Code: 1300
      Message: Invalid utf16 character string: '\xD8\x00\xAC\x01'
      1 row in set (0,000 sec)
       
      MariaDB [(none)]> select hex(CAST(UNHEX('D800AC01') AS CHAR(3072) CHARACTER SET utf16));
      +----------------------------------------------------------------+
      | hex(CAST(UNHEX('D800AC01') AS CHAR(3072) CHARACTER SET utf16)) |
      +----------------------------------------------------------------+
      | 003F00AC003F                                                   |
      +----------------------------------------------------------------+
      1 row in set, 1 warning (0,000 sec)
       
      MariaDB [(none)]> show warnings\G                                               *************************** 1. row ***************************
        Level: Warning
         Code: 1300
      Message: Invalid utf16 character string: '\xD8\x00\xAC\x01'
      1 row in set (0,000 sec)
      

      We see warnings that in DML or DDL case will be errors in strict SQL mode. Moreover, in the latter case with warning a different hex value is produced.

      IMHO this is a bug, but if intended, I'd ask for some mode/setting that would make CAST work the same way as in MySQL, for compatibility of existing applications.

      Attachments

        Activity

          People

            serg Sergei Golubchik
            valerii Valerii Kravchuk
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.