  MariaDB Server / MDEV-6218

Wrong result of CHAR_LENGTH(non-BMP-character) with 3-byte utf8

Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.10, 10.2(EOL), 10.3(EOL)
    • Fix Version/s: 10.5
    • Component/s: Character Sets

    Description

      mysql> SET NAMES utf8;
      Query OK, 0 rows affected (0.00 sec)
       
      mysql> SELECT hex('😁'), char_length('😁'), octet_length('😁');
      +-------------+---------------------+----------------------+
      | hex('😁')     | char_length('😁')     | octet_length('😁')     |
      +-------------+---------------------+----------------------+
      | F09F9881    |                   4 |                    4 |
      +-------------+---------------------+----------------------+
      1 row in set (0.00 sec)

      Note that I use "SET NAMES utf8" (a 3-byte character set that
      supports only BMP characters), but then input a 4-byte character.
      CHAR_LENGTH() returns 4, counting every byte as a character, which is wrong.

      0xF09F9881 is not a well-formed byte sequence in utf8 (it is valid only in utf8mb4).

      The expected result would be:

      • either return an error for the entire query
      • or replace the character with '?' and thus make CHAR_LENGTH() return 1.
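
      The two expected behaviours can be modelled outside the server. The sketch
      below is an illustration only, not MariaDB code: to_utf8mb3 and is_bmp are
      hypothetical helper names, and the only data taken from the report is the
      byte sequence F09F9881.

```python
# Illustration only: models the two expected behaviours outside the server.
RAW = bytes.fromhex("F09F9881")  # bytes taken from the hex() output in the report


def is_bmp(ch: str) -> bool:
    """True if the code point lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF


def to_utf8mb3(raw: bytes, strict: bool = False) -> str:
    """Hypothetical helper: decode UTF-8, then reject or replace non-BMP characters."""
    text = raw.decode("utf-8")
    if strict:
        for ch in text:
            if not is_bmp(ch):
                # Option 1: fail the whole statement.
                raise ValueError(f"non-BMP character U+{ord(ch):04X} in 3-byte utf8 input")
        return text
    # Option 2: replace each non-BMP character with a single '?'.
    return "".join(ch if is_bmp(ch) else "?" for ch in text)


decoded = RAW.decode("utf-8")
print(f"U+{ord(decoded):04X}", len(decoded), len(RAW))  # code point, characters, bytes
replaced = to_utf8mb3(RAW)
print(replaced, len(replaced))
```

      The sequence decodes to one code point above U+FFFF, so it is a single
      character that a 3-byte encoding cannot represent. With replacement, the
      character length is 1 rather than 4.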

          Activity

            psergei Sergei Petrunia added a comment:

            Repeatable on mysql-5.6.17

            bar Alexander Barkov added a comment:

            LEFT also returns a wrong result:

            MariaDB [test]> SELECT hex(left('😁',2));
            +---------------------+
            | hex(left('😁',2))     |
            +---------------------+
            | F09F                |
            +---------------------+
            1 row in set (0.00 sec)

            bar Alexander Barkov added a comment:

            RIGHT returns a wrong result:

            MariaDB [test]> SELECT hex(right('😁',2));
            +----------------------+
            | hex(right('😁',2))     |
            +----------------------+
            | 9881                 |
            +----------------------+

            bar Alexander Barkov added a comment:

            SUBSTRING returns a wrong result:

            MariaDB [test]> SELECT hex(substring('😁',2,1));
            +----------------------------+
            | hex(substring('😁',2,1))     |
            +----------------------------+
            | 9F                         |
            +----------------------------+
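
            The LEFT, RIGHT and SUBSTRING results reported in this thread are
            consistent with each byte of 0xF09F9881 being treated as a separate
            character. The sketch below reproduces the three hex values with plain
            byte slicing. It is an illustration under that assumption, not a
            description of the server's actual code path.

```python
# Illustration only: byte-wise slicing reproduces the reported hex values.
raw = bytes.fromhex("F09F9881")  # the 4-byte sequence from the report

left_2 = raw[:2].hex().upper()      # LEFT(s, 2) acting on bytes
right_2 = raw[-2:].hex().upper()    # RIGHT(s, 2) acting on bytes
substr_2_1 = raw[1:2].hex().upper() # SUBSTRING(s, 2, 1) acting on bytes (1-based position 2)

print(left_2, right_2, substr_2_1)
```

            None of the three slices is a well-formed UTF-8 sequence on its own,
            which matches the ill-formed results shown in the comments.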

            bar Alexander Barkov added a comment:

            In this example, the returned string is also ill-formed:

            MariaDB [test]> SELECT '11��2��22222';
            +------------------+
            | 11��2��22222         |
            +------------------+
            | 11��2��22222         |
            +------------------+

            It should probably replace the unknown bytes with question marks.

            People

              Assignee: bar Alexander Barkov
              Reporter: bar Alexander Barkov
              Votes: 0
              Watchers: 3
