MariaDB Server / MDEV-6218

Wrong result of CHAR_LENGTH(non-BMP-character) with 3-byte utf8


    Details

      Description

      mysql> SET NAMES utf8;
      Query OK, 0 rows affected (0.00 sec)
       
      mysql> SELECT hex('😁'), char_length('😁'), octet_length('😁');
      +-------------+---------------------+----------------------+
      | hex('😁')    | char_length('😁')    | octet_length('😁')    |
      +-------------+---------------------+----------------------+
      | F09F9881    |                   4 |                    4 |
      +-------------+---------------------+----------------------+
      1 row in set (0.00 sec)

      Notice that I use "SET NAMES utf8" (a 3-byte character set that
      supports only BMP characters) but then input a 4-byte character.
      CHAR_LENGTH() returns 4 (the same as OCTET_LENGTH()), which is wrong.

      0xF09F9881 is an invalid byte sequence in utf8; it is valid only in
      utf8mb4, where it encodes the single character U+1F601.
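The following Python sketch illustrates why 0xF09F9881 is acceptable to utf8mb4 but not to utf8. It is not MariaDB code: the helpers is_valid_utf8mb3() and is_valid_utf8mb4() are hypothetical, and utf8 (utf8mb3) is modeled, as an assumption, as well-formed UTF-8 restricted to BMP code points U+0000..U+FFFF rather than as the server's actual validation routine.

```python
# Hypothetical helpers (not MariaDB code). Assumption: utf8mb3 ("utf8") is
# modeled as well-formed UTF-8 restricted to BMP code points U+0000..U+FFFF.

def is_valid_utf8mb4(data: bytes) -> bool:
    """True if data is well-formed UTF-8 (code points up to U+10FFFF)."""
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


def is_valid_utf8mb3(data: bytes) -> bool:
    """True if data is well-formed UTF-8 and every character is in the BMP."""
    if not is_valid_utf8mb4(data):
        return False
    # Every BMP character needs at most 3 bytes; anything above U+FFFF needs 4.
    return all(ord(ch) <= 0xFFFF for ch in data.decode("utf-8"))


seq = bytes.fromhex("F09F9881")
print(hex(ord(seq.decode("utf-8"))))  # 0x1f601: one non-BMP character
print(is_valid_utf8mb4(seq))          # True
print(is_valid_utf8mb3(seq))          # False
```

The four bytes form one character, which is why counting them as four characters cannot be right under either character set.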

      The expected result would be one of:

      • return an error for the entire query, or
      • replace the character with '?', so that CHAR_LENGTH() returns 1.
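A rough model of the two proposed behaviors, again in Python and purely illustrative: char_length_utf8mb3() and its strict flag are hypothetical, not a MariaDB function or setting. The sketch assumes the input is otherwise well-formed UTF-8 and treats any character above U+FFFF as not representable in utf8.

```python
# Hypothetical model (not MariaDB code) of the two behaviors proposed for a
# non-BMP character in a 3-byte utf8 string. Assumes the input is otherwise
# well-formed UTF-8; malformed bytes raise UnicodeDecodeError (a subclass of
# ValueError) in both modes.

def char_length_utf8mb3(data: bytes, strict: bool) -> int:
    text = data.decode("utf-8")
    if strict:
        # Option 1: reject the whole input if any character is outside the BMP.
        if any(ord(ch) > 0xFFFF for ch in text):
            raise ValueError("invalid utf8 byte sequence: " + data.hex().upper())
        return len(text)
    # Option 2: replace each non-BMP character with a single '?'.
    text = "".join(ch if ord(ch) <= 0xFFFF else "?" for ch in text)
    return len(text)


seq = bytes.fromhex("F09F9881")
print(char_length_utf8mb3(seq, strict=False))  # 1
try:
    char_length_utf8mb3(seq, strict=True)
except ValueError as err:
    print(err)  # invalid utf8 byte sequence: F09F9881
```

In neither mode does the 4-byte sequence count as four characters, which is what the server currently returns.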


              People

              Assignee:
              Alexander Barkov (bar)
              Reporter:
              Alexander Barkov (bar)
              Votes:
              0
              Watchers:
              3
