Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35496

JSON_UNQUOTE produces utf8mb3 output while JSON can be utf8mb4

    XMLWordPrintable

Details

    • Bug
    • Status: Confirmed (View Workflow)
    • Critical
    • Resolution: Unresolved
    • 10.5, 11.7
    • 10.5, 10.6, 10.11, 11.4, 11.7
    • JSON
    • None

    Description

      JSON functions seem be fine with processing UTF8MB4 characters. The only exception is JSON_UNQUOTE which tries to produce UTF8MB3 and so destroys any non-mb3 characters.

      Testcase: (the smiley is this character: https://www.compart.com/en/unicode/U+1F60A)

      set names utf8mb4;
      set @v='{ "color":"๐Ÿ˜Š" }';
      select @v, collation(@v);
      select json_valid(@v);
      select json_extract(@v,'$.color'), collation(json_extract(@v,'$.color'));
      select json_unquote(json_extract(@v,'$.color')), collation(json_unquote(json_extract(@v,'$.color')));
      

      Run it and:

      MariaDB [test]> set names utf8mb4;
      Query OK, 0 rows affected (0.000 sec)
      ย 
      MariaDB [test]> set @v='{ "color":"๐Ÿ˜Š" }';
      Query OK, 0 rows affected (0.000 sec)
      ย 
      MariaDB [test]> select @v, collation(@v);
      +--------------------+-----------------------+
      | @v                 | collation(@v)         |
      +--------------------+-----------------------+
      | { "color":"๐Ÿ˜Š" }     | utf8mb4_uca1400_ai_ci |
      +--------------------+-----------------------+
      1 row in set (0.001 sec)
      

      Note that JSON_VALID and JSON_EXTRACT work:

      MariaDB [test]> select json_valid(@v);
      +----------------+
      | json_valid(@v) |
      +----------------+
      |              1 |
      +----------------+
      1 row in set (0.001 sec)
      ย 
      MariaDB [test]> select json_extract(@v,'$.color'), collation(json_extract(@v,'$.color'));
      +----------------------------+---------------------------------------+
      | json_extract(@v,'$.color') | collation(json_extract(@v,'$.color')) |
      +----------------------------+---------------------------------------+
      | "๐Ÿ˜Š"                         | utf8mb4_uca1400_ai_ci                 |
      +----------------------------+---------------------------------------+
      1 row in set (0.001 sec)
      

      But try removing the quotes and the smiley is lost:

      MariaDB [test]> select json_unquote(json_extract(@v,'$.color')), collation(json_unquote(json_extract(@v,'$.color')));
      +------------------------------------------+-----------------------------------------------------+
      | json_unquote(json_extract(@v,'$.color')) | collation(json_unquote(json_extract(@v,'$.color'))) |
      +------------------------------------------+-----------------------------------------------------+
      | ?                                        | utf8mb3_general_ci                                  |
      +------------------------------------------+-----------------------------------------------------+
      1 row in set (0.001 sec)
      

      Attachments

        Issue Links

          Activity

            People

              rucha174 Rucha Deodhar
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.