MariaDB Server / MDEV-16750

JSON_SET mishandles Unicode in every second pair of arguments

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.2, 10.3, 10.2.15
    • Fix Version/s: 10.2.17
    • Component/s: JSON
    • Labels: None
    • Environment: openSUSE Tumbleweed 20180530

      Description


      > show variables like "char%";
      +--------------------------+------------------------------+
      | Variable_name            | Value                        |
      +--------------------------+------------------------------+
      | character_set_client     | utf8mb4                      |
      | character_set_connection | utf8mb4                      |
      | character_set_database   | utf8mb4                      |
      | character_set_filesystem | binary                       |
      | character_set_results    | utf8mb4                      |
      | character_set_server     | utf8mb4                      |
      | character_set_system     | utf8                         |
      | character_sets_dir       | /usr/share/mariadb/charsets/ |
      +--------------------------+------------------------------+
      

      The terminal encodes ö as UTF-8:

      $ echo -n "ö" | od -c
      0000000 303 266
      0000002
      

      JSON_SET with only one pair of arguments works fine:

      $ mysql -N -e "SELECT JSON_SET('{}', '$.a', 'ö');" | od -c
      0000000   {   "   a   "   :       " 303 266   "   }  \n
      0000014
      

      With two pairs, the second ö comes back as a single Latin-1 byte (octal 366) instead of the two UTF-8 bytes:

      $ mysql -N -e "SELECT JSON_SET('{}', '$.a', 'ö', '$.b', 'ö');" | od -c
      0000000   {   "   a   "   :       " 303 266   "   ,       "   b   "   :
      0000020       " 366   "   }  \n
      0000026
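
To make the byte values in the dumps above easier to compare, here is a minimal sketch that prints ö in both encodings. It assumes `iconv` is installed; the helper `dump_octal` and the variable names are illustrative and not part of the original report.

```shell
# Print the bytes read from stdin as space-separated octal values.
# od -b dumps octal bytes; xargs collapses the padding od adds.
dump_octal() {
  od -An -b | xargs
}

# "ö" as UTF-8, written with octal escapes so the script does not
# depend on the encoding of this file or of the terminal.
utf8_bytes=$(printf '\303\266' | dump_octal)

# The same character converted to Latin-1 (ISO-8859-1).
latin1_bytes=$(printf '\303\266' | iconv -f UTF-8 -t ISO-8859-1 | dump_octal)

echo "UTF-8:   $utf8_bytes"
echo "Latin-1: $latin1_bytes"
```

The Latin-1 line shows the same lone 366 byte that appears for `$.b` in the JSON_SET output.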
      

      which results in this when a further pair is added:

      > SELECT JSON_SET('{}', '$.a', 'ö', '$.b', 'ö', '$.c', 1);
      +----------------------------------------------------+
      | JSON_SET('{}', '$.a', 'ö', '$.b', 'ö', '$.c', 1)   |
      +----------------------------------------------------+
      | NULL                                               |
      +----------------------------------------------------+
      1 row in set, 1 warning (0.00 sec)
       
      > show warnings;
      +---------+------+------------------------------------------------------------------------+
      | Level   | Code | Message                                                                |
      +---------+------+------------------------------------------------------------------------+
      | Warning | 4035 | Broken JSON string in argument 1 to function 'json_set' at position 16 |
      +---------+------+------------------------------------------------------------------------+
      1 row in set (0.00 sec)
      

      However, if non-ASCII characters are used only in every second pair, it works:

      > SELECT JSON_SET('{}', '$.a', 'ö', '$.c', 1, '$.b', 'ö');
      +----------------------------------------------------+
      | JSON_SET('{}', '$.a', 'ö', '$.c', 1, '$.b', 'ö')   |
      +----------------------------------------------------+
      | {"a": "ö", "c": 1, "b": "ö"}                       |
      +----------------------------------------------------+
      1 row in set (0.00 sec)
      

      Something similar happens with mb4 characters (example with U+1F62B TIRED FACE; JIRA doesn't handle those characters well either, so I replaced it with XXXX):

      $ echo -n "XXXX" | od -c
      0000000 360 237 230 253
      0000004
      $ mysql -N -e "SELECT JSON_SET('{}', '$.a', 'XXXX');" | od -c
      0000000   {   "   a   "   :       " 360 237 230 253   "   }  \n
      0000016
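
Since the character itself had to be replaced with XXXX, the following sketch checks that the four bytes in the dump above really are U+1F62B. It assumes `iconv` is installed; the variable names are illustrative.

```shell
# U+1F62B TIRED FACE as UTF-8, written with octal escapes
# (the same four bytes shown in the od output above).
tired_face='\360\237\230\253'

# Number of bytes in the UTF-8 encoding; xargs trims any padding from wc.
byte_count=$(printf "$tired_face" | wc -c | xargs)

# Decode to UTF-32BE so the code point can be read back as hex bytes.
code_point=$(printf "$tired_face" | iconv -f UTF-8 -t UTF-32BE | od -An -tx1 | xargs)

echo "bytes: $byte_count"
echo "code point (UTF-32BE): $code_point"
```

A four-byte UTF-8 sequence is what needs the utf8mb4 character set that the session variables above are set to.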
      

      However, with two pairs:

      > SELECT JSON_SET('{}', '$.a', 'XXXX', '$.b', 'XXXX');
      +----------------------------------------+
      | JSON_SET('{}', '$.a', '?', '$.b', '?') |
      +----------------------------------------+
      | NULL                                   |
      +----------------------------------------+
      1 row in set, 1 warning (0.00 sec)
       
      > show warnings;
      +---------+------+-------------------------------------------------------------------------------+
      | Level   | Code | Message                                                                       |
      +---------+------+-------------------------------------------------------------------------------+
      | Warning | 4038 | Syntax error in JSON text in argument 1 to function 'json_set' at position 24 |
      +---------+------+-------------------------------------------------------------------------------+
      1 row in set (0.00 sec)
       
      > SELECT JSON_SET('{}', '$.a', 'XXXX', '$.b', 1, '$.c', 'XXXX');
      +--------------------------------------------------+
      | JSON_SET('{}', '$.a', '?', '$.b', 1, '$.c', '?') |
      +--------------------------------------------------+
      | {"a": "XXXX", "b": 1, "c": "XXXX"}               |
      +--------------------------------------------------+
      1 row in set (0.00 sec)
      

            People

            • Assignee: holyfoot Alexey Botchkov
            • Reporter: PerjuringSchoolmarms David Flatz
            • Votes: 0
            • Watchers: 4
