Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16869

String functions don't respect character set of JSON_VALUE

    Details

      Description

      Let's say that we have the following table:

      CREATE TABLE json_test (
      json_col TEXT
      ) DEFAULT CHARSET=latin1;
       
      INSERT INTO json_test VALUES ('{"key1":"Sí"}');
      

      We can get the value of key1 by executing the following:

      SELECT JSON_VALUE(json_col, '$.key1') FROM json_test;
      

      And this works fine:

      MariaDB [db1]> SELECT JSON_VALUE(json_col, '$.key1') FROM json_test;
      +--------------------------------+
      | JSON_VALUE(json_col, '$.key1') |
      +--------------------------------+
      | Sí                             |
      +--------------------------------+
      1 row in set (0.00 sec)
      

      But if we combine this with string functions, then the character set gets messed up. For example, run the following queries instead:

      SELECT REPLACE(JSON_VALUE(json_col, '$.key1'), 'null', '') FROM json_test;
      SELECT CONCAT(json_value(json_col, '$.key1'), 'a') FROM json_test;
      

      And we get something unexpected:

      MariaDB [db1]> SELECT REPLACE(JSON_VALUE(json_col, '$.key1'), 'null', '') FROM json_test;
      +-----------------------------------------------------+
      | REPLACE(JSON_VALUE(json_col, '$.key1'), 'null', '') |
      +-----------------------------------------------------+
      | Sí                                                 |
      +-----------------------------------------------------+
      1 row in set (0.00 sec)
       
      MariaDB [db1]> SELECT CONCAT(json_value(json_col, '$.key1'), 'a') FROM json_test;
      +---------------------------------------------+
      | CONCAT(json_value(json_col, '$.key1'), 'a') |
      +---------------------------------------------+
      | Sía                                        |
      +---------------------------------------------+
      1 row in set (0.00 sec)
      

      We can workaround this problem by explicitly converting the JSON_VALUE return value to a given character set:

      MariaDB [db1]> SELECT REPLACE(CONVERT(JSON_VALUE(json_col, '$.key1') USING utf8), 'null', '') FROM json_test;
      +-------------------------------------------------------------------------+
      | REPLACE(CONVERT(JSON_VALUE(json_col, '$.key1') USING utf8), 'null', '') |
      +-------------------------------------------------------------------------+
      | Sí                                                                      |
      +-------------------------------------------------------------------------+
      1 row in set (0.00 sec)
      

        Attachments

          Activity

            People

            • Assignee:
              holyfoot Alexey Botchkov
              Reporter:
              GeoffMontee Geoff Montee
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: