[MDEV-16869] String functions don't respect character set of JSON_VALUE Created: 2018-07-31  Updated: 2020-08-25  Resolved: 2018-08-05

Status: Closed
Project: MariaDB Server
Component/s: Character Sets, JSON
Affects Version/s: 10.2.16, 10.2, 10.3
Fix Version/s: 10.2.17

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Alexey Botchkov
Resolution: Fixed Votes: 0
Labels: None


 Description   

Let's say that we have the following table:

CREATE TABLE json_test (
json_col TEXT
) DEFAULT CHARSET=latin1;
 
INSERT INTO json_test VALUES ('{"key1":"Sí"}');

We can get the value of key1 by executing the following:

SELECT JSON_VALUE(json_col, '$.key1') FROM json_test;

And this works fine:

MariaDB [db1]> SELECT JSON_VALUE(json_col, '$.key1') FROM json_test;
+--------------------------------+
| JSON_VALUE(json_col, '$.key1') |
+--------------------------------+
| Sí                             |
+--------------------------------+
1 row in set (0.00 sec)

But if we combine this with string functions, then the character set gets messed up. For example, run the following queries instead:

SELECT REPLACE(JSON_VALUE(json_col, '$.key1'), 'null', '') FROM json_test;
SELECT CONCAT(json_value(json_col, '$.key1'), 'a') FROM json_test;

And we get something unexpected:

MariaDB [db1]> SELECT REPLACE(JSON_VALUE(json_col, '$.key1'), 'null', '') FROM json_test;
+-----------------------------------------------------+
| REPLACE(JSON_VALUE(json_col, '$.key1'), 'null', '') |
+-----------------------------------------------------+
| Sí                                                 |
+-----------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [db1]> SELECT CONCAT(json_value(json_col, '$.key1'), 'a') FROM json_test;
+---------------------------------------------+
| CONCAT(json_value(json_col, '$.key1'), 'a') |
+---------------------------------------------+
| Sía                                        |
+---------------------------------------------+
1 row in set (0.00 sec)

We can workaround this problem by explicitly converting the JSON_VALUE return value to a given character set:

MariaDB [db1]> SELECT REPLACE(CONVERT(JSON_VALUE(json_col, '$.key1') USING utf8), 'null', '') FROM json_test;
+-------------------------------------------------------------------------+
| REPLACE(CONVERT(JSON_VALUE(json_col, '$.key1') USING utf8), 'null', '') |
+-------------------------------------------------------------------------+
| Sí                                                                      |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)



 Comments   
Comment by Elena Stepanova [ 2018-07-31 ]

JSON_EXTRACT, for example, works okay.

Comment by Alexander Barkov [ 2018-08-03 ]

This script demonstrates the problem:

CREATE OR REPLACE TABLE t1 (j TEXT) DEFAULT CHARSET=latin1;
INSERT INTO t1 VALUES ('{"key1":"Sí"}');
SELECT CONCAT(json_value(j, '$.key1'), 'a') FROM t1;

+--------------------------------------+
| CONCAT(json_value(j, '$.key1'), 'a') |
+--------------------------------------+
| Sía                                 |
+--------------------------------------+

Note, if I change the character set of the column to utf8mb4, it returns the expected value:

CREATE OR REPLACE TABLE t1 (j TEXT CHARACTER SET utf8mb4) DEFAULT CHARSET=latin1;
INSERT INTO t1 VALUES ('{"key1":"Sí"}');
SELECT CONCAT(json_value(j, '$.key1'), 'a') FROM t1;

+--------------------------------------+
| CONCAT(json_value(j, '$.key1'), 'a') |
+--------------------------------------+
| Sía                                  |
+--------------------------------------+

Comment by Alexey Botchkov [ 2018-08-05 ]

http://lists.askmonty.org/pipermail/commits/2018-August/012774.html

Generated at Thu Feb 08 08:32:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.