[MDEV-13633] JSON_ARRAY() - bad output with some UTF8 characters Created: 2017-08-23  Updated: 2020-10-14  Resolved: 2017-09-13

Status: Closed
Project: MariaDB Server
Component/s: Character Sets, JSON
Affects Version/s: 10.2.8
Fix Version/s: 10.2.9

Type: Bug Priority: Major
Reporter: Michal Hucik Assignee: Alexey Botchkov
Resolution: Fixed Votes: 0
Labels: None
Environment: CentOS Linux 7.3

Issue Links:
Relates
relates to MDEV-23890: JSON_ARRAY utf8mb4 encoding error (Open)

 Description   

I compared the output of JSON_ARRAY() in MariaDB with the output of json_encode() in PHP 5.6.31.

With the connection set up via SET NAMES UTF8, JSON_ARRAY() does not convert some Czech accented characters to \uXXXX escape sequences; they come back garbled instead:

// Let MariaDB build the JSON array from the test string.
$row = $this->database->query("SELECT JSON_ARRAY('1. ě 2. š 3. č 4. ř 5. ž 6. ý 7. á 8. í 9. é 10. ů 11. ú') AS json_data")->fetch();

// Encode the same string in PHP for comparison.
$json_test = [
    'mysql' => $row->json_data,
    'php' => json_encode('1. ě 2. š 3. č 4. ř 5. ž 6. ý 7. á 8. í 9. é 10. ů 11. ú'),
];

die(var_dump($json_test));
 

And here is the output:

array(2) {
  ["mysql"]=>
  string(90) "["1. \u011B 2. \u0161 3. \u010D 4. \u0159 5. \u017E 6. ? 7. ? 8. ? 9. ? 10. \u016F 11. ?"]"
  ["php"]=>
  string(113) ""1. \u011b 2. \u0161 3. \u010d 4. \u0159 5. \u017e 6. \u00fd 7. \u00e1 8. \u00ed 9. \u00e9 10. \u016f 11. \u00fa""
}
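
The pattern in the dump above can be checked with a short script. This is only a sketch of an observation about this one sample, not a statement about the root cause: the characters that came back broken (ý á í é ú) all lie in U+0080..U+00FF, while the ones that were escaped correctly lie above U+00FF. The helper name split_by_range and the constants TEXT and REPORTED are made up for this illustration; Python's json.dumps is used only as a reference encoder that, like PHP's json_encode, escapes non-ASCII characters by default.

```python
import json
import unicodedata

# Literal from the report, normalized so each accented letter is one code point.
TEXT = unicodedata.normalize(
    "NFC", "1. ě 2. š 3. č 4. ř 5. ž 6. ý 7. á 8. í 9. é 10. ů 11. ú"
)

# MariaDB 10.2.8 result as printed by var_dump() ("?" marks a broken character).
REPORTED = (
    r'["1. \u011B 2. \u0161 3. \u010D 4. \u0159 5. \u017E '
    r'6. ? 7. ? 8. ? 9. ? 10. \u016F 11. ?"]'
)


def split_by_range(text):
    """Return (characters in U+0080..U+00FF, characters above U+00FF)."""
    low = [c for c in text if 0x80 <= ord(c) <= 0xFF]
    high = [c for c in text if ord(c) > 0xFF]
    return low, high


low, high = split_by_range(TEXT)
print("U+0080..U+00FF:", " ".join(f"{c}=U+{ord(c):04X}" for c in low))
print("above U+00FF:  ", " ".join(f"{c}=U+{ord(c):04X}" for c in high))

# Fully escaped array; json.dumps escapes non-ASCII by default (ensure_ascii=True).
expected = json.dumps([TEXT])
print(expected)

# The length of REPORTED matches the string(90) in the dump, so each broken
# character occupies exactly one byte in the returned value.
print(len(REPORTED))
```

The first list contains exactly the five characters shown as "?" in the MariaDB result, and the escaped reference string matches the PHP result wrapped in an array.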



 Comments   
Comment by Elena Stepanova [ 2017-08-24 ]

Thanks for the bug report.

MariaDB 10.2

MariaDB [test]> SELECT JSON_ARRAY('1. ě 2. š 3. č 4. ř 5. ž 6. ý 7. á 8. í 9. é 10. ů 11. ú') AS json_data;
+--------------------------------------------------------------------------------------------+
| json_data                                                                                  |
+--------------------------------------------------------------------------------------------+
| ["1. \u011B 2. \u0161 3. \u010D 4. \u0159 5. \u017E 6. � 7. � 8. � 9. � 10. \u016F 11. �"]      |
+--------------------------------------------------------------------------------------------+
 
MariaDB [test]> show variables like '%char%';
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | utf8                           |
| character_set_connection | utf8                           |
| character_set_database   | latin1                         |
| character_set_filesystem | binary                         |
| character_set_results    | utf8                           |
| character_set_server     | latin1                         |
| character_set_system     | utf8                           |
| character_sets_dir       | /data/bld/10.2/share/charsets/ |
+--------------------------+--------------------------------+
8 rows in set (0.01 sec)
 
MariaDB [test]> select @@version, @@version_comment;
+----------------------+---------------------+
| @@version            | @@version_comment   |
+----------------------+---------------------+
| 10.2.9-MariaDB-debug | Source distribution |
+----------------------+---------------------+
1 row in set (0.00 sec)

MySQL 5.7

MySQL [test]> SELECT JSON_ARRAY('1. ě 2. š 3. č 4. ř 5. ž 6. ý 7. á 8. í 9. é 10. ů 11. ú') AS json_data;
+-------------------------------------------------------------------------+
| json_data                                                               |
+-------------------------------------------------------------------------+
| ["1. ě 2. š 3. č 4. ř 5. ž 6. ý 7. á 8. í 9. é 10. ů 11. ú"]            |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)
 
MySQL [test]> show variables like '%char%';
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | utf8                                |
| character_set_connection | utf8                                |
| character_set_database   | latin1                              |
| character_set_filesystem | binary                              |
| character_set_results    | utf8                                |
| character_set_server     | latin1                              |
| character_set_system     | utf8                                |
| character_sets_dir       | /data/bld/mysql-5.7/share/charsets/ |
+--------------------------+-------------------------------------+
8 rows in set (0.03 sec)
 
MySQL [test]> select @@version, @@version_comment;
+--------------+---------------------+
| @@version    | @@version_comment   |
+--------------+---------------------+
| 5.7.18-debug | Source distribution |
+--------------+---------------------+
1 row in set (0.00 sec)
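
Note that the difference between escaped and unescaped output is not itself the bug: JSON allows non-ASCII characters either as \uXXXX escapes or as raw UTF-8, and both decode to the same string. A minimal sketch of that equivalence, using only the five items that MariaDB escaped correctly (the variable names are illustrative, and Python's json module stands in for any conforming JSON parser):

```python
import json
import unicodedata

# Prefix of the MariaDB result that was escaped correctly (items 1 to 5).
escaped = r'["1. \u011B 2. \u0161 3. \u010D 4. \u0159 5. \u017E"]'

# The same items in the style MySQL 5.7 returned them: raw UTF-8, no escaping.
raw = unicodedata.normalize("NFC", '["1. ě 2. š 3. č 4. ř 5. ž"]')

decoded_escaped = json.loads(escaped)
decoded_raw = json.loads(raw)

# Both documents decode to the same one-element list.
print(decoded_escaped == decoded_raw)
```

The actual defect is limited to the characters that are emitted neither as an escape nor as valid UTF-8.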

Comment by Elena Stepanova [ 2017-08-29 ]

See also the complaint on Stack Overflow about JSON_OBJECT:
https://stackoverflow.com/questions/45932774/mariadb-json-object-bad-encoding

Comment by Rick James (Inactive) [ 2017-08-29 ]

You mentioned PHP's json_encode. Is it being used by the client code or by MariaDB? In any case, pass JSON_UNESCAPED_UNICODE as the second argument whenever you call json_encode. (This flag requires PHP 5.4.0 or later.)

Comment by Alexey Botchkov [ 2017-09-13 ]

I think it's fixed, but it would be good if Elena double-checked it.

Generated at Thu Feb 08 08:07:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.