[MDEV-14301] JSON_UNQUOTE returns incorrect results depending on the collation Created: 2017-11-07  Updated: 2023-12-05

Status: Stalled
Project: MariaDB Server
Component/s: JSON
Affects Version/s: 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: William Chiquito Assignee: Rucha Deodhar
Resolution: Unresolved Votes: 2
Labels: None

Issue Links:
Relates
relates to MDEV-21124 JSON_UNQUOTE fails to process surroga... Open
relates to MDEV-25131 using COLLATE utf8mb4_unicode_ci with... Open
Sprint: 10.2.12

 Description   

MariaDB

MariaDB [(none)]> SELECT VERSION();
+-----------------+
| VERSION()       |
+-----------------+
| 10.2.10-MariaDB |
+-----------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> SET @`json` := '{"sv":"Börk"}';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [(none)]> SELECT
    ->   JSON_UNQUOTE(
    ->     JSON_EXTRACT(CAST(@`json` AS CHAR CHARACTER SET latin1), '$.sv')
    ->   ) `latin1`,
    ->   JSON_UNQUOTE(
    ->     JSON_EXTRACT(CAST(@`json` AS CHAR CHARACTER SET utf8mb4), '$.sv')
    -> ) `utf8mb4`;
+---------+---------+
| latin1  | utf8mb4 |
+---------+---------+
| "Börk"  | Börk    |
+---------+---------+
1 row in set (0.00 sec)

MySQL

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.7.20    |
+-----------+
1 row in set (0.00 sec)
 
mysql> SET @`json` := '{"sv":"Börk"}';
Query OK, 0 rows affected (0.00 sec)
 
mysql> SELECT
    ->   JSON_UNQUOTE(
    ->     JSON_EXTRACT(CAST(@`json` AS CHAR CHARACTER SET latin1), '$.sv')
    ->   ) `latin1`,
    ->   JSON_UNQUOTE(
    ->     JSON_EXTRACT(CAST(@`json` AS CHAR CHARACTER SET utf8mb4), '$.sv')
    ->   ) `utf8mb4`;
+--------+---------+
| latin1 | utf8mb4 |
+--------+---------+
| Börk   | Börk    |
+--------+---------+
1 row in set (0.00 sec)



 Comments   
Comment by Jordan Lane [ 2020-07-06 ]

I think I found a bug related to this issue, let me know if I should create it separately. All of my tables and columns are set to utf8_unicode_ci.

This works:

set names 'utf8' collate 'utf8_unicode_ci';
select * from `tags` where JSON_UNQUOTE(
    JSON_EXTRACT('name', '$."en"')
) collate utf8_unicode_ci like '%fish%' ;

This doesn't:

set names 'utf8' collate 'utf8_unicode_ci';
select * from `tags` where JSON_UNQUOTE(
    JSON_EXTRACT('name', '$."en"')
) like '%fish%' ;
#1267 - Illegal mix of collations (utf8_general_ci,COERCIBLE) and (utf8_unicode_ci,COERCIBLE) for operation 'like'

Comment by Roman Stingler (Inactive) [ 2021-02-04 ]

@Jordan thanks for the examples

Generated at Thu Feb 08 08:12:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.