[MDEV-21124] JSON_UNQUOTE fails to process surrogate pairs Created: 2019-11-22  Updated: 2022-06-01

Status: Open
Project: MariaDB Server
Component/s: JSON
Affects Version/s: 10.4.10, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: Dongsung Kim Assignee: Rucha Deodhar
Resolution: Unresolved Votes: 0
Labels: None

Attachments: PNG File 스크린샷 2019-11-22 오후 2.27.22.png    
Issue Links:
Relates
relates to MDEV-14301 JSON_UNQUOTE returns incorrect result... Stalled
relates to MDEV-19760 json escaped unicode parse error Closed

 Description   

While it is common to use MySQL's `JSON_UNQUOTE(JSON_EXTRACT())` combination, MariaDB additionally offers `JSON_VALUE` function to basically perform the same task. The problem is the results from the two methods differ when it comes to surrogate pairs, which are coincidentally what JSON uses to encode extended characters, e.g., emojis.

It seems the linked issue MDEV-19760 addressed a similar problem in `JSON_VALID` — The fix did resolve a different issue that `JSON`-type fields rejects any surrogate pair because of the default `CHECK(JSON_VALID())` constraint.

Here are simple test cases:

SET @test1 = '"\\u00e4\\u00f6\"';
SET @test2 = '"\\ud83d\\udc6c\\ud83c\\udfff"';
SET @test3 = '"\\ud83d\\ude0a\"';
SELECT JSON_VALUE(@test1, '$') AS VALUE, JSON_UNQUOTE(@test1) as UNQUOTE
UNION
SELECT JSON_VALUE(@test2, '$') AS VALUE, JSON_UNQUOTE(@test2) as UNQUOTE
UNION
SELECT JSON_VALUE(@test3, '$') AS VALUE, JSON_UNQUOTE(@test3) as UNQUOTE;

And the result is attached. (Apparently JIRA does not support emojis.)


Generated at Thu Feb 08 09:04:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.