Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
11.8
Description
Let's try to use MDEV-6017 (actually MDEV-35616) to do indexed searches in string field in JSON data.
A naive user would use JSON "datatype" and VARCHAR without COLLATION clause for string attribute:
create table t1 ( |
doc json
|
);
|
alter table t1 |
add name varchar(30) as (json_value(doc, '$.name')), |
add index(name); |
insert into t1 (doc) values |
('{"name":"row1"}'), |
('{"name":"row2"}'); |
Alas, this won't work:
explain select * from t1 where json_value(doc, '$.name')='row1'; |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+ |
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+ |
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 2 | Using where | |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+ |
1 row in set, 1 warning (0.000 sec) |
show warnings;
|
+-------+------+---------------------------------------------------------------------------------------------------------------+
|
| Level | Code | Message |
|
+-------+------+---------------------------------------------------------------------------------------------------------------+
|
| Note | 1105 | Cannot substitute virtual column expression json_value(`t1`.`doc`,'$.name') -> name due to collation mismatch |
|
+-------+------+---------------------------------------------------------------------------------------------------------------+
|
The collation mismatch is this:
select collation(doc), collation(json_value(doc, '$.name')), collation(name) from t1 limit 1;
|
+----------------+--------------------------------------+-----------------------+
|
| collation(doc) | collation(json_value(doc, '$.name')) | collation(name) |
|
+----------------+--------------------------------------+-----------------------+
|
| utf8mb4_bin | utf8mb4_bin | utf8mb4_uca1400_ai_ci |
|
+----------------+--------------------------------------+-----------------------+
|
Workaround 1: declare columns as utf8mb4_bin:
create table t1 ( |
doc json
|
);
|
alter table t1 |
add name varchar(30) collate utf8mb4_bin as (json_value(doc, '$.name')), |
add index(name); |
insert into t1 (doc) values |
('{"name":"row1"}'), |
('{"name":"row2"}'); |
|
-- No warning:
|
explain select * from t1 where json_value(doc, '$.name')='row1'; |
However, the search will compare the values in utf8mb4_bin. That is, it will be case-sensitive, ignore any collation's rules about accented characters, etc. It is unlikely that the user wants this?
Workaround 2: Don't use JSON "data-type", use LONGTEXT:
create table t1 ( |
doc longtext CHECK (json_valid(doc)) |
);
|
alter table t1 |
add name varchar(30) as (json_value(doc, '$.name')), |
add index(name); |
Here, the doc, JSON_VALUE(...) and name all use utf8mb4_uca1400_ai_ci, so there's no mismatch.
insert into t1 (doc) values |
('{"name":"row1"}'), |
('{"name":"row2"}'); |
-- No warning:
|
explain select * from t1 where json_value(doc, '$.name')='row1'; |
Solutions
Changing JSON datatype to use some other collation is hard: compatibility concerns, which collation to use?
What if we relaxed the rewrite criteria in MDEV-35616 ?
Attachments
Issue Links
- relates to
-
MDEV-35616 Add basic optimizer support for virtual columns
-
- Closed
-
-
MDEV-6017 Add support for Indexes on Expressions
-
- Stalled
-