Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36299

Issues with collations when using indexed JSON attribute search

Details

    Description

      Let's try to use MDEV-6017 (actually MDEV-35616) to do indexed searches in string field in JSON data.
      A naive user would use JSON "datatype" and VARCHAR without COLLATION clause for string attribute:

      create table t1 (
        doc json
      );
      alter table t1 
        add name varchar(30) as (json_value(doc, '$.name')),
        add index(name);
      

      insert into t1 (doc) values
       ('{"name":"row1"}'),
       ('{"name":"row2"}');
      

      Alas, this won't work:

      explain select * from t1 where json_value(doc, '$.name')='row1';
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 2    | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      1 row in set, 1 warning (0.000 sec)
      

      show warnings;
      +-------+------+---------------------------------------------------------------------------------------------------------------+
      | Level | Code | Message                                                                                                       |
      +-------+------+---------------------------------------------------------------------------------------------------------------+
      | Note  | 1105 | Cannot substitute virtual column expression json_value(`t1`.`doc`,'$.name') -> name due to collation mismatch |
      +-------+------+---------------------------------------------------------------------------------------------------------------+
      

      The collation mismatch is this:

      select collation(doc), collation(json_value(doc, '$.name')), collation(name) from t1 limit 1;
      +----------------+--------------------------------------+-----------------------+
      | collation(doc) | collation(json_value(doc, '$.name')) | collation(name)       |
      +----------------+--------------------------------------+-----------------------+
      | utf8mb4_bin    | utf8mb4_bin                          | utf8mb4_uca1400_ai_ci |
      +----------------+--------------------------------------+-----------------------+
      

      Workaround 1: declare columns as utf8mb4_bin:

      create table t1 (
        doc json
      );
      alter table t1 
        add name varchar(30) collate utf8mb4_bin as (json_value(doc, '$.name')),
        add index(name);
      insert into t1 (doc) values
       ('{"name":"row1"}'),
       ('{"name":"row2"}');
       
      -- No warning:
      explain select * from t1 where json_value(doc, '$.name')='row1';
      

      However, the search will compare the values in utf8mb4_bin. That is, it will be case-sensitive, ignore any collation's rules about accented characters, etc. It is unlikely that the user wants this?

      Workaround 2: Don't use JSON "data-type", use LONGTEXT:

      create table t1 (
        doc longtext CHECK (json_valid(doc))
      );
      alter table t1
        add name varchar(30) as (json_value(doc, '$.name')),
        add index(name);
      

      Here, the doc, JSON_VALUE(...) and name all use utf8mb4_uca1400_ai_ci, so there's no mismatch.

      insert into t1 (doc) values
       ('{"name":"row1"}'),
       ('{"name":"row2"}');
      

      -- No warning:
      explain select * from t1 where json_value(doc, '$.name')='row1';
      

      Solutions

      Changing JSON datatype to use some other collation is hard: compatibility concerns, which collation to use?

      What if we relaxed the rewrite criteria in MDEV-35616 ?

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.