Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8433

Make field<'broken-string' use indexes

Details

    • 10.1.7-2, 10.1.8-1

    Description

      Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

      SET NAMES 'utf8';
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
        KEY(a)
      ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
      INSERT INTO t1 (a) VALUES ('admin');
      INSERT INTO t1 (a) VALUES ('admin;');
      INSERT INTO t1 (a) VALUES ('adminx');
      EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';

      returns

      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | a             | NULL | NULL    | NULL |    3 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+

      Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

      Using indexes was disabled to have consistent result set with and without index.

      Now with MDEV-8036 done we can enable using indexes again.

      get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
      This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

      Possible solutions:

      • Add a new parameter to Field::store()
      • Add a new method, say Field::store_for_search()
      • Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to handle bad bytes differently (on search).
      • Add new classes Field_string_key, Field_varstring_key, Field_blob_key and overwrite their store() methods

      As soon as we're able to store bad values into the Field buffer in a way suitable for search (instead of just replacing to '?'), the collation should be able to do index scan in exactly the same way with a non-indexed search.

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov created issue -
            bar Alexander Barkov made changes -
            Field Original Value New Value
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            Rank Ranked higher
            bar Alexander Barkov made changes -
            Rank Ranked higher
            bar Alexander Barkov made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            Sprint 10.1.7-2 [ 12 ]
            bar Alexander Barkov made changes -
            Rank Ranked higher
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            bar Alexander Barkov made changes -
            Description Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 (almost) done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to preserve bad bytes as is (on search).

            As soon as we're able to store bad values into the Field buffer, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to preserve bad bytes as is (on search).

            As soon as we're able to store bad values into the Field buffer, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            bar Alexander Barkov made changes -
            Description Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to preserve bad bytes as is (on search).

            As soon as we're able to store bad values into the Field buffer, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to preserve bad bytes as is (on search).

            As soon as we're able to store bad values into the Field buffer, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            bar Alexander Barkov made changes -
            Description Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to preserve bad bytes as is (on search).

            As soon as we're able to store bad values into the Field buffer, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to preserve bad bytes as is (on search).
            - Add new classes Field_string_key, Field_varstring_key, Field_blob_key and overwrite their store() methods

            As soon as we're able to store bad values into the Field buffer, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            bar Alexander Barkov made changes -
            Description Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to preserve bad bytes as is (on search).
            - Add new classes Field_string_key, Field_varstring_key, Field_blob_key and overwrite their store() methods

            As soon as we're able to store bad values into the Field buffer, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to handle bad bytes differently (on search).
            - Add new classes Field_string_key, Field_varstring_key, Field_blob_key and overwrite their store() methods

            As soon as we're able to store bad values into the Field buffer, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            bar Alexander Barkov made changes -
            Description Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to handle bad bytes differently (on search).
            - Add new classes Field_string_key, Field_varstring_key, Field_blob_key and overwrite their store() methods

            As soon as we're able to store bad values into the Field buffer, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to handle bad bytes differently (on search).
            - Add new classes Field_string_key, Field_varstring_key, Field_blob_key and overwrite their store() methods

            As soon as we're able to store bad values into the Field buffer in a way suitable for search, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            bar Alexander Barkov made changes -
            Description Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to handle bad bytes differently (on search).
            - Add new classes Field_string_key, Field_varstring_key, Field_blob_key and overwrite their store() methods

            As soon as we're able to store bad values into the Field buffer in a way suitable for search, the collation should be able to do index scan in exactly the same way with a non-indexed search.
            Under terms of MDEV-7649 we disabled using indexes when comparing a field to a broken character string when processing non-equality operations <, >, <=, =>, <> , for example:

            {code}
            SET NAMES 'utf8';
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              a varchar(60) COLLATE utf8_general_ci NOT NULL DEFAULT '',
              KEY(a)
            ) ENGINE=MyISAM CHARSET=utf8 COLLATE=utf8_general_ci;
            INSERT INTO t1 (a) VALUES ('admin');
            INSERT INTO t1 (a) VALUES ('admin;');
            INSERT INTO t1 (a) VALUES ('adminx');
            EXPLAIN SELECT * FROM t1 WHERE a < 'admin��';
            {code}
            returns
            {noformat}
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 3 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}

            Notice, '��' is a broken utf8 character (it's a valid utf8mb4 character).

            Using indexes was disabled to have consistent result set with and without index.

            Now with MDEV-8036 done we can enable using indexes again.

            get_mm_leaf() calls Field::store() to put the value to search into the Field buffer.
            This task will need changes in Fields, which should be able to store broken byte sequences somehow for index search purposes, as the normal behavior of Field::store() to replace bad bytes to question marks won't work for search optimization purposes.

            Possible solutions:
            - Add a new parameter to Field::store()
            - Add a new method, say Field::store_for_search()
            - Add a new flag into Field::flags and make Field::store() check this flag to decide whether to replace bad bytes to question marks (on INSERT/UPDATE), or to handle bad bytes differently (on search).
            - Add new classes Field_string_key, Field_varstring_key, Field_blob_key and overwrite their store() methods

            As soon as we're able to store bad values into the Field buffer in a way suitable for search (instead of just replacing to '?'), the collation should be able to do index scan in exactly the same way with a non-indexed search.
            serg Sergei Golubchik made changes -
            Rank Ranked higher
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.1.7-2 [ 12 ] 10.1.7-2, 10.1.8-1 [ 12, 13 ]
            serg Sergei Golubchik made changes -
            Sprint 10.1.7-2, 10.1.8-1 [ 12, 13 ] 10.1.7-2, 10.1.8-1, 10.1.8-2 [ 12, 13, 14 ]
            serg Sergei Golubchik made changes -
            Rank Ranked higher
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.1.7-2, 10.1.8-1, 10.1.8-2 [ 12, 13, 14 ] 10.1.7-2, 10.1.8-1, 10.1.8-3 [ 12, 13, 15 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked lower
            serg Sergei Golubchik made changes -
            Sprint 10.1.7-2, 10.1.8-1, 10.1.8-3 [ 12, 13, 15 ] 10.1.7-2, 10.1.8-1 [ 12, 13 ]
            serg Sergei Golubchik made changes -
            Sprint 10.1.7-2, 10.1.8-1 [ 12, 13 ] 10.1.7-2, 10.1.8-1, 10.1.10 [ 12, 13, 24 ]
            serg Sergei Golubchik made changes -
            Sprint 10.1.7-2, 10.1.8-1, 10.1.10 [ 12, 13, 24 ] 10.1.7-2, 10.1.8-1 [ 12, 13 ]
            bar Alexander Barkov made changes -
            Labels performance
            julien.fritsch Julien Fritsch made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            julien.fritsch Julien Fritsch made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 70510 ] MariaDB v4 [ 131762 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 10.1 [ 16100 ]
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.5 [ 29506 ]
            Fix Version/s 11.4 [ 29301 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.6 [ 29515 ]
            Fix Version/s 11.5 [ 29506 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.7 [ 29815 ]
            Fix Version/s 11.6 [ 29515 ]
            serg Sergei Golubchik made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.8 [ 29921 ]
            Fix Version/s 11.7 [ 29815 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.8 [ 29921 ]

            People

              serg Sergei Golubchik
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.