Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27172

Prefix indices on Spider tables may lead to wrong query results

    XMLWordPrintable

    Details

      Description

      How to reproduce

      Create a table with a prefix index on a TEXT field, given a small prefix length (in my example, 5). And then, insert a record which contain characters more than the given prefix length.

      MariaDB [mytest]> show create table test_table\G
      *************************** 1. row ***************************
             Table: test_table
      Create Table: CREATE TABLE `test_table` (
        `id` int(11) NOT NULL,
        `t` text DEFAULT NULL,
        PRIMARY KEY (`id`),
        KEY `k_t` (`t`(5))
      ) ENGINE=SPIDER DEFAULT CHARSET=latin1
       PARTITION BY LIST (`id` MOD 2)
      (PARTITION `pt0` VALUES IN (0) COMMENT = 'database "mytest_2", table "test_table", server "SPT2"' ENGINE = SPIDER,
       PARTITION `pt1` VALUES IN (1) COMMENT = 'database "mytest_3", table "test_table", server "SPT3"' ENGINE = SPIDER)
      1 row in set (0.00 sec)
       
      MariaDB [mytest]> insert into test_table values(1, "aloha"), (2, "hello world");
      Query OK, 2 rows affected (0.01 sec)
      Records: 2  Duplicates: 0  Warnings: 0
       
      MariaDB [mytest]> select * from test_table;
      +----+-------------+
      | id | t           |
      +----+-------------+
      |  2 | hello world |
      |  1 | aloha       |
      +----+-------------+
      2 rows in set (0.00 sec)
      

      Then we get this:

      MariaDB [mytest]> select * from test_table where t="aloha";
      +----+-------+
      | id | t     |
      +----+-------+
      |  1 | aloha |
      +----+-------+
      1 row in set (0.01 sec)
       
      MariaDB [mytest]> select * from test_table where t="hello world";
      Empty set (0.00 sec)
      

      The general log output can give us some clues, indicating that the string value is truncated to the length of 5 when performing an index read:

      select `id`,`t` from `mytest_2`.`test_table` where `t` = 'hello' and (`t` = 'hello world')
      select `id`,`t` from `mytest_3`.`test_table` where `t` = 'hello' and (`t` = 'hello world')
      

      It also affects JOIN operations

      Create a similar table, having the TEXT field values equal to the one above.

      MariaDB [mytest]> show create table another_test_table\G
      *************************** 1. row ***************************
             Table: another_test_table
      Create Table: CREATE TABLE `another_test_table` (
        `id` int(11) NOT NULL,
        `t` text DEFAULT NULL,
        PRIMARY KEY (`id`),
        KEY `k_t` (`t`(5))
      ) ENGINE=SPIDER DEFAULT CHARSET=latin1
       PARTITION BY LIST (`id` MOD 2)
      (PARTITION `pt0` VALUES IN (0) COMMENT = 'database "mytest_2", table "another_test_table", server "SPT2"' ENGINE = SPIDER,
       PARTITION `pt1` VALUES IN (1) COMMENT = 'database "mytest_3", table "another_test_table", server "SPT3"' ENGINE = SPIDER)
      1 row in set (0.00 sec)
       
      MariaDB [mytest]> select * from another_test_table;
      +----+-------------+
      | id | t           |
      +----+-------------+
      | 44 | hello world |
      | 33 | aloha       |
      +----+-------------+
      2 rows in set (0.01 sec)
      

      Perform a JOIN between these two tables, on the TEXT field. The result set has only 1 row, instead of 2 as expected, obviously for the same reason.

      MariaDB [mytest]> select * from test_table a straight_join another_test_table b on a.t=b.t;
      +----+-------+----+-------+
      | id | t     | id | t     |
      +----+-------+----+-------+
      |  1 | aloha | 33 | aloha |
      +----+-------+----+-------+
      1 row in set (0.00 sec)
      

      Suggested fix

      If we consider this a bug, instead of an "expected behavior", I have some humble opinion on the possible fix.

      After looking into the code, I found the code where Spider prints the truncated value into the queries resides in spider_db_mysql_util::append_column_value, where the value passed by the server layer is already truncated to the length of the prefix key. Seems like there is not a lot we can do to fix this without changing the server's behavior.

      So, I suggest we can treat prefix indices differently in the "append key where" stage, when an EQUAL matching is performed on a prefix index. That is, in the queries sent to backends, replace "<field_name>='<truncated_str_val>'" with "<field_name> LIKE '<truncated_str_val>%'". In my example above, "t='hello'" should be replaced with "t LIKE 'hello%'". This way, the desired records could be matched.

      There are also some cons I can think of for this hack. First is the performance issue, since an EQUAL reference is turned into a RANGE operation. Second is the backends possibly returning more results than desired. For this matter, we do not need to worry about the correctness of results returned to the client, since the server does the evaluation of records itself. However, the issue of more records needing to process is also a performance influencer though.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              nayuta-yanagisawa Nayuta Yanagisawa
              Reporter:
              DanielYe133 Daniel YE
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.