Details
Description
How to reproduce
Create a table with a prefix index on a TEXT field, given a small prefix length (in my example, 5). And then, insert a record which contain characters more than the given prefix length.
MariaDB [mytest]> show create table test_table\G
|
*************************** 1. row ***************************
|
Table: test_table
|
Create Table: CREATE TABLE `test_table` (
|
`id` int(11) NOT NULL,
|
`t` text DEFAULT NULL,
|
PRIMARY KEY (`id`),
|
KEY `k_t` (`t`(5))
|
) ENGINE=SPIDER DEFAULT CHARSET=latin1
|
PARTITION BY LIST (`id` MOD 2)
|
(PARTITION `pt0` VALUES IN (0) COMMENT = 'database "mytest_2", table "test_table", server "SPT2"' ENGINE = SPIDER,
|
PARTITION `pt1` VALUES IN (1) COMMENT = 'database "mytest_3", table "test_table", server "SPT3"' ENGINE = SPIDER)
|
1 row in set (0.00 sec)
|
|
MariaDB [mytest]> insert into test_table values(1, "aloha"), (2, "hello world");
|
Query OK, 2 rows affected (0.01 sec)
|
Records: 2 Duplicates: 0 Warnings: 0
|
|
MariaDB [mytest]> select * from test_table;
|
+----+-------------+
|
| id | t |
|
+----+-------------+
|
| 2 | hello world |
|
| 1 | aloha |
|
+----+-------------+
|
2 rows in set (0.00 sec)
|
Then we get this:
MariaDB [mytest]> select * from test_table where t="aloha";
|
+----+-------+
|
| id | t |
|
+----+-------+
|
| 1 | aloha |
|
+----+-------+
|
1 row in set (0.01 sec)
|
|
MariaDB [mytest]> select * from test_table where t="hello world";
|
Empty set (0.00 sec)
|
The general log output can give us some clues, indicating that the string value is truncated to the length of 5 when performing an index read:
select `id`,`t` from `mytest_2`.`test_table` where `t` = 'hello' and (`t` = 'hello world')
|
select `id`,`t` from `mytest_3`.`test_table` where `t` = 'hello' and (`t` = 'hello world')
|
It also affects JOIN operations
Create a similar table, having the TEXT field values equal to the one above.
MariaDB [mytest]> show create table another_test_table\G
|
*************************** 1. row ***************************
|
Table: another_test_table
|
Create Table: CREATE TABLE `another_test_table` (
|
`id` int(11) NOT NULL,
|
`t` text DEFAULT NULL,
|
PRIMARY KEY (`id`),
|
KEY `k_t` (`t`(5))
|
) ENGINE=SPIDER DEFAULT CHARSET=latin1
|
PARTITION BY LIST (`id` MOD 2)
|
(PARTITION `pt0` VALUES IN (0) COMMENT = 'database "mytest_2", table "another_test_table", server "SPT2"' ENGINE = SPIDER,
|
PARTITION `pt1` VALUES IN (1) COMMENT = 'database "mytest_3", table "another_test_table", server "SPT3"' ENGINE = SPIDER)
|
1 row in set (0.00 sec)
|
|
MariaDB [mytest]> select * from another_test_table;
|
+----+-------------+
|
| id | t |
|
+----+-------------+
|
| 44 | hello world |
|
| 33 | aloha |
|
+----+-------------+
|
2 rows in set (0.01 sec)
|
Perform a JOIN between these two tables, on the TEXT field. The result set has only 1 row, instead of 2 as expected, obviously for the same reason.
MariaDB [mytest]> select * from test_table a straight_join another_test_table b on a.t=b.t;
|
+----+-------+----+-------+
|
| id | t | id | t |
|
+----+-------+----+-------+
|
| 1 | aloha | 33 | aloha |
|
+----+-------+----+-------+
|
1 row in set (0.00 sec)
|
Suggested fix
If we consider this a bug, instead of an "expected behavior", I have some humble opinion on the possible fix.
After looking into the code, I found the code where Spider prints the truncated value into the queries resides in spider_db_mysql_util::append_column_value, where the value passed by the server layer is already truncated to the length of the prefix key. Seems like there is not a lot we can do to fix this without changing the server's behavior.
So, I suggest we can treat prefix indices differently in the "append key where" stage, when an EQUAL matching is performed on a prefix index. That is, in the queries sent to backends, replace "<field_name>='<truncated_str_val>'" with "<field_name> LIKE '<truncated_str_val>%'". In my example above, "t='hello'" should be replaced with "t LIKE 'hello%'". This way, the desired records could be matched.
There are also some cons I can think of for this hack. First is the performance issue, since an EQUAL reference is turned into a RANGE operation. Second is the backends possibly returning more results than desired. For this matter, we do not need to worry about the correctness of results returned to the client, since the server does the evaluation of records itself. However, the issue of more records needing to process is also a performance influencer though.
Attachments
Issue Links
- relates to
-
MDEV-26287 Refactor spider_db_append_key_where_internal()
- Stalled