[MDEV-29728] Wrong result for FULLTEXT InnoDB search Created: 2022-10-06  Updated: 2023-11-28

Status: Confirmed
Project: MariaDB Server
Component/s: Full-text Search, Storage Engine - InnoDB
Affects Version/s: 10.6.9, 10.9.3, 10.5, 10.6, 10.7, 10.8, 10.9
Fix Version/s: 10.5, 10.6

Type: Bug Priority: Major
Reporter: Stepan Assignee: Thirunarayanan Balathandayuthapani
Resolution: Unresolved Votes: 0
Labels: regression-10.5


 Description   

Here's a description of what I'm observing:

1. Set up tables, add a few rows into the database.
2. Execute a select query using fulltext search: it returns 0 rows (which is incorrect, it should return 1).
3. Execute another select query (which is a modified version of (2))
4. Execute the same select query from (2): now it returns 2 rows (which is also incorrect, it should return 1).

Here's SQL script to reproduce (queries in question are 3 SELECTs at the bottom):

SET autocommit = 1;
CREATE TABLE `users_docs_0` ( `internal_table_id` binary(16) NOT NULL, `record_id` binary(16) NOT NULL, `created_at` bigint NOT NULL, `updated_at` bigint NOT NULL, `doc` text NOT NULL, PRIMARY KEY (`internal_table_id`, `record_id`) ) DEFAULT CHARSET=utf8mb4 ENGINE=InnoDB;
CREATE INDEX `users_docs_0_on_created_at` ON `users_docs_0` (`internal_table_id`, `created_at`);
CREATE INDEX `users_docs_0_on_updated_at` ON `users_docs_0` (`internal_table_id`, `updated_at`);
CREATE TABLE `users_keys_0` ( `internal_table_id` binary(16) NOT NULL, `record_id` binary(16) NOT NULL, `field_id` binary(16) NOT NULL, `string` varchar(255), `int` bigint, `double` double, PRIMARY KEY (`internal_table_id`, `record_id`, `field_id`) ) DEFAULT CHARSET=UTF8 ENGINE=InnoDB;
CREATE INDEX `user_0_on_string` ON `users_keys_0` (`field_id`, `string`, `record_id`);
CREATE INDEX `user_0_on_integer` ON `users_keys_0` (`field_id`, `int`, `record_id`);
CREATE INDEX `user_0_on_double` ON `users_keys_0` (`field_id`, `double`, `record_id`);
CREATE FULLTEXT INDEX `user_0_fulltext` ON `users_keys_0` (`string`);
INSERT INTO
  `users_docs_0` (
    `doc`,
    `record_id`,
    `internal_table_id`,
    `created_at`,
    `updated_at`
  )
VALUES
  ('', UNHEX('1315f677da2343ebb668867397b07214'), UNHEX('fed68cf1227f4806b7a2f77f32243563'), 1664361776639022000, 1664361776639022000);
INSERT INTO
  `users_keys_0` (
    `internal_table_id`,
    `record_id`,
    `field_id`,
    `string`,
    `int`,
    `double`
  )
VALUES
  (UNHEX('fed68cf1227f4806b7a2f77f32243563'), UNHEX('1315f677da2343ebb668867397b07214'), UNHEX('89427ccd256843d0be8bd466c6853960'), 'Tolkien', NULL, NULL),
  (UNHEX('fed68cf1227f4806b7a2f77f32243563'), UNHEX('1315f677da2343ebb668867397b07214'), UNHEX('73c5867ffab34e12bb9040b443d7d9e2'), 'Previous Title', NULL, NULL);
INSERT INTO
  `users_docs_0` (
    `doc`,
    `record_id`,
    `internal_table_id`,
    `created_at`,
    `updated_at`
  )
VALUES
  ('', UNHEX('c25722a6bc754e2db5d64b0d7532aca0'), UNHEX('fed68cf1227f4806b7a2f77f32243563'), 1664361776690081000, 1664361776690081000);
INSERT INTO
  `users_keys_0` (
    `internal_table_id`,
    `record_id`,
    `field_id`,
    `string`,
    `int`,
    `double`
  )
VALUES
  (UNHEX('fed68cf1227f4806b7a2f77f32243563'), UNHEX('c25722a6bc754e2db5d64b0d7532aca0'), UNHEX('89427ccd256843d0be8bd466c6853960'), 'JK Rowling', NULL, NULL),
  (UNHEX('fed68cf1227f4806b7a2f77f32243563'), UNHEX('c25722a6bc754e2db5d64b0d7532aca0'), UNHEX('73c5867ffab34e12bb9040b443d7d9e2'), 'Harry Potter', NULL, NULL);
 
-- Run test query. It returns empty set (bug?).
\! echo "Running test query (expected 1 record with title='Harry Potter')";
SELECT
  `o`.`record_id`,
  `o`.`string` AS author,
  `c0`.`string` AS title
FROM
  `users_keys_0` AS `o` USE INDEX (user_0_on_string)
  INNER JOIN `users_keys_0` AS `c0` ON `c0`.`record_id` = `o`.`record_id`
  AND `c0`.`internal_table_id` = `o`.`internal_table_id`
  AND `c0`.`field_id` = UNHEX('73c5867ffab34e12bb9040b443d7d9e2')
  AND MATCH (c0.string) AGAINST ('potter')
WHERE
  `o`.`internal_table_id` = UNHEX('fed68cf1227f4806b7a2f77f32243563')
  AND `o`.`field_id` = UNHEX('89427ccd256843d0be8bd466c6853960')
ORDER BY
  `o`.`string` ASC,
  `o`.`record_id` ASC
LIMIT
  1000;
 
-- Running same query, with `field_id` join condition removed.
\! echo "Running modified query";
SELECT
  `o`.`record_id`,
  `o`.`string` AS author,
  `c0`.`string` AS title
FROM
  `users_keys_0` AS `o` USE INDEX (user_0_on_string)
  INNER JOIN `users_keys_0` AS `c0` ON `c0`.`record_id` = `o`.`record_id`
  AND `c0`.`internal_table_id` = `o`.`internal_table_id`
  -- AND `c0`.`field_id` = UNHEX('73c5867ffab34e12bb9040b443d7d9e2')
  AND MATCH (c0.string) AGAINST ('potter')
WHERE
  `o`.`internal_table_id` = UNHEX('fed68cf1227f4806b7a2f77f32243563')
  AND `o`.`field_id` = UNHEX('89427ccd256843d0be8bd466c6853960')
ORDER BY
  `o`.`string` ASC,
  `o`.`record_id` ASC
LIMIT
  1000;
 
-- Running test query again. This time it returns 2 records. Which is also incorrect.
\! echo "Running test query (expected 1 record with title='Harry Potter')";
SELECT
  `o`.`record_id`,
  `o`.`string` AS author,
  `c0`.`string` AS title
FROM
  `users_keys_0` AS `o` USE INDEX (user_0_on_string)
  INNER JOIN `users_keys_0` AS `c0` ON `c0`.`record_id` = `o`.`record_id`
  AND `c0`.`internal_table_id` = `o`.`internal_table_id`
  AND `c0`.`field_id` = UNHEX('73c5867ffab34e12bb9040b443d7d9e2')
  AND MATCH (c0.string) AGAINST ('potter')
WHERE
  `o`.`internal_table_id` = UNHEX('fed68cf1227f4806b7a2f77f32243563')
  AND `o`.`field_id` = UNHEX('89427ccd256843d0be8bd466c6853960')
ORDER BY
  `o`.`string` ASC,
  `o`.`record_id` ASC
LIMIT
  1000;

Locally I can reproduce this quickly with Docker. Save SQL above as "statements.sql" and then:

docker run -d -e MARIADB_ROOT_PASSWORD=test -e MARIADB_DATABASE=test -p 3306:3306 mariadb:latest
mysql -u root -ptest --protocol=tcp test
mysql> source statements.sql



 Comments   
Comment by Alice Sherepa [ 2022-10-10 ]

Thank you for the report and the test case!
I repeated on versions 10.5-10.10, 10.3 and 10.4 returned correct results.

--source include/have_innodb.inc 
 
CREATE TABLE t1 ( id int, a int, b int, s1 varchar(255),
  PRIMARY KEY (id,a,b),
  KEY ind1 (b,s1),
  FULLTEXT KEY (s1)
) ENGINE=InnoDB ;
 
 
INSERT INTO t1 VALUES (1, 1, 1, 'Previous Title'), (1, 2, 1, 'Harry Potter');
analyze table t1;
 
SELECT c0.s1
FROM t1 
JOIN t1 AS c0 ON c0.id = t1.id
AND c0.b = 1
AND MATCH (c0.s1) against ('potter') ;
 
SELECT c0.s1
FROM t1 
JOIN t1 AS c0 ON c0.id = t1.id
#AND c0.b = 1
AND MATCH (c0.s1) against ('potter') ;
 
SELECT c0.s1
FROM t1 
JOIN t1 AS c0 ON c0.id = t1.id
AND c0.b = 1
AND MATCH (c0.s1) against ('potter') ;
 
drop table t1;

MariaDB [f]> SELECT c0.s1
    -> FROM t1 
    -> JOIN t1 AS c0 ON c0.id = t1.id
    -> AND c0.b = 1
    -> AND MATCH (c0.s1) against ('potter') ;
Empty set (0,002 sec)
 
MariaDB [f]> analyze
    -> SELECT c0.s1
    -> FROM t1 
    -> JOIN t1 AS c0 ON c0.id = t1.id
    -> AND c0.b = 1
    -> AND MATCH (c0.s1) against ('potter') ;
+------+-------------+-------+-------+-----------------+---------+---------+-------+------+--------+----------+------------+--------------------------+
| id   | select_type | table | type  | possible_keys   | key     | key_len | ref   | rows | r_rows | filtered | r_filtered | Extra                    |
+------+-------------+-------+-------+-----------------+---------+---------+-------+------+--------+----------+------------+--------------------------+
|    1 | SIMPLE      | t1    | index | PRIMARY         | PRIMARY | 12      | NULL  | 2    | 2.00   |   100.00 |     100.00 | Using index              |
|    1 | SIMPLE      | c0    | ref   | PRIMARY,ind1,s1 | ind1    | 4       | const | 2    | 2.00   |   100.00 |       0.00 | Using where; Using index |
+------+-------------+-------+-------+-----------------+---------+---------+-------+------+--------+----------+------------+--------------------------+
2 rows in set (0,002 sec)
 
MariaDB [f]> SELECT c0.s1
    -> FROM t1 
    -> JOIN t1 AS c0 ON c0.id = t1.id
    -> #AND c0.b = 1
    -> AND MATCH (c0.s1) against ('potter') ;
+--------------+
| s1           |
+--------------+
| Harry Potter |
| Harry Potter |
+--------------+
2 rows in set (0,001 sec)
 
MariaDB [f]> SELECT c0.s1
    -> FROM t1 
    -> JOIN t1 AS c0 ON c0.id = t1.id
    -> AND c0.b = 1
    -> AND MATCH (c0.s1) against ('potter') ;
+----------------+
| s1             |
+----------------+
| Harry Potter   |
| Previous Title |
| Harry Potter   |
| Previous Title |
+----------------+
4 rows in set (0,001 sec)
 
MariaDB [f]> analyze
    -> SELECT c0.s1
    -> FROM t1 
    -> JOIN t1 AS c0 ON c0.id = t1.id
    -> AND c0.b = 1
    -> AND MATCH (c0.s1) against ('potter') ;
+------+-------------+-------+-------+-----------------+---------+---------+-------+------+--------+----------+------------+--------------------------+
| id   | select_type | table | type  | possible_keys   | key     | key_len | ref   | rows | r_rows | filtered | r_filtered | Extra                    |
+------+-------------+-------+-------+-----------------+---------+---------+-------+------+--------+----------+------------+--------------------------+
|    1 | SIMPLE      | t1    | index | PRIMARY         | PRIMARY | 12      | NULL  | 2    | 2.00   |   100.00 |     100.00 | Using index              |
|    1 | SIMPLE      | c0    | ref   | PRIMARY,ind1,s1 | ind1    | 4       | const | 2    | 2.00   |   100.00 |     100.00 | Using where; Using index |
+------+-------------+-------+-------+-----------------+---------+---------+-------+------+--------+----------+------------+--------------------------+
2 rows in set (0,001 sec)

Comment by Andrey Razumovsky [ 2022-10-20 ]

Hi Team, is there any update/plans on this? Seems like this is quite a major issue, and a regression as well

Generated at Thu Feb 08 10:10:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.