Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35172

FULLTEXT INDEX works incorrectly with words that are equal in the collation

Details

    Description

      MySQL 8.0.40 includes the following change, which seems to be applicable to MariaDB Server:
      Bug#36652127 InnoDB: FTS table optimization not working correctly with collation
      No test case is included, but source code comments mention Chinese, Japanese, Korean (CJK) collations.

      Attachments

        Issue Links

          Activity

            This patch just fixes the optimization during fulltext optimization. Basically, it avoid reading the words that need optimization multiple times.
            This way it can avoid calling fts_optimize_write() multiple times.

            Wrote a test case to check how collation behaves in fulltext index:

            SET NAMES utf8;
            CREATE TABLE t1 (
            f1 INT NOT NULL PRIMARY KEY,
            f2 VARCHAR(64) COLLATE utf8mb4_general_ci,
            FULLTEXT ft(f2)) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (1,'antenna');
            ALTER TABLE t1 FORCE;
            INSERT INTO t1 VALUES (2, 'äntenna'), (3, 'äntenna');
            DELETE FROM t1 WHERE f1 > 2;
            SELECT * FROM t1;
            f1	f2
            1	antenna
            2	äntenna
            SET GLOBAL innodb_optimize_fulltext_only=ON;
            OPTIMIZE TABLE t1;
            Table	Op	Msg_type	Msg_text
            test.t1	optimize	status	OK
            SELECT * FROM t1 WHERE MATCH(f2) AGAINST("antenna" IN BOOLEAN MODE);
            f1	f2
            2	äntenna
            DROP TABLE t1;
            SET GLOBAL innodb_optimize_fulltext_only=OFF;
            

            This happens only in MariaDB. As discussed with marko, reduce the priority of the issue.

            thiru Thirunarayanan Balathandayuthapani added a comment - This patch just fixes the optimization during fulltext optimization. Basically, it avoid reading the words that need optimization multiple times. This way it can avoid calling fts_optimize_write() multiple times. Wrote a test case to check how collation behaves in fulltext index: SET NAMES utf8; CREATE TABLE t1 ( f1 INT NOT NULL PRIMARY KEY, f2 VARCHAR(64) COLLATE utf8mb4_general_ci, FULLTEXT ft(f2)) ENGINE=InnoDB; INSERT INTO t1 VALUES (1,'antenna'); ALTER TABLE t1 FORCE; INSERT INTO t1 VALUES (2, 'äntenna'), (3, 'äntenna'); DELETE FROM t1 WHERE f1 > 2; SELECT * FROM t1; f1 f2 1 antenna 2 äntenna SET GLOBAL innodb_optimize_fulltext_only=ON; OPTIMIZE TABLE t1; Table Op Msg_type Msg_text test.t1 optimize status OK SELECT * FROM t1 WHERE MATCH(f2) AGAINST("antenna" IN BOOLEAN MODE); f1 f2 2 äntenna DROP TABLE t1; SET GLOBAL innodb_optimize_fulltext_only=OFF; This happens only in MariaDB. As discussed with marko , reduce the priority of the issue.

            In fts_optimize_write_word(), InnoDB deletes the existing word in the auxiliary table and adds the optimized entry back into the table.
            Since it is collation equivalent, delete both the entries from the table and insert only one back into the table. This leads to missing
            row in optimize table of fulltext index.

            thiru Thirunarayanan Balathandayuthapani added a comment - In fts_optimize_write_word() , InnoDB deletes the existing word in the auxiliary table and adds the optimized entry back into the table. Since it is collation equivalent, delete both the entries from the table and insert only one back into the table. This leads to missing row in optimize table of fulltext index.

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.