Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27582

Fulltext search DDL decrements FTS_DOC_ID

Details

    Description

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (id INT, f VARCHAR(64), PRIMARY KEY (id DESC), FULLTEXT ft(f)) ENGINE=InnoDB;
      INSERT INTO t1 VALUES (1,'foo'),(2,'bar');
      DELETE FROM t1 WHERE id = 2;
      ALTER TABLE t1 DROP INDEX ft;
      ALTER TABLE t1 ADD FULLTEXT INDEX ft (f);
      INSERT INTO t1 values (3,'innodb ft search');
      ANALYZE TABLE t1;
      SELECT * FROM t1 WHERE MATCH(f) AGAINST("+innodb +search" IN BOOLEAN MODE);
       
      DROP TABLE t1;
      

      SELECT returns an empty result set

      preview-10.8-MDEV-13756-desc-indexes c10e10c6

      SELECT * FROM t1 WHERE MATCH(f) AGAINST("+innodb +search" IN BOOLEAN MODE);
      id	f
      

      Normally it should return a row:

      SELECT * FROM t1 WHERE MATCH(f) AGAINST("+innodb +search" IN BOOLEAN MODE);
      id	f
      3	innodb ft search
      

      Not reproducible with an ascending PK.
      Not reproducible with otherwise the same test case without re-creating the key.
      Not reproducible on MySQL 8.0.

      Attachments

        Issue Links

          Activity

            Our simplistic attempt of fixing this by making the secondary index records always use ascending order for the primary key columns likely broke index range scans and caused MDEV-27623. So, we have to revert that and reopen this bug.

            The immediate root cause of this wrong result is that the row_merge_read_clustered_index() that is executed during the ADD FULLTEXT INDEX is rewinding cache->next_doc_id from 3 to 2. The FTS_DOC_ID is a hidden column in the table, and in the test case we would have assigned the values 1 and 2 in the past. The value 2 has been delete-marked and possibly purged, but it should not be reassigned when creating a fulltext index, because there could be other fulltext indexes in the table. Thus, I believe that the last INSERT should be assigned FTS_DOC_ID=3 instead of the value 2 that is being assigned.

            As far as I understand, FTS_DOC_ID values must never be reused. The purpose of FTS_DOC_ID is to be a unique row identifier that will be changed whenever a fulltext indexed column is updated.

            In the hidden FTS_DOC_ID_INDEX we have the following (FTS_DOC_ID,id) records at the time we are executing fts_add_doc_by_id() for the INSERT: (1,1),(2,3),(2,2). The last record is delete-marked but had not been purged yet. That is the record that fts_add_doc_by_id() is finding when searching for the (wrongly reused) value FTS_DOC_ID=2.

            I suspect that somewhere in row_merge_read_clustered_index() or row0ftsort.cc we are wrongly assuming that the first PRIMARY KEY column is in ascending order. To fix this wrong result, we should probably fix that.

            marko Marko Mäkelä added a comment - Our simplistic attempt of fixing this by making the secondary index records always use ascending order for the primary key columns likely broke index range scans and caused MDEV-27623 . So, we have to revert that and reopen this bug. The immediate root cause of this wrong result is that the row_merge_read_clustered_index() that is executed during the ADD FULLTEXT INDEX is rewinding cache->next_doc_id from 3 to 2. The FTS_DOC_ID is a hidden column in the table, and in the test case we would have assigned the values 1 and 2 in the past. The value 2 has been delete-marked and possibly purged, but it should not be reassigned when creating a fulltext index, because there could be other fulltext indexes in the table. Thus, I believe that the last INSERT should be assigned FTS_DOC_ID=3 instead of the value 2 that is being assigned. As far as I understand, FTS_DOC_ID values must never be reused. The purpose of FTS_DOC_ID is to be a unique row identifier that will be changed whenever a fulltext indexed column is updated. In the hidden FTS_DOC_ID_INDEX we have the following (FTS_DOC_ID,id) records at the time we are executing fts_add_doc_by_id() for the INSERT : (1,1),(2,3),(2,2). The last record is delete-marked but had not been purged yet. That is the record that fts_add_doc_by_id() is finding when searching for the (wrongly reused) value FTS_DOC_ID=2 . I suspect that somewhere in row_merge_read_clustered_index() or row0ftsort.cc we are wrongly assuming that the first PRIMARY KEY column is in ascending order. To fix this wrong result, we should probably fix that.

            I think that the fix will have to be validated with some result validation testing with a descending PRIMARY KEY, so that we can be confident that the rest of the FULLTEXT search code is free from similar incorrect assumptions.

            marko Marko Mäkelä added a comment - I think that the fix will have to be validated with some result validation testing with a descending PRIMARY KEY , so that we can be confident that the rest of the FULLTEXT search code is free from similar incorrect assumptions.

            Patch is in bb-10.2-MDEV-27582

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in bb-10.2- MDEV-27582

            I like the simplicity of the fix. However, the innodb_safe_truncate=OFF variant of 10.2 TRUNCATE TABLE would seem to expect to be able to rewind the FTS_DOC_ID to 0.

            Either this should not be fixed in 10.2, or we need two variants of the function fts_update_next_doc_id().

            marko Marko Mäkelä added a comment - I like the simplicity of the fix. However, the innodb_safe_truncate=OFF variant of 10.2 TRUNCATE TABLE would seem to expect to be able to rewind the FTS_DOC_ID to 0. Either this should not be fixed in 10.2, or we need two variants of the function fts_update_next_doc_id() .

            Patch is in bb-10.3-MDEV-27582

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in bb-10.3- MDEV-27582

            Thank you. If it is possible, I would suggest to merge the function fts_update_next_doc_id() to its only caller, or declare the function as static in the same compilation unit with the caller.

            The call to fts_update_sync_doc_id() inside fts_update_next_doc_id() looks misplaced. It would seem to be better to update the CONFIG table using the internal transaction of ha_innobase::commit_inplace_alter_table(). This is not directly related to the original failure, but a separate implementation quality issue.

            marko Marko Mäkelä added a comment - Thank you. If it is possible, I would suggest to merge the function fts_update_next_doc_id() to its only caller, or declare the function as static in the same compilation unit with the caller. The call to fts_update_sync_doc_id() inside fts_update_next_doc_id() looks misplaced. It would seem to be better to update the CONFIG table using the internal transaction of ha_innobase::commit_inplace_alter_table() . This is not directly related to the original failure, but a separate implementation quality issue.

            People

              thiru Thirunarayanan Balathandayuthapani
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.