Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 10.2.14, 10.2.22, 10.5.8
    • Fix Version/s: N/A

    Description

      I'm having oodles of issues with MariaDB 10.2.14 and fulltext search (FTS). As you can see in the log lines below, an FTS index error is breaking replication. When I restart replication at the very point of breakage, things pick up without issue, as if the FTS problem had self-corrected.

      Searching through the MariaDB Jira and elsewhere, I'm not finding people with similar problems. Any ideas out there?

      Thanks,

      Mike

      2019-03-08  7:02:06 140095257917184 [ERROR] InnoDB: Duplicate FTS_DOC_ID value on table `ddx_practice_7884`.`patients`
      2019-03-08  7:02:06 140095257917184 [ERROR] Cannot find index FTS_DOC_ID_INDEX in InnoDB index translation table.
      2019-03-08  7:02:06 140095257917184 [Warning] Found index FTS_DOC_ID_INDEX in InnoDB index list but not its MariaDB index number. It could be an InnoDB internal index.
      2019-03-08  7:02:06 140095257917184 [ERROR] Cannot find index FTS_DOC_ID_INDEX in InnoDB index translation table.
      2019-03-08  7:02:06 140095257917184 [Warning] Found index FTS_DOC_ID_INDEX in InnoDB index list but not its MariaDB index number. It could be an InnoDB internal index.
      2019-03-08  7:02:06 140095257917184 [ERROR] Slave SQL: Error 'Can't write; duplicate key in table 'patients'' on query. Default database: 'ddx_identities'. Query: 'UPDATE `ddx_practice_7884`.`patients` SET `external_id` = '70231', `update_date` = '2019-03-08 13:02:06' WHERE (id = '962')', Gtid 0-3-486813306, Internal MariaDB error code: 1022
      2019-03-08  7:02:06 140095257917184 [Warning] Slave: Can't write; duplicate key in table 'patients' Error_code: 1022
      2019-03-08  7:02:06 140095257917184 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.005733' position 1060804351
      2019-03-08  7:02:06 140095257917184 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.005733' at position 1060804351
      2019-03-08  7:06:53 140095257917184 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.005733' at position 1060804351, relay log '/var/lib/mysql/relay-bin.004018' position: 1060804650
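
When triaging a log like the one above, it can help to pull out just the affected table and the binlog coordinates where the SQL thread stopped. The following is a minimal sketch in Python, assuming the error-log line layout shown in this report (date, time, thread id, bracketed level, message). The function name `summarize` and the regular expressions are illustrative only and are not part of any MariaDB tooling.

```python
import re

# Line layout assumed from the excerpt above:
# date, time, thread id, [LEVEL], message.
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+"
    r"(?P<thread>\d+)\s+\[(?P<level>\w+)\]\s+(?P<msg>.*)$"
)
# Message fragments copied from the log lines in this report.
STOP_POS = re.compile(r"stopped in log '(?P<file>[^']+)' at position (?P<pos>\d+)")
DUP_DOC_ID = re.compile(r"Duplicate FTS_DOC_ID value on table (?P<table>\S+)")


def summarize(lines):
    """Collect tables with duplicate FTS_DOC_ID errors and the replication stop point."""
    summary = {"fts_tables": [], "stopped_at": None}
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        msg = m.group("msg")
        dup = DUP_DOC_ID.search(msg)
        if dup:
            summary["fts_tables"].append(dup.group("table"))
        stop = STOP_POS.search(msg)
        if stop:
            summary["stopped_at"] = (stop.group("file"), int(stop.group("pos")))
    return summary


sample = [
    "2019-03-08  7:02:06 140095257917184 [ERROR] InnoDB: Duplicate FTS_DOC_ID value on table `ddx_practice_7884`.`patients`",
    "2019-03-08  7:02:06 140095257917184 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.005733' at position 1060804351",
]
print(summarize(sample))
```

The stop position reported here is the same position at which the SQL thread later resumed in the final log line, which matches the observation that a restart picks up without issue.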
      

      Attachments

        1. globalstatus.txt
          57 kB
        2. globalvars.txt
          75 kB
        3. innodbstatus.txt
          12 kB
        4. my.cnf
          2 kB

          Activity

            marko Marko Mäkelä added a comment:

            Thanks, michaelcaplan. I have lowered the priority accordingly. I think that the fulltext search code in InnoDB is lacking clear rules for protecting concurrent access. There could be many subtle race conditions. A support customer also experienced hangs in the past.

            michaelcaplan Michael Caplan added a comment:

            Gianni, we went low-tech brute force: `LIKE`

            It is not ideal, but firing up Elasticsearch or Sphinx was overkill (and complex) for what we needed.
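
For readers considering the same `LIKE` fallback, here is a minimal sketch of a substring search that escapes the `%` and `_` wildcards in user input. It is an illustration only: it uses Python's built-in `sqlite3` module rather than MariaDB so that it runs standalone, and the `notes` column, the sample rows, and the helper names `like_pattern` and `search` are all hypothetical. In MariaDB the backslash is already the default `LIKE` escape character, so the explicit `ESCAPE` clause used here for SQLite would need adjusting.

```python
import sqlite3


def like_pattern(term, escape="\\"):
    """Escape LIKE wildcards in user input and wrap it for a substring match."""
    # Escape the escape character first, then the two LIKE wildcards.
    for ch in (escape, "%", "_"):
        term = term.replace(ch, escape + ch)
    return f"%{term}%"


def search(conn, term):
    """Return ids of rows whose notes contain the literal term."""
    rows = conn.execute(
        "SELECT id FROM patients WHERE notes LIKE ? ESCAPE '\\' ORDER BY id",
        (like_pattern(term),),
    )
    return [row[0] for row in rows]


# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, notes TEXT)")
conn.executemany(
    "INSERT INTO patients (id, notes) VALUES (?, ?)",
    [(1, "annual check-up"), (2, "100% recovered"), (3, "follow-up in 100 days")],
)

print(search(conn, "up"))
print(search(conn, "100%"))
```

A leading-wildcard pattern such as this one cannot use an ordinary B-tree index, so every query scans the table. That is the brute-force trade-off described in the comment above.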

            ccalender Chris Calender (Inactive) added a comment:

            2021-04-04 23:05:36 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            2021-04-04 23:05:36 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            2021-04-04 23:05:53 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            2021-04-04 23:05:53 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            2021-04-04 23:07:01 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            2021-04-04 23:07:01 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            2021-04-04 23:08:02 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            2021-04-04 23:08:02 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            2021-04-04 23:11:46 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            2021-04-04 23:11:46 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
            marko Marko Mäkelä added a comment (edited):

            The upcoming releases (10.2.38, 10.3.29, 10.4.19, 10.5.10) will include better diagnostic messages, so that it should be possible to identify the problematic index. We have been unable to reproduce this problem internally, but hopefully identifying the index will lead to getting more information from a user, and ultimately fixing the bug.

            marko Marko Mäkelä added a comment:

            In MDEV-24088 I suggested a potential root cause of this failure. This is only a hypothesis until we have more information so that this can be reproduced in-house.

            The fts_commit() call during InnoDB transaction commit seems to be an ACID violation. Replication internally uses the two-phase commit mechanism (XA 2PC). The design constraint is that after XA PREPARE, the only allowed subsequent actions on that transaction are XA COMMIT or XA ROLLBACK. However, fts_commit() acquires locks and modifies data while the transaction is already in the XA PREPARE state. Furthermore, if an error occurs during that step, it will be ignored.

            Fixing this design problem would seem to involve substantial code refactoring, and avoiding performance regressions could be challenging. My current understanding is that there is no ‘pre-prepare’ hook in the storage engine handler API that would be able to report a failure. XA PREPARE itself cannot fail. Also, a normal one-phase commit currently does not allow returning errors.
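
The two-phase commit constraint described above can be illustrated with a toy state machine. This is a sketch of the rule only, not InnoDB code: the class `ToyTransaction`, its states, and its methods are invented for illustration. It shows a transaction that refuses any data modification once it has been prepared, which is the invariant that the hypothesis says fts_commit() breaks.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    PREPARED = auto()
    COMMITTED = auto()
    ROLLED_BACK = auto()


class ToyTransaction:
    """Toy model: after prepare(), only commit() or rollback() is allowed."""

    def __init__(self):
        self.state = State.ACTIVE
        self.writes = []

    def write(self, row):
        # Data may only be modified before the prepare step.
        if self.state is not State.ACTIVE:
            raise RuntimeError(f"write not allowed in state {self.state.name}")
        self.writes.append(row)

    def prepare(self):
        if self.state is not State.ACTIVE:
            raise RuntimeError(f"prepare not allowed in state {self.state.name}")
        self.state = State.PREPARED

    def commit(self):
        # In this two-phase model, commit must follow a successful prepare.
        if self.state is not State.PREPARED:
            raise RuntimeError(f"commit not allowed in state {self.state.name}")
        self.state = State.COMMITTED

    def rollback(self):
        if self.state in (State.COMMITTED, State.ROLLED_BACK):
            raise RuntimeError(f"rollback not allowed in state {self.state.name}")
        self.state = State.ROLLED_BACK


tx = ToyTransaction()
tx.write({"id": 962, "external_id": "70231"})
tx.prepare()
try:
    # Analogous to modifying data after the prepare step.
    tx.write({"fts_word": "example"})
except RuntimeError as exc:
    print("rejected:", exc)
tx.commit()
print(tx.state.name)
```

In this model the late write is rejected with an error the caller can see. The concern raised in the comment is the opposite situation: the late modification is attempted anyway, and any error it produces is ignored.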

            People

              allen.lee@mariadb.com Allen Lee (Inactive)
              michaelcaplan Michael Caplan
              Votes: 3
              Watchers: 11
