[MDEV-18868] FTS Breaking Replication Created: 2019-03-09  Updated: 2021-08-02  Resolved: 2021-08-02

Status: Closed
Project: MariaDB Server
Component/s: Full-text Search, Storage Engine - InnoDB
Affects Version/s: 10.2.14, 10.2.22, 10.5.8
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Michael Caplan Assignee: Allen Lee (Inactive)
Resolution: Incomplete Votes: 3
Labels: need_feedback
Environment:

Linux 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I'm running 10.2.14, with roughly 300GB data (1000K +/- tables). 95% tables are innodb. I have 64GB RAM, with INNODB buffer pool size set to 46GB (full my.cnf attached). The OS is Ubuntu 16.04.4. This is a dedicated MariaDB server.


Attachments: Text File globalstatus.txt     Text File globalvars.txt     Text File innodbstatus.txt     File my.cnf    
Issue Links:
Relates
relates to MDEV-15237 "Can't write; duplicate key in table"... Closed

 Description   

I'm having oodles of issues with MariaDB 10.2.14 and Fulltext Search. As you can see in the below log line items, a FTS index error is breaking replication. When restarting replication at the very point of breakage, things pick up without issue, as if the FTS issue has been self corrected.

Searching through Maria Jira, and elsewhere, I'm not finding people with like problems. Any ideas out there?

Thanks,

Mike

2019-03-08  7:02:06 140095257917184 [ERROR] InnoDB: Duplicate FTS_DOC_ID value on table `ddx_practice_7884`.`patients`
2019-03-08  7:02:06 140095257917184 [ERROR] Cannot find index FTS_DOC_ID_INDEX in InnoDB index translation table.
2019-03-08  7:02:06 140095257917184 [Warning] Found index FTS_DOC_ID_INDEX in InnoDB index list but not its MariaDB index number. It could be an InnoDB internal index.
2019-03-08  7:02:06 140095257917184 [ERROR] Cannot find index FTS_DOC_ID_INDEX in InnoDB index translation table.
2019-03-08  7:02:06 140095257917184 [Warning] Found index FTS_DOC_ID_INDEX in InnoDB index list but not its MariaDB index number. It could be an InnoDB internal index.
2019-03-08  7:02:06 140095257917184 [ERROR] Slave SQL: Error 'Can't write; duplicate key in table 'patients'' on query. Default database: 'ddx_identities'. Query: 'UPDATE `ddx_practice_7884`.`patients` SET `external_id` = '70231', `update_date` = '2019-03-08 13:02:06' WHERE (id = '962')', Gtid 0-3-486813306, Internal MariaDB error code: 1022
2019-03-08  7:02:06 140095257917184 [Warning] Slave: Can't write; duplicate key in table 'patients' Error_code: 1022
2019-03-08  7:02:06 140095257917184 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.005733' position 1060804351
2019-03-08  7:02:06 140095257917184 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.005733' at position 1060804351
2019-03-08  7:06:53 140095257917184 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.005733' at position 1060804351, relay log '/var/lib/mysql/relay-bin.004018' position: 1060804650



 Comments   
Comment by Michael Caplan [ 2019-03-11 ]

For what it is worth, I went through a rather long and painful update to 10.2.22 in my salve to see if there are goodies in this newer version that solve this issue. Unfortunately, replication still is breaking for me.

Comment by Marko Mäkelä [ 2019-06-25 ]

The FTS_DOC_ID values are generated internally by InnoDB and not replicated, so I do not think that this is a replication bug. You are merely witnessing that due to an InnoDB bug, some operations are not deterministic.

There have been many issues with the FULLTEXT INDEX implementation of InnoDB, and we are not prioritizing the fixes. The likelihood of duplicate FTS_DOC_ID should have reduced after MDEV-12676, but I think that I have seen such errors after merging that fix. If we get a reasonably repeatable test case, we can try to fix it.

I would not recommend using the InnoDB FULLTEXT INDEX in production. Only recently, we fixed two serious race conditions (MDEV-19445 and MDEV-18220) because they were reproducing easily in our internal testing.

Comment by Michael Caplan [ 2019-06-25 ]

Thanks Marko. We have since jettisoned FTS from our environment. My ability to help create a reproducible test case is no longer.

Thinking back to the issue, it only showed up, at least originally, in our production environment where we have 45K+ databases. In our staging environment, we are dealing with much, much, smaller numbers of databases. I'm not sure if that is a factor in reproducing this, but thought I'd at least put it out there.

Thanks,

Mike

Comment by Gianni Angelozzi [ 2019-06-25 ]

Hi,
I've reported the bug MDEV-15237
So there is no hope for the near future to have FTS support in innodb tables?
What alternatives are there? Do we need to dump the text data into Aria tables for search purposes?
michaelcaplan what did you end up using?
Thanks

Comment by Marko Mäkelä [ 2019-06-25 ]

Thanks, michaelcaplan. I have lowered the priority accordingly. I think that the fulltext search code in InnoDB is lacking clear rules for protecting concurrent access. There could be many subtle race conditions. A support customer also experienced hangs in the past.

Comment by Michael Caplan [ 2019-06-25 ]

Gianni, we went low tech brute force: `LIKE`

It is not ideal, but firing up Elastic Search or Sphinx was overkill (and complex) for what we needed.

Comment by Chris Calender (Inactive) [ 2021-04-08 ]

2021-04-04 23:05:36 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:05:36 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:05:53 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:05:53 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:07:01 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:07:01 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:08:02 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:08:02 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:11:46 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:11:46 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.

Comment by Marko Mäkelä [ 2021-04-13 ]

The upcoming releases (10.2.38, 10.3.29, 10.4.19, 10.5.10) will include better diagnostic messages, so that it should be possible to identify the problematic index. We have been unable to reproduce this problem internally, but hopefully identifying the index will lead to getting more information from a user, and ultimately fixing the bug.

Comment by Marko Mäkelä [ 2021-06-17 ]

In MDEV-24088 I suggested a potential root cause of this failure. This is only a hypothesis until we have more information so that this can be reproduced in-house.

The fts_commit() call during InnoDB transaction commit seems to be an ACID violation. Replication internally uses the two-phase commit mechanism (XA 2PC). The design constraint is that after XA PREPARE, the only allowed subsequent actions on that transaction are XA COMMIT or XA ROLLBACK. The fts_commit() is acquiring locks and modifying data while the transaction would already be in the XA PREPARE state. Furthermore, if an error occurs during that step, it will be ignored.

Fixing this design problem would seem to involve substantial code refactoring, and avoiding performance regressions could be challenging. My current understanding is that there is no ‘pre-prepare’ hook in the storage engine handler API that would be able to report a failure. XA PREPARE itself cannot fail. Also a normal 1-phase commit currently does not allow returning errors.

Generated at Thu Feb 08 08:47:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.