[MDEV-18868] FTS Breaking Replication Created: 2019-03-09 Updated: 2021-08-02 Resolved: 2021-08-02 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Full-text Search, Storage Engine - InnoDB |
| Affects Version/s: | 10.2.14, 10.2.22, 10.5.8 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Michael Caplan | Assignee: | Allen Lee (Inactive) |
| Resolution: | Incomplete | Votes: | 3 |
| Labels: | need_feedback | ||
| Environment: |
Linux 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux I'm running 10.2.14, with roughly 300GB data (1000K +/- tables). 95% tables are innodb. I have 64GB RAM, with INNODB buffer pool size set to 46GB (full my.cnf attached). The OS is Ubuntu 16.04.4. This is a dedicated MariaDB server. |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
I'm having oodles of issues with MariaDB 10.2.14 and Fulltext Search. As you can see in the below log line items, a FTS index error is breaking replication. When restarting replication at the very point of breakage, things pick up without issue, as if the FTS issue has been self corrected. Searching through Maria Jira, and elsewhere, I'm not finding people with like problems. Any ideas out there? Thanks, Mike
|
| Comments |
| Comment by Michael Caplan [ 2019-03-11 ] |
|
For what it is worth, I went through a rather long and painful update to 10.2.22 in my salve to see if there are goodies in this newer version that solve this issue. Unfortunately, replication still is breaking for me. |
| Comment by Marko Mäkelä [ 2019-06-25 ] |
|
The FTS_DOC_ID values are generated internally by InnoDB and not replicated, so I do not think that this is a replication bug. You are merely witnessing that due to an InnoDB bug, some operations are not deterministic. There have been many issues with the FULLTEXT INDEX implementation of InnoDB, and we are not prioritizing the fixes. The likelihood of duplicate FTS_DOC_ID should have reduced after I would not recommend using the InnoDB FULLTEXT INDEX in production. Only recently, we fixed two serious race conditions ( |
| Comment by Michael Caplan [ 2019-06-25 ] |
|
Thanks Marko. We have since jettisoned FTS from our environment. My ability to help create a reproducible test case is no longer. Thinking back to the issue, it only showed up, at least originally, in our production environment where we have 45K+ databases. In our staging environment, we are dealing with much, much, smaller numbers of databases. I'm not sure if that is a factor in reproducing this, but thought I'd at least put it out there. Thanks, Mike |
| Comment by Gianni Angelozzi [ 2019-06-25 ] |
|
Hi, |
| Comment by Marko Mäkelä [ 2019-06-25 ] |
|
Thanks, michaelcaplan. I have lowered the priority accordingly. I think that the fulltext search code in InnoDB is lacking clear rules for protecting concurrent access. There could be many subtle race conditions. A support customer also experienced hangs in the past. |
| Comment by Michael Caplan [ 2019-06-25 ] |
|
Gianni, we went low tech brute force: `LIKE` It is not ideal, but firing up Elastic Search or Sphinx was overkill (and complex) for what we needed. |
| Comment by Chris Calender (Inactive) [ 2021-04-08 ] |
|
2021-04-04 23:05:36 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table. |
| Comment by Marko Mäkelä [ 2021-04-13 ] |
|
The upcoming releases (10.2.38, 10.3.29, 10.4.19, 10.5.10) will include better diagnostic messages, so that it should be possible to identify the problematic index. We have been unable to reproduce this problem internally, but hopefully identifying the index will lead to getting more information from a user, and ultimately fixing the bug. |
| Comment by Marko Mäkelä [ 2021-06-17 ] |
|
In The fts_commit() call during InnoDB transaction commit seems to be an ACID violation. Replication internally uses the two-phase commit mechanism (XA 2PC). The design constraint is that after XA PREPARE, the only allowed subsequent actions on that transaction are XA COMMIT or XA ROLLBACK. The fts_commit() is acquiring locks and modifying data while the transaction would already be in the XA PREPARE state. Furthermore, if an error occurs during that step, it will be ignored. Fixing this design problem would seem to involve substantial code refactoring, and avoiding performance regressions could be challenging. My current understanding is that there is no ‘pre-prepare’ hook in the storage engine handler API that would be able to report a failure. XA PREPARE itself cannot fail. Also a normal 1-phase commit currently does not allow returning errors. |