[MDEV-18868] FTS Breaking Replication - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.2.14, 10.2.22, 10.5.8
Fix Version/s: N/A
Component/s: Full-text Search, Storage Engine - InnoDB
Labels:
- need_feedback
Environment:

Linux 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I'm running 10.2.14 with roughly 300 GB of data (1000K +/- tables). 95% of the tables are InnoDB. I have 64 GB of RAM, with the InnoDB buffer pool size set to 46 GB (full my.cnf attached). The OS is Ubuntu 16.04.4. This is a dedicated MariaDB server.

Description

I'm having oodles of issues with MariaDB 10.2.14 and full-text search. As you can see in the log lines below, an FTS index error is breaking replication. When I restart replication at the very point of breakage, things pick up without issue, as if the FTS problem had corrected itself.

Searching through the MariaDB Jira and elsewhere, I'm not finding people with similar problems. Any ideas out there?

Thanks,

Mike

2019-03-08  7:02:06 140095257917184 [ERROR] InnoDB: Duplicate FTS_DOC_ID value on table `ddx_practice_7884`.`patients`

2019-03-08  7:02:06 140095257917184 [ERROR] Cannot find index FTS_DOC_ID_INDEX in InnoDB index translation table.

2019-03-08  7:02:06 140095257917184 [Warning] Found index FTS_DOC_ID_INDEX in InnoDB index list but not its MariaDB index number. It could be an InnoDB internal index.

2019-03-08  7:02:06 140095257917184 [ERROR] Cannot find index FTS_DOC_ID_INDEX in InnoDB index translation table.

2019-03-08  7:02:06 140095257917184 [Warning] Found index FTS_DOC_ID_INDEX in InnoDB index list but not its MariaDB index number. It could be an InnoDB internal index.

2019-03-08  7:02:06 140095257917184 [ERROR] Slave SQL: Error 'Can't write; duplicate key in table 'patients'' on query. Default database: 'ddx_identities'. Query: 'UPDATE `ddx_practice_7884`.`patients` SET `external_id` = '70231', `update_date` = '2019-03-08 13:02:06' WHERE (id = '962')', Gtid 0-3-486813306, Internal MariaDB error code: 1022

2019-03-08  7:02:06 140095257917184 [Warning] Slave: Can't write; duplicate key in table 'patients' Error_code: 1022

2019-03-08  7:02:06 140095257917184 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.005733' position 1060804351

2019-03-08  7:02:06 140095257917184 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.005733' at position 1060804351

2019-03-08  7:06:53 140095257917184 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.005733' at position 1060804351, relay log '/var/lib/mysql/relay-bin.004018' position: 1060804650
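
When triaging log excerpts like the one above, it can help to pull out the affected table, the failing GTID, and the position at which the slave SQL thread stopped. The following is a minimal Python sketch, assuming the error log uses exactly the message formats quoted in this report. The function name `summarize_fts_errors` and the regular expressions are illustrative assumptions, not part of any MariaDB tooling.

```python
import re

# Patterns derived only from the log lines quoted in this report (assumption:
# other server versions may word these messages differently).
DUP_DOC_ID = re.compile(
    r"\[ERROR\] InnoDB: Duplicate FTS_DOC_ID value on table `([^`]+)`\.`([^`]+)`"
)
SLAVE_ERROR = re.compile(
    r"\[ERROR\] Slave SQL: .*Gtid (\d+-\d+-\d+), Internal MariaDB error code: (\d+)"
)
STOP_POSITION = re.compile(
    r"replication stopped in log '([^']+)' at position (\d+)"
)


def summarize_fts_errors(log_text):
    """Collect affected tables, failing GTIDs, and stop positions from a log."""
    summary = {"tables": [], "gtids": [], "stops": []}
    for line in log_text.splitlines():
        match = DUP_DOC_ID.search(line)
        if match:
            # (schema, table) of the duplicate FTS_DOC_ID report
            summary["tables"].append((match.group(1), match.group(2)))
        match = SLAVE_ERROR.search(line)
        if match:
            # (GTID, error code) of the statement that failed on the slave
            summary["gtids"].append((match.group(1), int(match.group(2))))
        match = STOP_POSITION.search(line)
        if match:
            # (binlog file, position) where the SQL thread stopped
            summary["stops"].append((match.group(1), int(match.group(2))))
    return summary
```

Feeding the excerpt above to `summarize_fts_errors` yields one entry in each list, which makes it easier to check whether the same table or GTID recurs across incidents.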

Attachments

- globalstatus.txt (57 kB, 2019-03-09 03:08)
- globalvars.txt (75 kB, 2019-03-09 03:08)
- innodbstatus.txt (12 kB, 2019-03-09 03:08)
- my.cnf (2 kB, 2019-03-09 03:08)

Issue Links

relates to: MDEV-15237 "Can't write; duplicate key in table" when updating some rows in a transaction (Closed)

Activity

Marko Mäkelä added a comment - 2019-06-25 14:31

Thanks, michaelcaplan. I have lowered the priority accordingly. I think that the fulltext search code in InnoDB is lacking clear rules for protecting concurrent access. There could be many subtle race conditions. A support customer also experienced hangs in the past.

Michael Caplan added a comment - 2019-06-25 14:35

Gianni, we went low-tech brute force: `LIKE`

It is not ideal, but firing up Elasticsearch or Sphinx was overkill (and complex) for what we needed.
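
For readers unfamiliar with this kind of workaround, here is a minimal Python sketch of a substring search via `LIKE`. It is an illustration under stated assumptions, not the reporter's actual code: it runs against an in-memory SQLite database purely so that it is self-contained, the `patients` table here has a hypothetical `name` column, and the helper names `escape_like` and `search_patients` are made up. The `!` escape character is chosen to avoid dialect differences in backslash handling.

```python
import sqlite3


def escape_like(term, escape="!"):
    """Escape LIKE wildcards so that user input is matched literally."""
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def search_patients(conn, term):
    """Substring search with LIKE instead of a FULLTEXT index."""
    pattern = "%" + escape_like(term) + "%"
    cursor = conn.execute(
        "SELECT id, name FROM patients WHERE name LIKE ? ESCAPE '!' ORDER BY id",
        (pattern,),
    )
    return cursor.fetchall()


# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO patients (id, name) VALUES (?, ?)",
    [(1, "Ann Smith"), (2, "Bob Smythe"), (3, "100% Test")],
)
```

A pattern with a leading `%` cannot use an ordinary B-tree index, so a query like this scans the table. That cost is one reason the approach is "not ideal", but it avoids the FTS auxiliary tables entirely.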

Chris Calender (Inactive) added a comment - 2021-04-08 02:18

2021-04-04 23:05:36 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:05:36 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:05:53 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:05:53 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:07:01 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:07:01 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:08:02 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:08:02 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:11:46 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.
2021-04-04 23:11:46 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.

Marko Mäkelä added a comment - 2021-04-13 14:24 - edited

The upcoming releases (10.2.38, 10.3.29, 10.4.19, 10.5.10) will include better diagnostic messages, so that it should be possible to identify the problematic index. We have been unable to reproduce this problem internally, but hopefully identifying the index will lead to getting more information from a user, and ultimately fixing the bug.

Marko Mäkelä added a comment - 2021-06-17 05:00

In ~~MDEV-24088~~ I suggested a potential root cause of this failure. This is only a hypothesis until we have more information so that this can be reproduced in-house.

The fts_commit() call during InnoDB transaction commit seems to be an ACID violation. Replication internally uses the two-phase commit mechanism (XA 2PC). The design constraint is that after XA PREPARE, the only allowed subsequent actions on that transaction are XA COMMIT or XA ROLLBACK. However, fts_commit() acquires locks and modifies data while the transaction is already in the XA PREPARE state. Furthermore, if an error occurs during that step, it is ignored.

Fixing this design problem would seem to involve substantial code refactoring, and avoiding performance regressions could be challenging. My current understanding is that there is no ‘pre-prepare’ hook in the storage engine handler API that would be able to report a failure. XA PREPARE itself cannot fail. Also, a normal one-phase commit currently does not allow returning errors.
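
The two-phase commit constraint described in this comment can be illustrated with a toy state machine. This is a minimal Python sketch, not InnoDB code: the class `ToyXaTransaction` and its methods are hypothetical names, and `modify_data` merely stands in for the kind of work that fts_commit() is hypothesized to do after the prepare step.

```python
class InvalidTransition(Exception):
    """Raised when an action is not allowed in the current state."""


class ToyXaTransaction:
    """Toy model: once prepared, only commit or rollback are allowed."""

    def __init__(self):
        self.state = "ACTIVE"
        self.changes = []

    def modify_data(self, change):
        # Data may only be modified before the transaction is prepared.
        if self.state != "ACTIVE":
            raise InvalidTransition(f"cannot modify data in state {self.state}")
        self.changes.append(change)

    def prepare(self):
        if self.state != "ACTIVE":
            raise InvalidTransition(f"cannot prepare in state {self.state}")
        self.state = "PREPARED"

    def commit(self):
        # In this two-phase model, commit is only valid after prepare.
        if self.state != "PREPARED":
            raise InvalidTransition(f"cannot commit in state {self.state}")
        self.state = "COMMITTED"

    def rollback(self):
        if self.state not in ("ACTIVE", "PREPARED"):
            raise InvalidTransition(f"cannot roll back in state {self.state}")
        self.changes.clear()
        self.state = "ROLLED_BACK"
```

In this model, calling `modify_data` after `prepare` raises `InvalidTransition`. The hypothesis in the comment is that the real code performs such a modification anyway and ignores any resulting error, which is what a state machine like this would forbid.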
