[MDEV-20621] FULLTEXT INDEX activity causes InnoDB hang - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.1.41, 10.2(EOL), 10.3(EOL), 10.4(EOL)
Fix Version/s: 10.2.28, 10.1.42, 10.3.19, 10.4.9
Component/s: Server, Storage Engine - InnoDB
Labels: None
Environment: Cloudlinux 7.7, Google Cloud Compute Engine

Description

We are experiencing problems with the latest MariaDB release, 10.1.41-MariaDB.
This is happening on only one server, although we have others with the same system package versions.

The database freezes and stops accepting new connections.
The error_log contains a large volume of messages such as:

InnoDB: Warning: a long semaphore wait:

--Thread 140300680931072 has waited at dict0dict.cc line 984 for 241.00 seconds the semaphore:

Mutex at 0x7f9e26c112e8 '&dict_sys->mutex', lock var 1

Last time reserved by thread 140300697716480 in file not yet reserved line 0, waiters flag 1

InnoDB: Warning: semaphore wait:

--Thread 140300680931072 has waited at dict0dict.cc line 984 for 241.00 seconds the semaphore:

Mutex at 0x7f9e26c112e8 '&dict_sys->mutex', lock var 1

Last time reserved by thread 140300697716480 in file not yet reserved line 0, waiters flag 1

We can provide more error log data, but not in public.

Attachments

gdb
829 kB
2019-09-19 12:27

Issue Links

causes

MDEV-20987 InnoDB fails to start when fts table has FK relation

Closed

MDEV-23856 fts_optimize_wq accessed after shutdown of FTS Optimize thread

Closed

relates to

MDEV-19529 InnoDB hang on DROP FULLTEXT INDEX

Closed

Activity

Matthias Leich added a comment - 2019-10-16 10:52 - edited

Results of RQG testing on bb-10.2-thiru commit 0b91f74906c8dcbcc1dac486fcc66c1e9c0c603a

- More than 1500 RQG tests were executed.

- The fraction of failing tests was surprisingly low.

- All asserts/crashes are already covered by open bugs in JIRA except one, which happened during shutdown of the server:

  mysqld: sql/sql_list.h:684: void ilink::assert_linked(): Assertion `prev != 0 && next != 0' failed.

  - Per Thiru: unlikely that it is caused by the changes in bb-10.3-thiru.

  - It occurred only once, so attempts to replay it on current 10.2 have too low a chance of success.

https://jira.mariadb.org/browse/MDEV-20843

Marko Mäkelä added a comment - 2019-10-17 11:29

This is a welcome step in the right direction, but I think that this needs some more work.

First of all, the in_queue should not be stored in a bit-field that is shared with other bit-fields that are protected by a different mutex.

I would suggest using bool, and documenting the possible state transitions carefully. We might consider using atomic memory access.

Second, in 10.1, fts_optimize_init() is not adding tables to the queue, while in 10.2 it is doing that. I’d like to see a 10.1 patch that does this. It should also avoid the unnecessary use of std::vector.

Third, fts_optimize_remove_table() should assert !table->fts->in_queue at the end.
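
The first and third points can be illustrated with a minimal sketch. It is not the actual patch: FtsState, WorkQueue, flag_a, flag_b, queue_add() and queue_remove() are hypothetical names chosen for illustration, and only in_queue comes from the discussion above. Adjacent bit-fields share one memory location, so a write to one of them may rewrite the bytes holding the others; keeping in_queue as a separate (here atomic) bool avoids that when the other flags are protected by a different mutex.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the per-table FTS state (not the real struct).
struct FtsState {
    // Flags assumed to be protected by some other mutex; they share one
    // memory location because they are adjacent bit-fields.
    unsigned flag_a : 1;
    unsigned flag_b : 1;

    // Separate object: storing to it never touches the bit-fields above.
    // Documented transitions: false -> true when the table is queued,
    // true -> false when it is removed from the queue.
    std::atomic<bool> in_queue;

    FtsState() : flag_a(0), flag_b(0), in_queue(false) {}
};

// Hypothetical work queue with its own mutex.
struct WorkQueue {
    std::mutex mutex;
    int pending = 0;
};

// Queue the table unless it is already queued. Returns true if it was added.
bool queue_add(WorkQueue& wq, FtsState& fts) {
    std::lock_guard<std::mutex> guard(wq.mutex);
    if (fts.in_queue.load()) {
        return false;
    }
    fts.in_queue.store(true);
    ++wq.pending;
    return true;
}

// Remove the table from the queue. Returns true if it had been queued.
bool queue_remove(WorkQueue& wq, FtsState& fts) {
    std::lock_guard<std::mutex> guard(wq.mutex);
    const bool was_queued = fts.in_queue.load();
    if (was_queued) {
        fts.in_queue.store(false);
        --wq.pending;
    }
    // Postcondition suggested in the review: not queued on exit.
    assert(!fts.in_queue.load());
    return was_queued;
}
```

In this sketch the flag is only ever changed while the queue mutex is held, so the atomic type is a safeguard for readers that do not hold the mutex rather than a replacement for the lock.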

Marko Mäkelä added a comment - 2019-10-18 12:06

At the end of fts_optimize_remove_table(), the fts_optimize_wq->mutex acquisition and release around the debug assertion should be inside ut_d(), to avoid unnecessary operations on the release build.
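
A rough sketch of the idea, under stated assumptions: DEBUG_ONLY below is a hypothetical macro that plays the role described for ut_d() here, keyed on the standard NDEBUG switch rather than on InnoDB's own debug configuration. WorkQueue, Table and check_removed() are illustrative names, not the real code.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical debug-only wrapper: expands to its argument in debug
// builds and to nothing when NDEBUG is defined.
#ifndef NDEBUG
# define DEBUG_ONLY(stmt) stmt
#else
# define DEBUG_ONLY(stmt)
#endif

struct WorkQueue {
    std::mutex mutex;
};

struct Table {
    bool in_queue = false;
};

// Debug check at the end of a removal routine. In a release build all
// three statements vanish, so the mutex is never acquired just to
// evaluate an assertion that is compiled out anyway.
void check_removed(WorkQueue& wq, const Table& table) {
    DEBUG_ONLY(wq.mutex.lock());
    DEBUG_ONLY(assert(!table.in_queue));
    DEBUG_ONLY(wq.mutex.unlock());
    (void) wq;     // silence unused-parameter warnings in release builds
    (void) table;
}
```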

I saw a redundant sync_table = mem_heap_alloc(…) call whose result was immediately overwritten by sync_table=table;

In fts_optimize_new_table() the assignment slot->running = false is redundant because of a preceding memset() call.
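
As a small illustration of why the assignment is redundant (Slot and init_slot() are hypothetical, simplified stand-ins, and the sketch assumes the usual platforms where an all-zero byte pattern reads back as false, 0 and a null pointer):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified slot; the real structure has more members.
struct Slot {
    bool          running;
    unsigned long last_run;
    void*         table;
};

// Initialize a slot for the given table.
Slot* init_slot(Slot* slot, void* table) {
    // Zero-fills every member, so running already reads as false here.
    std::memset(slot, 0, sizeof *slot);
    // slot->running = false;  // would be redundant after the memset()
    slot->table = table;
    return slot;
}
```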

If fts_slots can be accessed by multiple threads, then we should extend some mutex hold time. It could be that it is only being accessed by a single thread.

Should we call fts_init_index() already on ha_innobase::open()? Otherwise, it seems that FTS-indexed columns could be updated before any fulltext search is performed (and ha_innobase::ft_init_ext() is called). Could that lead to some updates being missed by the fulltext indexes?

Finally, please check the following for differences in white-space or comments, and try to fix those:

diff -I^@@ <(git show origin/bb-10.1-thiru storage/innobase) <(git show origin/bb-10.1-thiru storage/xtradb/)

git show origin/bb-10.2-thiru|diff -I^@@ - <(git show origin/bb-10.1-thiru storage/innobase)

Marko Mäkelä added a comment - 2019-10-22 13:22

Thanks, this looks OK. I suggested declaring fts_optimize_wq without static scope, to avoid having to add trivial non-inline accessor functions.

Matthias Leich added a comment - 2019-10-25 10:29

I tested the tree bb-10.2-thiru commit ce813ca178e499ab2171978bf0140537cb9ca612, which contains patches for the current MDEV.

There were no asserts/crashes that do not also occur in current 10.2 commit 28098420317bc2efe082df799c917babde879242.

So from my point of view the MDEV-20621 patch is ok.
