[MDEV-20621] FULLTEXT INDEX activity causes InnoDB hang Created: 2019-09-18 Updated: 2020-09-30 Resolved: 2019-10-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server, Storage Engine - InnoDB |
| Affects Version/s: | 10.1.41, 10.2, 10.3, 10.4 |
| Fix Version/s: | 10.2.28, 10.1.42, 10.3.19, 10.4.9 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Stevo | Assignee: | Thirunarayanan Balathandayuthapani |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Cloudlinux 7.7 |
||
| Description |
|
We are experiencing technical difficulties with the latest MariaDB release, 10.1.41-MariaDB. The database freezes and does not accept new connections.
We can provide more error log data, but not in public. |
| Comments |
| Comment by Stevo [ 2019-09-18 ] | |||||||||
|
Bonus tip: It looks like this always happens approximately 1 hour and 30 minutes after the service is started. | |||||||||
| Comment by Marko Mäkelä [ 2019-09-19 ] | |||||||||
|
Novkovski, I would like to see the stack traces of all threads when this happens.
I do not have any idea what could cause this to happen 1 hour and 30 minutes after the service has been started. Could it be due to some external monitoring or maintenance activity? In | |||||||||
| Comment by Stevo [ 2019-09-19 ] | |||||||||
|
It happened again after approx. 1 hour and 30-40 minutes. | |||||||||
| Comment by Marko Mäkelä [ 2019-09-19 ] | |||||||||
|
Has gdb It would be helpful if you could save a core dump that you could analyze based on commands provided by me. We need to find out which thread is holding dict_sys->mutex. That should be relatively easy: print/x *dict_sys->mutex should reveal the thread identifier in hexadecimal. You can use thread find 0x… to find the thread. Finally, switch to that thread and issue backtrace. However, I would probably still need a complete backtrace of all threads. Note that you can upload any confidential files to ftp.mariadb.com. | |||||||||
| Comment by Stevo [ 2019-09-20 ] | |||||||||
|
I'm not a very technical Linux person, so please guide me through whatever I need to do. | |||||||||
| Comment by Stevo [ 2019-09-20 ] | |||||||||
|
Isn't the latest change a fix for this issue? | |||||||||
| Comment by Marko Mäkelä [ 2019-09-30 ] | |||||||||
|
Novkovski, thank you for noticing. Yes, your gdb There are many problems with the InnoDB fulltext search implementation, and there are not many useful regression tests. We also have some other fixes in the works that have not gone through stress tests (or code review) yet. | |||||||||
| Comment by Thirunarayanan Balathandayuthapani [ 2019-10-01 ] | |||||||||
|
It is a different issue from | |||||||||
| Comment by Matthias Leich [ 2019-10-16 ] | |||||||||
|
| |||||||||
| Comment by Marko Mäkelä [ 2019-10-17 ] | |||||||||
|
This is a welcome step in the right direction, but I think that this needs some more work. First of all, in_queue should not be stored in a bit-field that is shared with other bit-fields that are protected by a different mutex. I would suggest using bool, and documenting the possible state transitions carefully. We might consider using atomic memory access. Second, in 10.1, fts_optimize_init() is not adding tables to the queue, while in 10.2 it is doing that. I’d like to see a 10.1 patch that does this. It should also avoid the unnecessary use of std::vector. Third, fts_optimize_remove_table() should assert !table->fts->in_queue at the end. | |||||||||
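The first point above can be illustrated with a minimal C++ sketch. In the C++ memory model, adjacent non-zero-width bit-fields form a single memory location, so two threads that write neighbouring bit-fields while holding different mutexes still race with each other. The struct, its field names (flag_a, flag_b, state_mutex) and the helper functions below are hypothetical stand-ins, not the actual fts_t layout or the patch under review; the sketch only shows in_queue kept as a separate atomic member with its two transitions spelled out.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for per-table FTS state; NOT the real fts_t.
struct FtsStateSketch {
    std::mutex state_mutex;      // assumed: guards flag_a and flag_b only
    unsigned flag_a : 1;         // adjacent bit-fields share one memory location
    unsigned flag_b : 1;
    std::atomic<bool> in_queue;  // separate member, so a separate memory location

    FtsStateSketch() : flag_a(0), flag_b(0), in_queue(false) {}
};

// Bit-fields are only written while holding their own mutex.
inline void set_flag_a(FtsStateSketch& s, bool value) {
    std::lock_guard<std::mutex> guard(s.state_mutex);
    s.flag_a = value ? 1u : 0u;
}

// Transition false -> true: the table is added to the optimize queue.
// Returns false if the table was already queued.
inline bool try_enqueue(FtsStateSketch& s) {
    bool expected = false;
    return s.in_queue.compare_exchange_strong(expected, true);
}

// Transition true -> false: the table is removed from the optimize queue.
// Returns false if the table was not queued.
inline bool try_dequeue(FtsStateSketch& s) {
    bool expected = true;
    return s.in_queue.compare_exchange_strong(expected, false);
}
```

With this layout a writer of flag_a never touches the storage of in_queue, and a caller can check at the end of a removal path that in_queue is false. Whether a plain bool under the queue mutex or an atomic is the better fit depends on the real locking rules, which this sketch does not model.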
| Comment by Marko Mäkelä [ 2019-10-18 ] | |||||||||
|
At the end of fts_optimize_remove_table(), the fts_optimize_wq->mutex acquisition and release around the debug assertion should be inside ut_d(), to avoid unnecessary operations on the release build. I saw a redundant sync_table = mem_heap_alloc(…) call whose result was immediately overwritten by sync_table=table. In fts_optimize_new_table(), the assignment slot->running = false is redundant because of a preceding memset() call. If fts_slots can be accessed by multiple threads, then we should extend some mutex hold time. It could be that it is only being accessed by a single thread. Should we call fts_init_index() already on ha_innobase::open()? Otherwise, it seems that FTS-indexed columns could be updated before any fulltext search is performed (and ha_innobase::ft_init_ext() is called). Could that lead to some updates being missed by the fulltext indexes? Finally, please check the following for differences in white-space or comments, and try to fix those:
| |||||||||
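The ut_d() remark above could be sketched roughly as follows. DEBUG_ONLY is a hypothetical analogue of ut_d(), keyed on an assumed SKETCH_DEBUG flag, and WorkQueueSketch, TableSketch and remove_table_sketch() are illustrative names rather than the real InnoDB types or functions. The lock_count member exists only so the effect is observable.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical analogue of ut_d(): the argument is compiled only in debug
// builds (SKETCH_DEBUG is an assumed flag, not a real InnoDB macro).
#ifdef SKETCH_DEBUG
# define DEBUG_ONLY(EXPR) EXPR
constexpr bool kDebugBuild = true;
#else
# define DEBUG_ONLY(EXPR)
constexpr bool kDebugBuild = false;
#endif

// Illustrative stand-ins for the work queue and the table.
struct WorkQueueSketch {
    std::mutex mutex;
    int lock_count = 0;  // counts mutex acquisitions, for demonstration only
};

struct TableSketch {
    bool in_queue = false;
};

inline void remove_table_sketch(WorkQueueSketch& wq, TableSketch& table) {
    {
        // The real work: needed in every build.
        std::lock_guard<std::mutex> guard(wq.mutex);
        ++wq.lock_count;
        table.in_queue = false;
    }

    // Debug-only check: the lock, the assertion and the unlock all disappear
    // from a release build instead of leaving an empty lock/unlock pair.
    DEBUG_ONLY(wq.mutex.lock());
    DEBUG_ONLY(++wq.lock_count);
    DEBUG_ONLY(assert(!table.in_queue));
    DEBUG_ONLY(wq.mutex.unlock());
}
```

In a build without SKETCH_DEBUG the function acquires the mutex once; with it defined, the second acquisition exists only to protect the assertion.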
| Comment by Marko Mäkelä [ 2019-10-22 ] | |||||||||
|
Thanks, this looks OK. I made a suggestion to declare fts_optimize_wq without static scope, to avoid having to add trivial non-inline accessor functions. | |||||||||
| Comment by Matthias Leich [ 2019-10-25 ] | |||||||||
|
|