[MDEV-17190] MariaDB Server deadlock Created: 2018-09-13 Updated: 2019-09-02 Resolved: 2019-03-04
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 5.5.54 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Michael Graf | Assignee: | Vladislav Vaintroub |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | need_feedback |
| Environment: | Windows Server 2008 R2 |
| Attachments: | |
| Issue Links: | |
| Description |

Many threads in our application hung while waiting for a response from the MariaDB server.
| Comments |
| Comment by Vladislav Vaintroub [ 2018-09-18 ] |

Deadlock in InnoDB.
| Comment by Vladislav Vaintroub [ 2018-09-18 ] |

Some further investigation of https://jira.mariadb.org/secure/attachment/46481/uniqstack.txt shows that:

1. A lot of threads are waiting for the mutex in sync_array_reserve_cell, but this is the lock held by the thread doing sync_array_print_long_waits, so these can be excluded.
2. Once you leave out the threads mentioned in 1, as well as the threads that are always there (handle_connections, IO threads, a couple of others), only a few threads remain.

For all the remaining "interesting" threads, I put their cleaned-up call stacks, along with the corresponding InnoDB deadlock detector output from the error log, into https://jira.mariadb.org/secure/attachment/46485/annotated_interesting_stacks.txt. I suspect there is something fishy about the &rseg->mutex mutex.

marko, if you're free, maybe you can have a look, or reassign to another InnoDB expert.
| Comment by Marko Mäkelä [ 2018-09-18 ] |

I think that we must find out which thread is holding the rollback segment block lock in S mode. I am afraid that the easiest way to do so is to review the code. Explicit information about threads holding S-latches is only available in debug builds.
| Comment by Marko Mäkelä [ 2018-09-19 ] |

migra, I believe that I may have found the explanation for the hang. Please confirm whether you are doing massive DELETE or UPDATE (or implicit or explicit ROLLBACK of a large INSERT) on tables that reside in the InnoDB system tablespace (that is, tables created without SET GLOBAL innodb_file_per_table=1). If my analysis is correct, the workaround would be to move all tables to .ibd files. If you can confirm my analysis, I will confirm the bug.

Long version

According to the diagnostic output, one thread is holding an S-latch on a rollback segment header page, blocking an X-latch request from trx_rsegf_get(). Which callers could possibly S-latch a rollback segment header page? In 5.5, the rollback segment header pages are normally accessed exclusively through trx_rsegf_get() or trx_rsegf_get_new(). Both of these functions acquire an X-latch, so the reported wait would look different.

btr_estimate_n_rows_in_range_on_level() could do this if a table is located in the system tablespace and a table page was freed and subsequently allocated for a rollback segment header. This function is executed as part of ha_innobase::records_in_range() when computing query plans. (Before MySQL 5.6 and MariaDB Server 10.0 introduced the configuration parameter innodb_undo_tablespaces, the undo logs were always located in the system tablespace. If innodb_undo_tablespaces>1, this conflict should not be possible.) This is the only caller that passes the debug-check-thwarting flag BUF_GET_POSSIBLY_FREED outside debug code. In debug builds, lock_rec_block_validate() contains such a call, but it uses a page X-latch, so it could not cause this exact type of hang (a pending X-latch request being blocked by an existing S-latch holder).

In uniqstack.txt
Side note on scalability

I would like to remove the sync_array at some point. It could be that it simply does not scale to this many concurrent threads. (But that is a performance issue, not the reason for this hang.) You might want to experiment with innodb_thread_concurrency.
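The workaround proposed above — moving tables out of the system tablespace into per-table .ibd files — would look roughly like the following. This is a sketch; `t` is a placeholder for each affected table name, and the effect of ALTER TABLE here is a table rebuild:

```sql
-- Enable file-per-table for subsequently (re)built tables.
SET GLOBAL innodb_file_per_table=1;

-- Rebuild each affected table so that it moves into its own .ibd file;
-- repeat for every table currently in the system tablespace.
ALTER TABLE t ENGINE=InnoDB;
```

Note that the rebuild itself generates undo log records, so it is best done during a quiet period.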
| Comment by Marko Mäkelä [ 2018-09-20 ] |

There was a flaw in my analysis: none of the threads are executing btr_estimate_n_rows_in_range_on_level(). The B-tree dives invoked from the 2 mini-transactions in btr_estimate_n_rows_in_range() are properly protected. However, between the first dive and the completion of btr_estimate_n_rows_in_range(), the index tree may be subject to a page split or merge, meaning that the results given to the query planner can be inaccurate or incorrect.

The function btr_estimate_n_rows_in_range_on_level() is more problematic, because it accesses pages based on possibly stale page numbers. On the leaf level of the index tree, we could do a better job and simply do a forward scan of the leaf pages (acquiring the right sibling page latch before releasing the current page latch). On the upper levels, it is not easy to change the logic: to be able to hold multiple non-leaf page latches, we would have to acquire dict_index_t::lock first, and that would block any operations that allocate or free pages. This could seriously reduce performance.

Apart from the access to a possibly freed and repurposed page in btr_estimate_n_rows_in_range_on_level(), the only way we should end up acquiring an undo log header page S-latch in a B-tree operation is if the B-tree index is corrupted. This kind of corruption ought to be detected by CHECK TABLE.

To help us find the hang, we can improve the diagnostic output so that it indicates which thread is holding a mutex or X-latch that is being waited for. Information about S-latch holders is only available in debug builds; in non-debug builds, we could record and report the identity of one S-latch holder, which should hopefully be enough in this case.
| Comment by Marko Mäkelä [ 2018-09-26 ] |

In Q100SPW-XYZ-H1.err

The output indicates that a request to exclusively latch an undo log header page in trx_rsegf_get() is blocked because some thread is holding an S-latch on the same page. The output also says that the last S-latch request was from buffer pool flushing. It does not say whether the last remaining S-latch is from that last request:

This code should only be invoked as part of buf_flush_list(), which is called from many places. One possible explanation is that the write request never completed, for some reason.
| Comment by Vladislav Vaintroub [ 2018-09-26 ] |

I built and uploaded the current 5.5 with some more instrumentation for mutexes and rwlocks here: https://1drv.ms/u/s!AqT4tBcElQz_j4llYSbKvKgMJ9Llag. The patch is https://github.com/mariadb/server/commit/07094220fe3e07297255598804738817bfba6255. It only helps if the deadlock recurs and a .dmp is taken. Ideally, the .dmp would be a little larger (procdump -mp maybe?), because this one only contained stacks, and all variables were zeroed.

migra, maybe you could use the default setting for thread-handling (which is pool-of-threads), so there are fewer threads in the dump and it is easier to analyze (it should also generally be faster and more scalable than one-thread-per-connection). If that does not work for you, I'd like to know why not.
| Comment by Michael Graf [ 2018-09-27 ] |

Thanks for the fix.

> If that does not work for you, I'd like to know why not

It seemed to me that there are not enough threads available when the following scenario occurs:

I can try to use "thread_handling=pool-of-threads", but I will have to set the value of "thread_pool_size" so high that the problem described above won't occur.
| Comment by Vladislav Vaintroub [ 2018-09-27 ] |

Thanks for the explanation. You need to raise the thread_pool_max_threads variable. The default for 5.5 was 500, which some people found to be low.
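The combination of settings discussed above would look roughly like this in my.ini; the value of thread_pool_max_threads is illustrative, not a tuned recommendation for this workload:

```ini
[mysqld]
# Use the thread pool instead of one-thread-per-connection.
thread_handling=pool-of-threads

# Raise the cap on pool worker threads; the 5.5 default of 500 can be
# too low when many connections block simultaneously. Value is an
# illustrative example only.
thread_pool_max_threads=2000
```

On Windows, the 5.5 thread pool is built on the native Windows threadpool, so raising the maximum thread count is the main knob for the "not enough threads available" scenario described above.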
| Comment by Michael Graf [ 2019-09-02 ] |

Almost 1 year after the last deadlock, a similar deadlock seems to have occurred again. The MariaDB version used is 5.5.61. This time there is also a minidump available. All threads in our application with a database connection are waiting for a response from the MariaDB server.

495 Id: 1e0c.21d4 Suspend: 0 Teb: 000007ff`ffbda000 Unfrozen
| Comment by Vladislav Vaintroub [ 2019-09-02 ] |

migra, I collected the stacks from the minidump: https://jira.mariadb.org/secure/attachment/48868/mysqld_exe_190816_130623_uniqstack.txt and https://jira.mariadb.org/secure/attachment/48867/mysqld_exe_190816_130623_all_stacks.txt. I see no sign of a deadlock in there; the threads are either sleeping or doing work. There is no thread waiting for a lock in MariaDB (there is some contention in Windows heap management, but that's outside of our reach). Am I missing something?
| Comment by Michael Graf [ 2019-09-02 ] |

I'm sorry, I sent you the wrong files. Here are the correct files. As you can see, the error log contains the log messages you added to the mysqld.exe 5.5.61 executable:

InnoDB: Warning: semaphore wait:
| Comment by Vladislav Vaintroub [ 2019-09-02 ] |

Alright, I attached https://jira.mariadb.org/secure/attachment/48875/mysqld.exe_190820_all_stacks.txt and https://jira.mariadb.org/secure/attachment/48875/mysqld.exe_190820_uniqstack.txt for the latest dump. There is still no deadlock; however, there is contention on LOCK_open. Files are also being opened under this lock, as can be seen e.g. in

I guess increasing table_cache or table_definition_cache might help, but I'm not an expert on this specific topic.
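A sketch of the cache settings mentioned above as they would appear in my.ini; the numbers are illustrative only and would need to be sized to the number of tables and concurrent connections in this installation:

```ini
[mysqld]
# Larger caches reduce how often tables must be reopened, and thus how
# often LOCK_open is held while files are being opened.
table_open_cache=4000        # table_cache is the older name for this
table_definition_cache=2000
```

Each cached open table consumes file handles and memory, so these values should not simply be set to the maximum.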
| Comment by Sergey Vojtovich [ 2019-09-02 ] |

The best recommendation I can give is to upgrade. We haven't had LOCK_open since 10.0, so this contention should be gone. The most scalable table cache implementation is available since 10.3, and InnoDB scales much better in recent versions as well. Raising table_cache or table_definition_cache might indeed help, but it has very limited impact.