[MDEV-17190] MariaDB Server deadlock Created: 2018-09-13  Updated: 2019-09-02  Resolved: 2019-03-04

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 5.5.54
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Michael Graf Assignee: Vladislav Vaintroub
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

Windows Server 2008 R2


Attachments: File Q100SPW-XYZ-H1.err     Text File ThreadsWaitingOnResponseOfMariaDBServer.txt     Text File annotated_interesting_stacks.txt     Zip Archive mariadb.zip     Zip Archive mariadb.zip     File my.ini     File my.ini     File mysqld.err     Zip Archive mysqld.exe_180912_145324.zip     Text File mysqld.exe_190820_all_stacks.txt     Text File mysqld.exe_190820_uniqstack.txt     Text File mysqld_exe_190816_130623_all_stacks.txt     Text File mysqld_exe_190816_130623_uniqstack.txt     Text File uniqstack.txt    
Issue Links:
Relates
relates to MDEV-17063 server locks up and is killed after i... Closed

 Description   

Many threads in our application hung while waiting for a response from the MariaDB server.
After waiting more than 10 minutes, our application and the MariaDB server were restarted on a different server.
A minidump of the MariaDB server was created.



 Comments   
Comment by Vladislav Vaintroub [ 2018-09-18 ]

Deadlock in Innodb.
uniq-ified callstacks are in https://jira.mariadb.org/secure/attachment/46481/uniqstack.txt

Comment by Vladislav Vaintroub [ 2018-09-18 ]

Some further investigation of https://jira.mariadb.org/secure/attachment/46481/uniqstack.txt shows that

1. A lot of threads are waiting for the mutex in sync_array_reserve_cell, but this is the lock held by the thread doing sync_array_print_long_waits, so these can be excluded.

2. Only a few threads are left once you leave out the ones mentioned in 1., as well as the threads that are always there (handle_connections, I/O threads, a couple of others). For all the remaining "interesting" threads, I put their cleaned-up call stacks, along with the corresponding InnoDB deadlock detector output from the error log, into https://jira.mariadb.org/secure/attachment/46485/annotated_interesting_stacks.txt

I speculate there is something fishy about the &rseg->mutex mutex:

  • one thread (#9120) is waiting for it, and probably also holds some buffer pool lock.
  • another thread (#18256) holds &rseg->mutex and waits for some buffer pool lock.

marko, if you're free, maybe you can have a look, or reassign to another Innodb expert.

Comment by Marko Mäkelä [ 2018-09-18 ]

I think that we must find out which thread is holding the rollback segment block lock in S mode. I am afraid that the easiest way to do so is to review the code. Explicit information about threads holding S-latches is only available in debug builds.

Comment by Marko Mäkelä [ 2018-09-19 ]

migra, I believe that I may have found the explanation for the hang. Please confirm if you are doing massive DELETE or UPDATE (or implicit or explicit ROLLBACK of large INSERT) on tables that reside in the InnoDB system tablespace (were created without SET GLOBAL innodb_file_per_table=1). If my analysis is correct, the workaround would be to move all tables to .ibd files.
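For reference, the suggested workaround amounts to enabling file-per-table in my.ini and then rebuilding the affected tables; a sketch (option spelling as in 5.5, not a tuning recommendation):

```ini
[mysqld]
# New tables get their own .ibd file instead of being placed in the
# shared system tablespace (ibdata1).
innodb_file_per_table = 1
```

Existing tables only move out of the system tablespace after a rebuild, e.g. ALTER TABLE t ENGINE=InnoDB executed while the option is enabled.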

If you can confirm my analysis, I will confirm the bug.

Long version

According to the diagnostic output, one thread is holding an S-latch on a rollback segment header page, blocking an X-latch request from trx_rsegf_get().

Which callers could possibly S-latch a rollback segment header page? In 5.5, normally, the rollback segment header pages are exclusively accessed through trx_rsegf_get() or trx_rsegf_get_new(). Both of these functions acquire an X-latch, so the reported wait would look different.

btr_estimate_n_rows_in_range_on_level() could do this if a table is located in the system tablespace, and a table page was freed and subsequently allocated for a rollback segment header. This function is executed as part of ha_innobase::records_in_range() when computing query plans.

(Before MySQL 5.6 and MariaDB Server 10.0 introduced the configuration parameter innodb_undo_tablespaces, the undo logs always resided in the system tablespace. With innodb_undo_tablespaces>1, this conflict should not be possible.)

This is the only caller that is passing the debug-check-thwarting flag BUF_GET_POSSIBLY_FREED outside debug code. In debug builds, lock_rec_block_validate() contains such a call, but it is using a page X-latch, so it could not cause this exact type of hang (a pending X-lock request being blocked by an existing S-latch holder).

In uniqstack.txt there are a few stacks like this. We do not know the identities of the pages, but it is plausible that one of these threads is holding S-latch on a page that used to be a B-tree page but now is an undo log header page, and is waiting for some synchronization object:

00000067`6fcd5d60 00000001`3fe62cb7 mysqld!os_mutex_enter+0x12 [d:\winx64-packages\build\src\storage\xtradb\os\os0sync.c @ 779]
00000067`6fcd5d90 00000001`3fe61b75 mysqld!sync_array_reserve_cell+0x57 [d:\winx64-packages\build\src\storage\xtradb\sync\sync0arr.c @ 365]
00000067`6fcd5dd0 00000001`3fe61a2b mysqld!rw_lock_s_lock_spin+0xd5 [d:\winx64-packages\build\src\storage\xtradb\sync\sync0rw.c @ 433]
00000067`6fcd5e20 00000001`3fe613cb mysqld!rw_lock_s_lock_func+0x3b [d:\winx64-packages\build\src\storage\xtradb\include\sync0rw.ic @ 423]
00000067`6fcd5e50 00000001`3fe334ec mysqld!pfs_rw_lock_s_lock_func+0x7b [d:\winx64-packages\build\src\storage\xtradb\include\sync0rw.ic @ 786]
00000067`6fcd5ed0 00000001`3fe7f547 mysqld!buf_page_get_gen+0xdc [d:\winx64-packages\build\src\storage\xtradb\buf\buf0buf.c @ 2532]
00000067`6fcd5f70 00000001`3fe804be mysqld!btr_cur_search_to_nth_level+0x397 [d:\winx64-packages\build\src\storage\xtradb\btr\btr0cur.c @ 736]
00000067`6fcd63d0 00000001`3fe1de34 mysqld!btr_estimate_n_rows_in_range+0xae [d:\winx64-packages\build\src\storage\xtradb\btr\btr0cur.c @ 3744]
00000067`6fcda840 00000001`3fd3b409 mysqld!ha_innobase::records_in_range+0x254 [d:\winx64-packages\build\src\storage\xtradb\handler\ha_innodb.cc @ 9127]

Side note on scalability

I would like to remove the sync_array at some point. It could be that it simply does not scale to this many concurrent threads. (But that is a performance issue, not the reason for this hang.) You might want to experiment with innodb_thread_concurrency.

Comment by Marko Mäkelä [ 2018-09-20 ]

There was a flaw in my analysis. None of the threads are executing btr_estimate_n_rows_in_range_on_level().

The B-tree dives invoked from 2 mini-transactions in btr_estimate_n_rows_in_range() are properly protected. But, between the first dive and the completion of btr_estimate_n_rows_in_range(), the index tree may be subject to page split or merge, meaning that the results given to the query planner can be inaccurate or incorrect.

The function btr_estimate_n_rows_in_range_on_level() is more problematic, because it is accessing pages based on possibly stale page numbers. On the leaf level of the index tree, we could do a better job and simply do a forward scan of the leaf pages (acquiring the right sibling page latch before releasing the current page latch). On the upper levels, it is not easy to change the logic. To be able to hold multiple non-leaf page latches, we should acquire dict_index_t::lock first, and that would block any operations that would allocate or free pages. This could seriously reduce performance.
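The leaf-level forward scan described above is the classic latch-coupling (hand-over-hand) pattern. A minimal sketch on a locked linked list, with hypothetical Node objects standing in for index pages (not InnoDB code):

```python
import threading

class Node:
    """Stand-in for a leaf page; the latch plays the page rw-latch."""
    def __init__(self, value, right_sibling=None):
        self.value = value
        self.right_sibling = right_sibling
        self.latch = threading.Lock()

def scan_leaf_level(first):
    """Traverse left to right, always latching the right sibling
    before releasing the current node, so no node can be freed or
    merged away underneath the scan."""
    values = []
    node = first
    node.latch.acquire()
    while node is not None:
        values.append(node.value)
        nxt = node.right_sibling
        if nxt is not None:
            nxt.latch.acquire()   # couple: take the next latch first...
        node.latch.release()      # ...then release the current one
        node = nxt
    return values

c = Node(3)
b = Node(2, c)
a = Node(1, b)
print(scan_leaf_level(a))  # [1, 2, 3]
```

Because at most two latches are held at any moment, and always in left-to-right order, this avoids both the stale-page-number problem and the need to hold the index-wide lock.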

Apart from the access to possibly freed and repurposed page in btr_estimate_n_rows_in_range_on_level(), the only way how we should end up acquiring an undo log header page S-latch in a B-tree operation is that the B-tree index is corrupted. This kind of corruption ought to be detected by CHECK TABLE.

To help us find the hang, we can improve the diagnostic output so that it indicates which thread is holding a mutex or X-latch that is being waited for. Information about S-latch holders is only available in debug builds. In non-debug builds, we could record and report the identity of one S-latch holder, which should hopefully be enough in this case.

Comment by Marko Mäkelä [ 2018-09-26 ]

In Q100SPW-XYZ-H1.err there is this output:

InnoDB: Warning: a long semaphore wait:
--Thread 19348 has waited at trx0rseg.ic line 46 for 534.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 00000003C8D80AC0 '&block->lock'
a writer (thread id 19348) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffff
Last time read locked in file buf0flu.c line 1340
Last time write locked in file D:\winx64-packages\build\src\storage\xtradb\include\trx0rseg.ic line 46

The output indicates that a request to exclusively latch an undo log header page in trx_rsegf_get() is blocked, because some thread is holding an S-latch on the same page. The output also says that the last S-latch request was from buffer pool flushing. It does not say if the last remaining S-latch is from that last request:

	case BUF_FLUSH_LIST:
		/* If the simulated aio thread is not running, we must
		not wait for any latch, as we may end up in a deadlock:
		if buf_fix_count == 0, then we know we need not wait */
 
		is_s_latched = (bpage->buf_fix_count == 0);
		if (is_s_latched && is_uncompressed) {
			rw_lock_s_lock_gen(&((buf_block_t*) bpage)->lock,
					   BUF_IO_WRITE);
		}

This code should only be invoked as part of buf_flush_list(), which is called from many places.
There are also other S-latch requests from the page flushing. I believe that these S-latches would normally be released by an I/O handler thread, once the write to the data file has completed.

One possible explanation is that the write request never completed, for some reason.
wlad should know the InnoDB I/O subsystem better than I do.

Comment by Vladislav Vaintroub [ 2018-09-26 ]

I built and uploaded current 5.5 with some more instrumentation for mutexes and rwlocks here

https://1drv.ms/u/s!AqT4tBcElQz_j4llYSbKvKgMJ9Llag

the patch is https://github.com/mariadb/server/commit/07094220fe3e07297255598804738817bfba6255

It only helps if the deadlock reoccurs and a .dmp is taken. Ideally, the .dmp would be a little larger (procdump -mp maybe?), because this one only contained stacks, and all variables were zeroed. migra, maybe you could use the default setting for thread-handling (which is pool-of-threads), so there are fewer threads in the dump and it will be easier to analyze (it should also generally be faster and more scalable than one-thread-per-connection). If that does not work for you, I'd like to know why not.

Comment by Michael Graf [ 2018-09-27 ]

Thanks for the fix.

>If that does not work for you, I'd like to know why not
The reason why we use "thread_handling=one-thread-per-connection" and not "thread_handling=pool-of-threads" is that a deadlock occurred when we used the latter for the first time.

It seemed to me that there are not enough threads available when the following scenario occurs:

  • A program locks a table.
  • Then a lot of other programs try to lock the same table and hang.
  • The first program tries to unlock the table but cannot, because there is no thread left in MariaDB to execute the unlock.

I can try to use "thread_handling=pool-of-threads", but I will have to set the value of "thread_pool_size" so high that the problem described above won't occur.

Comment by Vladislav Vaintroub [ 2018-09-27 ]

Thanks for the explanation. You need to raise the thread_pool_max_threads variable. The default for 5.5 was 500, which some people found to be low.
It is 64K in recent versions; the limit is almost artificial. And yes, LOCK TABLE in the situation you describe would raise the number of threads considerably, also in the threadpool.
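The suggested settings would look something like this in my.ini (a sketch; the value is illustrative, not a recommendation):

```ini
[mysqld]
thread_handling = pool-of-threads
# Upper bound on threads the pool may create. The 5.5 default of 500
# can be too low when LOCK TABLES keeps many connections blocked.
thread_pool_max_threads = 4000
```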

Comment by Michael Graf [ 2019-09-02 ]

Almost 1 year after the last deadlock, it seems that a similar deadlock has occurred again.

The MariaDB version used is 5.5.61.
>I built and uploaded current 5.5 with some more instrumentation for mutexes and rwlocks here
>https://1drv.ms/u/s!AqT4tBcElQz_j4llYSbKvKgMJ9Llag

This time there is also a minidump available.
mariadb.zip

All threads in our application with a database connection wait for a response from the MariaDB server.
Example stacktrace of our application:

495 Id: 1e0c.21d4 Suspend: 0 Teb: 000007ff`ffbda000 Unfrozen

 # Child-SP          RetAddr           Call Site
00 00000004`87a3b588 000007fe`fce5ca29 ntdll!ZwWaitForSingleObject+0xa
01 00000004`87a3b590 000007fe`fed72369 mswsock!WSPRecv+0x625
02 00000004`87a3b6b0 000007fe`e30b10e9 ws2_32!WSARecv+0x169
03 00000004`87a3b730 000007fe`e30b0f67 libmariadb!vio_real_read+0xa9 [c:\bb\connector-c\build\src\libmariadb\violite.c @ 335]
04 00000004`87a3b790 000007fe`e30b1b8e libmariadb!vio_read+0x97 [c:\bb\connector-c\build\src\libmariadb\violite.c @ 371]
05 00000004`87a3b7c0 000007fe`e30b176e libmariadb!my_real_read+0x6e [c:\bb\connector-c\build\src\libmariadb\net.c @ 581]
06 00000004`87a3b800 000007fe`e30baac2 libmariadb!my_net_read+0x1e [c:\bb\connector-c\build\src\libmariadb\net.c @ 723]
07 00000004`87a3b850 000007fe`e30b58fb libmariadb!net_safe_read+0x22 [c:\bb\connector-c\build\src\libmariadb\libmariadb.c @ 409]
08 00000004`87a3b880 000007fe`e3fa0ed6 libmariadb!mthd_my_read_query_result+0x1b [c:\bb\connector-c\build\src\libmariadb\libmariadb.c @ 2432]
09 00000004`87a3b8c0 000007fe`e3f4d006 sqlapi!ImyCursor::Execute+0x826
0a 00000004`87a3ba40 00000000`67d09f8e sqlapi!SACommand::Execute+0x56
0b 00000004`87a3bab0 00000001`4059fcd3 dllsqlapiext!C_SACommand::Execute+0x7e [e:\b5823031_x64\all\all_os\sqlapiext\c_sacommand.cpp @ 162]
Comment by Vladislav Vaintroub [ 2019-09-02 ]

migra, I collected the stacks from the minidump: https://jira.mariadb.org/secure/attachment/48868/mysqld_exe_190816_130623_uniqstack.txt and https://jira.mariadb.org/secure/attachment/48867/mysqld_exe_190816_130623_all_stacks.txt

I see no sign of a deadlock in there; the threads are either sleeping or doing work. There is no thread waiting for a lock in MariaDB (there is some contention in Windows heap management, though, but that's outside of our reach). Maybe I am missing something?

Comment by Michael Graf [ 2019-09-02 ]

I'm sorry. I sent you the wrong files.

Here are the correct files:

mariadb.zip

As you can see, the error log contains the log messages you added to the instrumented mysqld.exe 5.5.61 executable:

InnoDB: Warning: semaphore wait:
--Thread 17860 has waited at trx0trx.c line 1348 for 340.00 seconds the semaphore:
Mutex at 000000000032ADF8 '&kernel_mutex', lock var 1
Last time reserved in file d:\work\5.5\storage\xtradb\lock\lock0lock.c line 5280 by thread 4892, waiters flag 1

Comment by Vladislav Vaintroub [ 2019-09-02 ]

Alright, I attached https://jira.mariadb.org/secure/attachment/48875/mysqld.exe_190820_all_stacks.txt and https://jira.mariadb.org/secure/attachment/48875/mysqld.exe_190820_uniqstack.txt for the latest dump.

There is still no deadlock; however, there is contention on LOCK_open. Files are also being opened while this lock is held, as can be seen e.g. in
this stack. The lock is acquired in open_table(), just before get_table_share_with_discover().

.296  Id: 1dd8.4574 Suspend: 0 Teb: 000007ff`ffd3e000 Unfrozen
      Start: mysqld!thread_start<unsigned int (__cdecl*)(void * __ptr64)> (00000001`3ff24ce8)
      Priority: 0  Priority class: 32  Affinity: ffffffff
 # Child-SP          RetAddr           Call Site
00 0000006a`dcb1d058 000007fe`fda66bf6 ntdll!ZwCreateFile+0xa
01 0000006a`dcb1d060 00000000`77ac17d6 KERNELBASE!CreateFileW+0x2cd
02 0000006a`dcb1d1c0 00000001`3fef6dcd kernel32!CreateFileA+0xb6
03 0000006a`dcb1d230 00000001`3fedb0d1 mysqld!my_win_sopen+0x22d [d:\work\5.5\mysys\my_winfile.c @ 251] 
04 0000006a`dcb1d2c0 00000001`3fc0396b mysqld!my_open+0x21 [d:\work\5.5\mysys\my_open.c @ 60] 
05 (Inline Function) --------`-------- mysqld!inline_mysql_file_open+0x6b [d:\work\5.5\include\mysql\psi\mysql_file.h @ 1046] 
06 0000006a`dcb1d300 00000001`3fbe3ca6 mysqld!open_table_def+0xcb [d:\work\5.5\sql\table.cc @ 621] 
07 0000006a`dcb1d640 00000001`3fbe3d2d mysqld!get_table_share+0x1a6 [d:\work\5.5\sql\sql_base.cc @ 619] 
08 0000006a`dcb1d680 00000001`3fbe7629 mysqld!get_table_share_with_discover+0x3d [d:\work\5.5\sql\sql_base.cc @ 717] 
09 0000006a`dcb1d6e0 00000001`3fbe67f5 mysqld!open_table+0x4e9 [d:\work\5.5\sql\sql_base.cc @ 3043] 
0a 0000006a`dcb1dd50 00000001`3fbe83c3 mysqld!open_and_process_table+0x215 [d:\work\5.5\sql\sql_base.cc @ 4602] 
0b 0000006a`dcb1ddd0 00000001`3fbe63fb mysqld!open_tables+0x2b3 [d:\work\5.5\sql\sql_base.cc @ 5115] 
0c 0000006a`dcb1dee0 00000001`3fbb119c mysqld!open_and_lock_tables+0x4b [d:\work\5.5\sql\sql_base.cc @ 5732] 
0d (Inline Function) --------`-------- mysqld!open_and_lock_tables+0x20 [d:\work\5.5\sql\sql_base.h @ 526] 
0e 0000006a`dcb1df40 00000001`3fbb343a mysqld!execute_sqlcom_select+0xec [d:\work\5.5\sql\sql_parse.cc @ 4636] 
0f 0000006a`dcb1e3e0 00000001`3fbb6f04 mysqld!mysql_execute_command+0x3ba [d:\work\5.5\sql\sql_parse.cc @ 2224] 
10 0000006a`dcb1ee60 00000001`3fbafa71 mysqld!mysql_parse+0x104 [d:\work\5.5\sql\sql_parse.cc @ 5928] 
11 0000006a`dcb1eec0 00000001`3fbb078f mysqld!dispatch_command+0x7f1 [d:\work\5.5\sql\sql_parse.cc @ 1068] 
12 0000006a`dcb1fc30 00000001`3fbd9849 mysqld!do_command+0xff [d:\work\5.5\sql\sql_parse.cc @ 796] 
13 0000006a`dcb1fc60 00000001`3fbdaf14 mysqld!do_handle_one_connection+0x139 [d:\work\5.5\sql\sql_connect.cc @ 1268] 
14 0000006a`dcb1fc90 00000001`3fed8ecb mysqld!handle_one_connection+0x44 [d:\work\5.5\sql\sql_connect.cc @ 1186] 
15 0000006a`dcb1fcc0 00000001`3ff24d38 mysqld!pthread_start+0x1b [d:\work\5.5\mysys\my_winthread.c @ 62] 
16 (Inline Function) --------`-------- mysqld!invoke_thread_procedure+0xd [minkernel\crts\ucrt\src\appcrt\startup\thread.cpp @ 91] 
17 0000006a`dcb1fcf0 00000000`77ab556d mysqld!thread_start<unsigned int (__cdecl*)(void * __ptr64)>+0x50 [minkernel\crts\ucrt\src\appcrt\startup\thread.cpp @ 115] 
18 0000006a`dcb1fd20 00000000`77c1385d kernel32!BaseThreadInitThunk+0xd
19 0000006a`dcb1fd50 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

I guess increasing table_cache or table_definition_cache might help, but I'm not an expert on this specific topic.
Maybe svoj, who studied the table cache in all details, can give recommendations?
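For completeness, the settings in question would look something like this in my.ini (in 5.5 the table cache variable is spelled table_open_cache; values are illustrative, not a recommendation):

```ini
[mysqld]
# Number of open table instances cached across connections.
table_open_cache = 2000
# Number of cached table definitions (.frm parse results).
table_definition_cache = 1000
```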

Comment by Sergey Vojtovich [ 2019-09-02 ]

The best recommendation I can give is to upgrade. We haven't had LOCK_open since 10.0, so this contention should be gone. The most scalable table cache implementation is available since 10.3. InnoDB scales much better in recent versions as well.

Raising table_cache or table_definition_cache might help indeed, but it has very limited impact.

Generated at Thu Feb 08 08:34:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.