[MDEV-20243] semaphore hang and crash in 10.3.16 while load infile Created: 2019-08-02  Updated: 2020-01-06  Resolved: 2020-01-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.16
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: James Williams Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

CPU: AMD Ryzen 5 2600
Kernel: Linux 5.1.17-300.fc30.x86_64
OS: Fedora 30


Attachments: Text File crash.log    
Issue Links:
Relates
relates to MDEV-20414 "a long semaphore wait" After Upgradi... Open
relates to MDEV-21253 10.1.43 crash Closed

 Description   

While loading a large amount of data via "LOAD INFILE" I regularly trigger a hang / lockup. Eventually MariaDB crashes itself and recovers. The schema is very simple and has 3 varchar columns with no indexes.

2019-08-02  2:50:33 0 [Warning] InnoDB: A long semaphore wait:
--Thread 140625894979328 has waited at buf0flu.cc line 1231 for 241.00 seconds the semaphore:
SX-lock on RW-latch at 0x7fe60bd76dc8 created in file buf0buf.cc line 1535
a writer (thread id 140625013896960) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file trx0rseg.ic line 43
2019-08-02  2:50:33 0 [Note] InnoDB: A semaphore wait:
--Thread 140625894979328 has waited at buf0flu.cc line 1231 for 241.00 seconds the semaphore:
SX-lock on RW-latch at 0x7fe60bd76dc8 created in file buf0buf.cc line 1535
a writer (thread id 140625013896960) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file trx0rseg.ic line 43
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending reads 0, writes 0
 
=====================================
2019-08-02 02:50:42 0x7fe5fe5f7700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 45 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 51 srv_active, 0 srv_shutdown, 603 srv_idle
srv_master_thread log flush and writes: 654
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 101760
--Thread 140625894979328 has waited at buf0flu.cc line 1231 for 250.00 seconds the semaphore:
SX-lock on RW-latch at 0x7fe60bd76dc8 created in file buf0buf.cc line 1535
a writer (thread id 140625013896960) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file trx0rseg.ic line 43
OS WAIT ARRAY INFO: signal count 720134
RW-shared spins 0, rounds 1122008, OS waits 78681
RW-sx spins 17, rounds 504, OS waits 16
Spin rounds per wait: 1122008.00 RW-shared, 948606.00 RW-excl, 29.65 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 312
Purge done for trx's n:o < 300 undo n:o < 52709166 state: running
History list length 9
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 422101336969480, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 1
209249 OS file reads, 120426 OS file writes, 11839 OS fsyncs
14.31 reads/s, 16384 avg bytes/read, 29.58 writes/s, 29.58 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 1332 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 60399452025
Log flushed up to   60399451526
Pages flushed up to 60398943961
Last checkpoint at  60398943961
0 pending log flushes, 0 pending chkp writes
3694 log i/o's done, 0.93 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 170852352
Dictionary memory allocated 31824
Buffer pool size   8191
Free buffers       0
Database pages     6859
Old database pages 2551
Modified db pages  2995
Percent of dirty pages(LRU & free pages): 43.659
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 2, single page 1
Pages made young 446, not young 107542634
0.00 youngs/s, 42.93 non-youngs/s
Pages read 200593, created 849, written 110407
14.31 reads/s, 0.00 creates/s, 14.31 writes/s
Buffer pool hit rate 889 / 1000, young-making rate 0 / 1000 not 330 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 6859, unzip_LRU len: 0
I/O sum[1428]:cur[266], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=6282, Main thread ID=140625766729472, state: sleeping
Number of rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
 
...
lots more from INNODB MONITOR OUTPUT included in crash.log attached
...
 
InnoDB: ###### Diagnostic info printed to the standard error stream
2019-08-02  9:34:31 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears
to be hung.
190802  9:34:31 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.3.16-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=2
max_threads=153
thread_count=8
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467420 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/usr/libexec/mysqld(my_print_stacktrace+0x41)[0x556cb047dcf1]
/usr/libexec/mysqld(handle_fatal_signal+0x4e5)[0x556caff653d5]
/lib64/libpthread.so.0(+0x12e80)[0x7f8621d21e80]
/lib64/libc.so.6(gsignal+0x145)[0x7f862183fe75]
/lib64/libc.so.6(abort+0x127)[0x7f862182a895]
/usr/libexec/mysqld(+0x4c1b45)[0x556cafc8eb45]
/usr/libexec/mysqld(+0xa3fd5a)[0x556cb020cd5a]
/lib64/libpthread.so.0(+0x85a2)[0x7f8621d175a2]
/lib64/libc.so.6(clone+0x43)[0x7f8621903303]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /raid/mysql
Resource Limits:
Fatal signal 11 while backtracing
2019-08-02  9:34:41 0 [Note] InnoDB: Using Linux native AIO
2019-08-02  9:34:41 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-08-02  9:34:41 0 [Note] InnoDB: Uses event mutexes
2019-08-02  9:34:41 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2019-08-02  9:34:41 0 [Note] InnoDB: Number of pools: 1
2019-08-02  9:34:41 0 [Note] InnoDB: Using SSE2 crc32 instructions
2019-08-02  9:34:41 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2019-08-02  9:34:41 0 [Note] InnoDB: Completed initialization of buffer pool
2019-08-02  9:34:41 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2019-08-02  9:34:41 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=65739376432
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928437] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928438] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928439] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928440] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928441] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928442] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928443] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928444] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928445] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928446] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928447] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928448] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928449] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928450] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928451] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928452] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928453] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928454] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928455] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928456] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928457] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928458] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928459] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928460] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928461] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928462] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928463] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928464] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928465] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928466] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928467] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928468] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928469] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928470] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928471] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928472] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928473] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928474] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928475] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928476] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928477] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928478] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928479] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928480] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928481] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928482] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928483] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928484] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928485] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928486] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928487] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928488] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928471] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928472] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928473] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928474] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928475] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928476] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928477] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928478] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928479] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928480] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928481] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928482] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928483] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928484] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928485] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928486] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928487] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928488] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928489] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928490] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928491] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Recovered page [page id: space=5, page number=1928492] from the doublewrite buffer.
2019-08-02  9:34:42 0 [Note] InnoDB: Starting final batch to recover 13936 pages from redo log.
2019-08-02  9:34:49 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2019-08-02  9:34:49 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2019-08-02  9:34:49 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2019-08-02  9:34:49 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2019-08-02  9:34:49 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2019-08-02  9:34:49 0 [Note] InnoDB: 10.3.16 started; log sequence number 65739376471; transaction id 327
2019-08-02  9:34:49 0 [Note] InnoDB: Loading buffer pool(s) from /raid/mysql/ib_buffer_pool
2019-08-02  9:34:49 0 [Note] Plugin 'FEEDBACK' is disabled.
2019-08-02  9:34:49 0 [Note] Recovering after a crash using tc.log
2019-08-02  9:34:49 0 [Note] Starting crash recovery...
2019-08-02  9:34:49 0 [Note] Crash recovery finished.
2019-08-02  9:34:49 0 [Note] Server socket created on IP: '::'.
2019-08-02  9:34:49 0 [Note] InnoDB: Buffer pool(s) load completed at 190802  9:34:49
2019-08-02  9:34:49 0 [Note] Reading of all Master_info entries succeeded
2019-08-02  9:34:49 0 [Note] Added new Master_info '' to hash table
2019-08-02  9:34:49 0 [Note] /usr/libexec/mysqld: ready for connections.
Version: '10.3.16-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
 



 Comments   
Comment by Marko Mäkelä [ 2019-11-12 ]

jwilliams44, for hangs of InnoDB threads, it would be much easier to look at the stack traces of all threads when the hang occurs. If the hang reproduces, you should be able to trigger the hang and then gather the stack traces from the mysqld process, like this:

gdb -ex "set pagination 0" -ex "thread apply all bt" -ex "print *dict_operation_lock" -ex "print *dict_sys->mutex" --batch -p $(pgrep -x mysqld)

We have some unexplained hangs where dict_operation_lock is blocking many threads.

MDEV-20414 is another report, possibly sharing a common cause with this one. Unfortunately, for that one we did not get the output for the two print statements yet. Even from the stack traces, it can be hard to determine the cause of deadlock. Seeing the internal state of these two latches would greatly ease the analysis.

Generated at Thu Feb 08 08:57:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.