[MDEV-19803] Long semaphore wait error on galera.MW-388 Created: 2019-06-19  Updated: 2022-01-03  Resolved: 2021-12-27

Status: Closed
Project: MariaDB Server
Component/s: Galera, Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Jan Lindström (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-18464 Port kill_one_trx fixes from 10.4 to ... Closed

 Description   

galera.MW-388 'innodb'                   w1 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2019-06-18 03:46:15
line
2019-06-18  3:45:15 140319937001216 [Warning] InnoDB: A long semaphore wait:
2019-06-18  3:45:46 140319937001216 [Warning] InnoDB: A long semaphore wait:
^ Found warnings in /dev/shm/var/1/log/mysqld.1.err
ok
 
 - saving '/dev/shm/var/1/log/galera.MW-388-innodb/' to '/dev/shm/var/log/galera.MW-388-innodb/'
 
Retrying test galera.MW-388, attempt(2/3)...
 
worker[1] > Restart  - not started
worker[1] > Restart  - not started
galera.galera_as_master 'innodb'         w2 [ pass ]   2871
worker[2] > Restart [mysqld.1 - pid: 9007, winpid: 9007] - using different config file
worker[2] > Restart [mysqld.2 - pid: 9009, winpid: 9009] - using different config file
worker[2] > Restart [mysqld.3 - pid: 9011, winpid: 9011] - using different config file
galera.MW-388 'innodb'                   w1 [ retry-pass ]   2063



 Comments   
Comment by Jan Lindström (Inactive) [ 2019-06-19 ]
Comment by Jan Lindström (Inactive) [ 2019-06-19 ]

CURRENT_TEST: galera.MW-388
2019-06-18  3:45:15 140319937001216 [Warning] InnoDB: A long semaphore wait:
--Thread 140320104756992 has waited at buf0flu.cc line 1229 for 241.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f9ee1bf72f0 created in file buf0buf.cc line 1496
a writer (thread id 140320477054720) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 2599
Last time write locked in file row0ins.cc line 2599
2019-06-18  3:45:15 140319937001216 [Note] InnoDB: A semaphore wait:
--Thread 140320104756992 has waited at buf0flu.cc line 1229 for 241.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f9ee1bf72f0 created in file buf0buf.cc line 1496
a writer (thread id 140320477054720) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 2599
Last time write locked in file row0ins.cc line 2599
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending reads 0, writes 0
 
=====================================
2019-06-18 03:45:23 0x7f9ed17fa700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 60 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 18 srv_active, 0 srv_shutdown, 301 srv_idle
srv_master_thread log flush and writes: 319
----------
SEMAPHORES
----------
-------------
RW-LATCH INFO
-------------
RW-LOCK: 0x7f9ee1bf7418 
Locked: thread 140320477054720 file row0ins.cc line 2599  S-LOCK
RW-LOCK: 0x7f9ee1bf72f0  Waiters for the lock exist
Locked: thread 140320477054720 file row0ins.cc line 2599  X-LOCK
Total number of rw-locks 1075
OS WAIT ARRAY INFO: reservation count 285
--Thread 140320104756992 has waited at buf0flu.cc line 1229 for 249.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f9ee1bf72f0 created in file buf0buf.cc line 1496
a writer (thread id 140320477054720) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 2599
Last time write locked in file row0ins.cc line 2599
OS WAIT ARRAY INFO: signal count 201
RW-shared spins 0, rounds 229, OS waits 97
RW-excl spins 0, rounds 288, OS waits 9
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 229.00 RW-shared, 288.00 RW-excl, 0.00 RW-sx
FAIL TO OBTAIN LOCK MUTEX, SKIP LOCK INFO PRINTING

Comment by Jan Lindström (Inactive) [ 2019-06-19 ]

marko I do not immediately see how this could be related only to Galera.

Comment by Marko Mäkelä [ 2019-06-19 ]

jplindst, please post the stack traces of all threads that participate in the hang, or show a non-Galera test where InnoDB hangs in this way.

Which exact versions are affected? Did you really see this in all of 10.2, 10.3, 10.4? Please always identify the exact source code revisions.

I believe that such hangs in Galera can occur due to MDEV-18464 not having been fixed.

Comment by Jan Lindström (Inactive) [ 2019-06-19 ]

Unfortunately, I do not have full stack trace. I can't repeat this locally.

Repeats at:

Test does not use KILL command and semaphores on report are not lock_sys or trx_mutex so MDEV-18464 is not likely the cause.

Comment by George Diamantopoulos [ 2019-09-24 ]

Hello,

I've had something similar happen after an upgrade from 10.3.16 -> 10.3.18. The first galera node was brought up without problems, but the remaining two failed to receive IST while the logs were inundated with these "semaphore" warnings. In the end, I had to kill the server and start it again several times for the IST to complete and re-join the cluster.

I'm posting excerpts from the log here in case it is of any value. Do you think it is the same issue? Or should open a new one? Thanks!

2019-09-24  1:11:06 2 [Note] WSREP: Receiving IST: 9009 writesets, seqnos 4727456747-4727465756
2019-09-24  1:11:06 0 [Note] WSREP: Receiving IST...  0.0% (   0/9009 events) complete.
2019-09-24  1:11:06 2 [Warning] InnoDB: Cannot add field `as_contact` in table `kamailio`.`topos_t` because after adding it, the row size is 8193 which is greater than maximum allowed size (8126) for a record on index leaf page.
2019-09-24  1:11:06 2 [Warning] InnoDB: Cannot add field `s_rr` in table `kamailio`.`topos_d` because after adding it, the row size is 8166 which is greater than maximum allowed size (8126) for a record on index leaf page.
2019-09-24  1:11:07 0 [Note] WSREP: (086791d9, 'tcp://172.30.20.129:4567') turning message relay requesting off
 
=====================================
2019-09-24 01:11:26 0x7fa92b456700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1 srv_active, 0 srv_shutdown, 0 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 210
--Thread 140363891283712 has waited at ibuf0ibuf.cc line 4465 for 19.00 seconds the semaphore:
S-lock on RW-latch at 0x7faf63b29508 created in file buf0buf.cc line 1609
a writer (thread id 140363924854528) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file ibuf0ibuf.cc line 4465
--Thread 140363899676416 has waited at ibuf0ibuf.cc line 4465 for 19.00 seconds the semaphore:
S-lock on RW-latch at 0x7faf63b29508 created in file buf0buf.cc line 1609
a writer (thread id 140363924854528) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file ibuf0ibuf.cc line 4465

Similar logs follow after every INNODB MONITOR OUTPUT. After a while, I start getting these in between InnoDB Monitor messages:

2019-09-24  1:15:07 0 [Warning] InnoDB: A long semaphore wait:
--Thread 140392823322368 has waited at lock0lock.cc line 6261 for 241.00 seconds the semaphore:
Mutex at 0x56329c07c500, Mutex LOCK_SYS created lock0lock.cc:460, lock var 2
 
2019-09-24  1:15:07 0 [Note] InnoDB: A semaphore wait:
--Thread 140363891283712 has waited at ibuf0ibuf.cc line 4465 for 240.00 seconds the semaphore:
S-lock on RW-latch at 0x7faf63b29508 created in file buf0buf.cc line 1609
a writer (thread id 140363924854528) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file ibuf0ibuf.cc line 4465
2019-09-24  1:15:07 0 [Note] InnoDB: A semaphore wait:
--Thread 140363899676416 has waited at ibuf0ibuf.cc line 4465 for 240.00 seconds the semaphore:
S-lock on RW-latch at 0x7faf63b29508 created in file buf0buf.cc line 1609
a writer (thread id 140363924854528) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file ibuf0ibuf.cc line 4465
2019-09-24  1:15:07 0 [Note] InnoDB: A semaphore wait:
--Thread 140363950032640 has waited at ibuf0ibuf.cc line 4465 for 240.00 seconds the semaphore:
S-lock on RW-latch at 0x7faf63b29508 created in file buf0buf.cc line 1609
a writer (thread id 140363924854528) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file ibuf0ibuf.cc line 4465
2019-09-24  1:15:07 0 [Note] InnoDB: A semaphore wait:
--Thread 140392823322368 has waited at lock0lock.cc line 6261 for 241.00 seconds the semaphore:
Mutex at 0x56329c07c500, Mutex LOCK_SYS created lock0lock.cc:460, lock var 2
 
2019-09-24  1:15:07 0 [Note] InnoDB: A semaphore wait:
--Thread 140363924854528 has waited at lock0lock.cc line 3389 for 240.00 seconds the semaphore:
Mutex at 0x56329c07c500, Mutex LOCK_SYS created lock0lock.cc:460, lock var 2
 
2019-09-24  1:15:07 0 [Note] InnoDB: A semaphore wait:
--Thread 140363941639936 has waited at ibuf0ibuf.cc line 4465 for 240.00 seconds the semaphore:
S-lock on RW-latch at 0x7faf63b29508 created in file buf0buf.cc line 1609
a writer (thread id 140363924854528) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file ibuf0ibuf.cc line 4465
2019-09-24  1:15:07 0 [Note] InnoDB: A semaphore wait:
--Thread 140364535383808 has waited at ibuf0ibuf.cc line 2456 for 240.00 seconds the semaphore:
S-lock on RW-latch at 0x7faf63b29508 created in file buf0buf.cc line 1609
a writer (thread id 140363924854528) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file ibuf0ibuf.cc line 4465
2019-09-24  1:15:07 0 [Note] InnoDB: A semaphore wait:
--Thread 140363815749376 has waited at buf0flu.cc line 1231 for 240.00 seconds the semaphore:
SX-lock on RW-latch at 0x7faf63b29508 created in file buf0buf.cc line 1609
a writer (thread id 140363924854528) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file ibuf0ibuf.cc line 4465
2019-09-24  1:15:07 0 [Note] InnoDB: A semaphore wait:
--Thread 140364409558784 has waited at ibuf0ibuf.cc line 4465 for 238.00 seconds the semaphore:
S-lock on RW-latch at 0x7faf63b29508 created in file buf0buf.cc line 1609
a writer (thread id 140363924854528) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file ibuf0ibuf.cc line 4465
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending reads 0, writes 0

Comment by Stepan Patryshev (Inactive) [ 2020-07-13 ]

It still fails on BB, 10.2.

Generated at Thu Feb 08 08:54:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.