[MDEV-17371] InnoDB: Warning: a long semaphore wait Created: 2018-10-04  Updated: 2020-08-07  Resolved: 2020-08-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0.35
Fix Version/s: 10.3.3, 10.2.25, 10.1.41

Type: Bug Priority: Major
Reporter: Albert Chin-A-Young Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: None
Environment:

SLES 12/x86-64


Attachments: Text File web03[1].txt    
Issue Links:
Duplicate
duplicates MDEV-13983 Mariadb becomes unresponsive Closed

 Description   

We are running Drupal 8.5.6 on a SLES12 SP3 guest VM with MariaDB 10.0.35. We recently started encountering an issue where mysqld.log was filling up with:

InnoDB: Warning: a long semaphore wait:
--Thread 140299741705984 has waited at fut0fut.ic line 51 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f9a23adf680 '&block->lock'
a writer (thread id 140299742004992) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 5640
Last time write locked in file /home/abuild/rpmbuild/BUILD/mariadb-10.0.35/storage/xtradb/buf/buf0buf.cc line 3863
InnoDB: Warning: a long semaphore wait:
--Thread 140299892901632 has waited at ibuf0ibuf.cc line 2821 for 241.00 seconds the semaphore:
Mutex at 0x5559577b1980 '&ibuf_mutex', lock var 1
waiters flag 1
InnoDB: Warning: semaphore wait:
--Thread 140299741705984 has waited at fut0fut.ic line 51 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f9a23adf680 '&block->lock'
a writer (thread id 140299742004992) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 5640
Last time write locked in file /home/abuild/rpmbuild/BUILD/mariadb-10.0.35/storage/xtradb/buf/buf0buf.cc line 3863
InnoDB: Warning: Writer thread is waiting this semaphore:
--Thread 140299742004992 has waited at buf0buf.cc line 2703 for 104.00 seconds the semaphore:
S-lock on RW-latch at 0x7f9a23d4e840 '&block->lock'
a writer (thread id 140299639863040) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1037
Last time write locked in file /home/abuild/rpmbuild/BUILD/mariadb-10.0.35/storage/xtradb/buf/buf0buf.cc line 3863
InnoDB: Warning: Writer thread is waiting this semaphore:
--Thread 140299639863040 has waited at btr0cur.cc line 592 for 185.00 seconds the semaphore:
S-lock on RW-latch at 0x555958ddc438 '&new_index->lock'
a writer (thread id 140299741705984) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: fffffffffff00000
Last time read locked in file btr0cur.cc line 592
Last time write locked in file /home/abuild/rpmbuild/BUILD/mariadb-10.0.35/storage/xtradb/ibuf/ibuf0ibuf.cc line 411
InnoDB: Warning: Writer thread is waiting this semaphore:
--Thread 140299741705984 has waited at fut0fut.ic line 51 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f9a23adf680 '&block->lock'
a writer (thread id 140299742004992) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 5640
Last time write locked in file /home/abuild/rpmbuild/BUILD/mariadb-10.0.35/storage/xtradb/buf/buf0buf.cc line 3863
InnoDB: Warning: Writer thread is waiting this semaphore:
...

Apart from the stock my.cnf, we have:

[mysqld]
bind-address = 127.0.0.1
max_allowed_packet = 256M
innodb_log_file_size = 281M
innodb_adaptive_hash_index = 0
innodb_purge_threads = 4
innodb_flush_neighbors = 0
innodb_lock_wait_timeout = 100
ignore-db-dir = .BackupExpressChangeJournal
ignore-db-dir = .BackupExpressJob
ignore-db-dir = lost+found

Of the settings above, we added the following after we started encountering this error:

innodb_adaptive_hash_index = 0
innodb_purge_threads = 4
innodb_flush_neighbors = 0
innodb_lock_wait_timeout = 100

Unfortunately, we do not have a stack backtrace, nor can we provide a SQL statement that reproduces the problem. When this error occurs, there are about 150 SQL statements listed in "SHOW PROCESSLIST". If the error occurs again, we will run the following to capture a stack backtrace, and we will also provide the "SHOW PROCESSLIST" output:

# gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p <pid>
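The gdb invocation above can be folded into a small helper script that also grabs the processlist when the warning shows up in the error log. This is only a sketch, not part of the original report: the log path, the /tmp output files, and the pidof/timeout calls are assumptions to adjust for your installation.

```shell
#!/bin/sh
# Sketch: detect the long-semaphore-wait warning and dump diagnostics.
# Paths and the mysql client invocation are assumptions; adjust as needed.

# Return success if the given error log contains the warning.
has_semaphore_wait() {
    grep -q "a long semaphore wait" "$1"
}

# Dump a full backtrace of every mysqld thread plus the current processlist.
collect_diagnostics() {
    pid=$(pidof mysqld) || return 1
    gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p "$pid" \
        > /tmp/mysqld-backtrace.txt
    # SHOW FULL PROCESSLIST may hang if the server is completely wedged,
    # so bound it with a timeout.
    timeout 30 mysql -e "SHOW FULL PROCESSLIST" > /tmp/processlist.txt
}

# Example (hypothetical log path):
# has_semaphore_wait /var/log/mysql/mysqld.log && collect_diagnostics
```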



 Comments   
Comment by Elena Stepanova [ 2018-11-01 ]

Any luck with the stack trace and processlist?

Comment by Rudy Broersma [ 2019-03-13 ]

We are having the same issue on a server with Drupal 8.6.10 (other versions were not tested). When the issue comes up, MySQL no longer responds: we can't connect anymore, and in the rare cases where we can connect, we cannot issue any queries (e.g. SHOW STATUS hangs forever). We have to kill mysqld and restart it to bring it back online.

This issue occurs randomly, but it has occurred twice in the last 48 hours.

There are actually quite a few reports on Google relating Drupal to semaphore wait timeouts, as well as some MariaDB bug reports as old as 2013 about the same issue. For example:

https://drupal.stackexchange.com/questions/104880/drupal-7-database-select-query-from-cache-bootstrap-brings-server-down
https://www.drupal.org/forum/support/post-installation/2018-09-26/innodb-warning-a-long-semaphore-wait
https://www.drupal.org/project/drupal/issues/1898204

I have attached our error log.

  # mysql
    Welcome to the MariaDB monitor. Commands end with ; or \g.
    Your MariaDB connection id is 2752
    Server version: 10.1.37-MariaDB-0+deb9u1 Debian 9.6
Comment by Albert Chin-A-Young [ 2019-03-13 ]

We never solved this issue as we reverted to an older working configuration. However, I think they'll probably need to see a stack trace to help. Do you have a debug build of MariaDB? If so, did you try running the following:

# gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p <pid>

Also, I think another possibility is to configure memcached/redis as the Drupal cache, bypassing MariaDB for caching. We have that set up but have not moved it into production yet. If this continues to be an issue for you, I would recommend trying that to determine if it works around the problem.
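For reference, pointing Drupal 8's cache bins at memcached is typically done in settings.php via the contributed memcache module. The following is a hedged sketch, not taken from the original report; the service IDs and server address follow that module's documented conventions and must match your setup:

    // settings.php -- requires the contrib "memcache" module to be enabled.
    // Assumes memcached is listening on its default host and port.
    $settings['memcache']['servers'] = ['127.0.0.1:11211' => 'default'];
    $settings['cache']['default'] = 'cache.backend.memcache';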

Comment by Marko Mäkelä [ 2020-08-07 ]

I think that both of these hang reports (on 10.0.35 and 10.1.37) could be duplicates of MDEV-13983. That deadlock was introduced by MDEV-11896 (in 10.0.32, 10.1.26, 10.2.8) and fixed in 10.1.41, 10.2.25, and 10.3.3.

Generated at Thu Feb 08 08:35:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.