[MDEV-24375] Semaphore wait has lasted > 600 seconds Created: 2020-12-09  Updated: 2023-04-06  Resolved: 2021-04-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.7, 10.5.8
Fix Version/s: 10.2.37, 10.3.28, 10.4.18, 10.5.9

Type: Bug Priority: Major
Reporter: vbvbrj Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 10
Labels: innodb, need_feedback
Environment:

Server version: 10.5.8-MariaDB-1:10.5.8+maria~focal-log
Mariadb server in docker. mariadb:latest
Docker version 19.03.13, build 4484c46d9d
Linux 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux


Issue Links:
Duplicate
duplicates MDEV-24188 Hang in buf_page_create() after reusi... Closed
is duplicated by MDEV-24468 mariadb crashes when converting a lar... Closed
Relates
relates to MDEV-25955 InnoDB: Semaphore wait has lasted > 6... Closed

 Description   

I am importing a schema with all table definitions and data into MariaDB. The dump file is about 150 GB. At some point, while importing a 70 GB table, I receive the following messages in the log and the import connection is dropped.

2020-12-09 13:19:29 0 [Note] InnoDB: A semaphore wait:
--Thread 139977950762752 has waited at btr0cur.cc line 1480 for 607.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f4b3b32c160 created in file dict0dict.cc line 2160
a writer (thread id 139963135923968) has reserved it in mode  SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1969
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending reads 1, writes 0
InnoDB: ###### Diagnostic info printed to the standard error stream
2020-12-09 13:19:30 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139977950762752 has waited at btr0cur.cc line 1480 for 608.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f4b3b32c160 created in file dict0dict.cc line 2160
a writer (thread id 139963135923968) has reserved it in mode  SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1969
[The same Note/Warning pair repeats every second, identical except for the wait time, which grows from 608.00 to 611.00 seconds.]
2020-12-09 13:19:33 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
201209 13:19:33 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.5.8-MariaDB-1:10.5.8+maria~focal-log
key_buffer_size=2147483648
read_buffer_size=20971520
max_used_connections=6
max_threads=1002
thread_count=5
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 547980111 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
mysqld(my_print_stacktrace+0x32)[0x55ea24b5c692]
mysqld(handle_fatal_signal+0x485)[0x55ea245b3e45]

The MariaDB server continues running, but the import is stopped.



 Comments   
Comment by vbvbrj [ 2020-12-13 ]

Tested!

mysqld Ver 10.5.5-MariaDB-1:10.5.5+maria~focal-log for debian-linux-gnu on x86_64 (mariadb.org binary distribution)

10.5.5 does not have this bug.
10.5.7 and 10.5.8 do have this bug.
10.5.6 was not tested.

Comment by sjon [ 2020-12-15 ]

I have experienced the same issue for a couple of weeks but have been unable to reproduce it consistently. This is 10.5.8 on Arch Linux.

(Unless my timestamps are off) I've even seen this issue on a GTID slave that, according to the binlog and slave logs, wasn't doing anything at the timestamp this semaphore was supposedly created.

What makes this issue worse is that mariadbd doesn't die cleanly: it leaves a hanging process that needs to be killed manually before systemd is able to restart it properly. That process eventually gets killed by the OOM killer on my servers:

InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending reads 1, writes 0
InnoDB: ###### Diagnostic info printed to the standard error stream
2020-12-14 16:27:02 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
201214 16:27:02 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.5.8-MariaDB-log
key_buffer_size=2097152
read_buffer_size=131072
max_used_connections=8
max_threads=102
thread_count=12
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 435461 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000

and then 5 minutes later:

Out of memory: Killed process 327619 (mariadbd) total-vm:3245288kB, anon-rss:2292320kB, file-rss:264kB, shmem-rss:0kB, UID:89 pgtables:5844kB oom_score_adj:0
oom_reaper: reaped process 327619 (mariadbd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
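One way to mitigate the hanging-process symptom described above is a systemd drop-in that forces a SIGKILL when the service does not stop within a timeout. This is a hypothetical sketch, not from the original report: `TimeoutStopSec` and `SendSIGKILL` are standard systemd service options, but the unit name `mariadb.service`, the drop-in path, and the timeout value are assumptions that depend on the distribution:

```ini
# /etc/systemd/system/mariadb.service.d/kill-hung.conf  (hypothetical drop-in)
# If mariadbd does not stop within 90 s of a stop request, send SIGKILL
# so systemd can restart the unit without manual cleanup of the hung process.
[Service]
TimeoutStopSec=90
SendSIGKILL=yes
```

Apply with `systemctl daemon-reload` and restart the unit.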

Comment by Jon Luo [ 2020-12-17 ]

Hi, I have the same issue on 10.5.8, and previously on 10.3.27 (I updated to 10.5.8 to see if it would fix it). It occurs every time I try to import a large table. Mine does crash cleanly and restarts properly. For me, it consistently crashes after around 8000 lines of my table are imported, so I've resorted to uploading 8000 lines at a time...

2020-12-17 3:01:41 0 [Warning] InnoDB: A long semaphore wait:
--Thread 140587769321216 has waited at btr0cur.cc line 1480 for 611.00 seconds the semaphore:
SX-lock on RW-latch at 0x7fc390011670 created in file dict0dict.cc line 2161
a writer (thread id 140478806996736) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1969

Comment by Sebastian Stamm [ 2020-12-17 ]

Same here:

InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending reads 0, writes 0
InnoDB: ###### Diagnostic info printed to the standard error stream
2020-12-11 11:36:21 0 [Warning] InnoDB: A long semaphore wait:
--Thread 3408 has waited at btr0cur.cc line 1480 for 484.00 seconds the semaphore:
SX-lock on RW-latch at 000000D245E22798 created in file dict0dict.cc line 2161
a writer (thread id 6644) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1969

Comment by Marko Mäkelä [ 2020-12-17 ]

I suspect that this is simply a duplicate of MDEV-24188, whose fix unfortunately was not included in the 10.5.8 release.

Could you download a recent development snapshot package at http://hasky.askmonty.org/archive/10.5/ and see if it fixes the problem?

Comment by vbvbrj [ 2020-12-18 ]

The environment is a Docker container. Is there a 10.5.9 Docker image available for testing?

Comment by Mastre Pad [ 2021-01-25 ]

The same problem with MariaDB 10.5.8 from the CentOS 7.9 repositories (not a Docker container):
yum list installed | grep maria
MariaDB-client.x86_64 10.5.8-1.el7.centos @mariadb-main
MariaDB-common.x86_64 10.5.8-1.el7.centos @mariadb-main
MariaDB-compat.x86_64 10.5.8-1.el7.centos @mariadb-main
MariaDB-server.x86_64 10.5.8-1.el7.centos @mariadb-main
galera-4.x86_64 26.4.6-1.el7.centos @mariadb-main

The server hosts a Zabbix database (~150 GB).
The problem occurs 1) when I try to migrate big InnoDB tables from another (10.4) MariaDB server (many inserts) and 2) during normal operation from time to time, perhaps on "delete from" statements.

Comment by Marko Mäkelä [ 2021-01-25 ]

Sorry, I do not know how to build a Docker container from the latest development snapshot. The 10.5.9 release should be out next week.

Comment by Kfir Itzhak [ 2021-02-14 ]

I am also seeing this, but on 10.3.27 (from the CentOS 8 repository).
It happens every time I try to load a 100 GB dump file, around the 60-70 GB mark.
The server is not overloaded (~2 load average) and has plenty of free RAM.

I can't load this dump file; it always dies.
In the logs, I also see many entries like this:

--Thread 140622086665984 has waited at btr0cur.cc line 1357 for 365.00 seconds the semaphore:
SX-lock on RW-latch at 0x7fe4f8014b10 created in file dict0dict.cc line 2130
a writer (thread id 140622530246400) has reserved it in mode  SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1969

But here is the final one:

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 2880515
--Thread 139877571553024 has waited at btr0cur.cc line 1357 for 951.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f3460ba49c0 created in file dict0dict.cc line 2130
a writer (thread id 139863895107328) has reserved it in mode  SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1969
OS WAIT ARRAY INFO: signal count 13333349
RW-shared spins 18127810, rounds 113544287, OS waits 933307
RW-excl spins 19501052, rounds 89573899, OS waits 858245
RW-sx spins 106871, rounds 883699, OS waits 13202
Spin rounds per wait: 6.26 RW-shared, 4.59 RW-excl, 8.27 RW-sx
...
2021-02-12 21:55:30 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
210212 21:55:31 [ERROR] mysqld got signal 6 ;
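The SEMAPHORES section quoted above is part of the output of SHOW ENGINE INNODB STATUS. While the server is still responsive, the same diagnostics can be pulled on demand (a sketch; the exact output format varies between versions):

```sql
-- Run while the import is stalled to see which latch is being waited on.
SHOW ENGINE INNODB STATUS\G

-- The watchdog threshold in effect (600 seconds by default):
SHOW GLOBAL VARIABLES LIKE 'innodb_fatal_semaphore_wait_threshold';
```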

Comment by Mario Hubert [ 2021-02-14 ]

For Kfir Itzhak and all others:
This has been a MAJOR roadblock for me in pushing high-volume systems into production.
I was able to temporarily work around this issue by disabling automatic statistics recalculation (which seems to cause the problem) on our system.
See if this fixes your issues:

  1. Increase the threshold to 900 instead of 600:
    innodb_fatal_semaphore_wait_threshold=900
  2. Disable automatic statistics recalculation:
    innodb_stats_auto_recalc=0

Try disabling auto recalc, restart your database, load your data, and see if this fixes the problem.
You can re-enable it when done.
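As a config-file sketch, the two settings above would go under [mysqld] in my.cnf. Both are real MariaDB system variables; the values are the ones suggested in the comment. Note that innodb_fatal_semaphore_wait_threshold is not dynamic and requires a restart, while innodb_stats_auto_recalc can also be toggled at runtime with SET GLOBAL:

```ini
# my.cnf fragment, per the workaround above
[mysqld]
# Give the watchdog more headroom before it intentionally crashes the server
innodb_fatal_semaphore_wait_threshold=900
# Stop background persistent-statistics recalculation during the bulk load
innodb_stats_auto_recalc=0
```

Remember to re-enable `innodb_stats_auto_recalc` (and optionally run ANALYZE TABLE) after the import, or the optimizer will work with stale statistics.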

Comment by Kfir Itzhak [ 2021-02-14 ]

@Mario Hubert

I was able to find a workaround and it worked.
I simply downgraded from 10.3.27 (the CentOS 8.3 version) to 10.3.17 (the CentOS 8.2 version).
I did this by changing the repository to temporarily use "vault.centos.org", ran "yum downgrade mariadb-server", and then locked all the downgraded packages using the versionlock plugin.

With 10.3.17 everything is working perfectly. I don't see the warnings in the log anymore, and the 100 GB dump loaded successfully.

But yes, I agree: something in the newer versions is definitely broken.

Comment by Pablo Guzmán [ 2021-02-17 ]

We recently upgraded from MariaDB 10.2.21 to 10.5.8 (yes, quite a leap!) and jumped straight into this bug, which is now affecting our databases.

The error appears randomly, related to big queries being executed on the server (it appeared once for a big ALTER, then again for a big INSERT INTO x SELECT FROM y).

When the error appears it can freeze the database server for a few minutes, effectively taking our servers down for those minutes.

This is a major bug, and it's scary that it has been unresolved for two months now.

2021-02-16 22:15:19 0 [Note] InnoDB: A semaphore wait:
--Thread 140480777963264 has waited at btr0cur.cc line 1480 for 611.00 seconds the semaphore:
SX-lock on RW-latch at 0x7fc1c00031f0 created in file dict0dict.cc line 2160
a writer (thread id 140479946704640) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1969
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending reads 1, writes 0
InnoDB: ###### Diagnostic info printed to the standard error stream
2021-02-16 22:15:19 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
210216 22:15:19 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.5.8-MariaDB-1:10.5.8+maria~focal-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=296
max_threads=2002
thread_count=149
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 33238918 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
mysqld(my_print_stacktrace+0x32)[0x5607f7acc692]
mysqld(handle_fatal_signal+0x485)[0x5607f7523e45]

Comment by Kfir Itzhak [ 2021-03-01 ]

@Mario Hubert
And all others

This seems to be related to MDEV-24275.
According to that ticket, it was fixed in 10.2.37, 10.3.28, 10.4.18, and 10.5.9.
I haven't tested whether it solves the issue for me yet, but I feel confident it's this one.

Comment by Marko Mäkelä [ 2021-03-01 ]

mastertheknife, there were two regressions in the previous releases that could explain this. MDEV-24188 was an infinite loop (a hang of some threads until the server is killed), while MDEV-24275 is a recoverable slowdown (statistics collection being executed for too many iterations). Both bugs are fixed in the latest releases.

Comment by vbvbrj [ 2021-03-02 ]

Thank you for the input. The Docker repository now has version 10.5.9. However, because the master database ran out of disk space, the slave could not be synchronized, as the binlog files on the master became incomplete. I will start replication to the slave from scratch and test the import.

Comment by Marko Mäkelä [ 2021-03-10 ]

mastertheknife, vbvbrj, pvguzman, does 10.5.9 work for you?

Comment by Bunjil [ 2021-03-10 ]

Marko Mäkelä, I was trying to restore a dump (which has a 1 TB table) and previously couldn't restore even 10% of the table. It seems that 10.5.9 fixed the issue. It's now at 15% and still working without any errors or slowness.

Generated at Thu Feb 08 09:29:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.