[MDEV-25111] Long semaphore wait (> 800 secs), server stops responding - Jira


Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.3.28
Fix Version/s: N/A
Component/s: Galera, Storage Engine - InnoDB
Labels:
- need_feedback
Environment:

CentOS 7.9.2009

Defaults: mysqld would have been started with the following arguments:
--datadir=/data/mariadb --socket=/var/lib/mysql/mysql.sock --user=mysql --symbolic-links=0 --max_allowed_packet=16M --thread_cache_size=8 --max_connections=1550 --slow_query_log=1 --innodb_buffer_pool_size=49152M --innodb_log_file_size=2048M --innodb_log_buffer_size=16M --innodb_print_all_deadlocks=on --log-warnings=2 --plugin_load_add=auth_socket --default_storage_engine=InnoDB --innodb_file_per_table=1 --innodb_flush_log_at_trx_commit=0 --innodb_doublewrite=1 --log_slave_updates=1 --log_bin=bin-log --server_id=5555 --binlog_format=ROW --innodb_autoinc_lock_mode=2 --expire_logs_days=1 --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --wsrep_cluster_address=gcomm://[redacted] --wsrep_node_address=[redacted] --wsrep_sst_method=mariabackup --wsrep_sst_auth=[redacted] --wsrep_provider_options=evs.keepalive_period = PT1S; evs.inactive_check_period = PT1S; evs.suspect_timeout = PT5S; evs.inactive_timeout = PT15S; evs.install_timeout = PT15S; gcache.size=2G --wsrep_on=ON
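
For readers comparing the Galera timeouts above against the length of the stall, here is a minimal Python sketch that splits the `wsrep_provider_options` value into key/value pairs and converts the `evs.*` durations to seconds. The helper names `parse_provider_options` and `seconds` are hypothetical and only illustrate the format; this is not how Galera itself parses the string, and only the simple `PT<n>S` duration form seen above is handled.

```python
import re


def parse_provider_options(raw):
    """Split a 'key = value; key = value' string into a dict of stripped strings."""
    options = {}
    for item in raw.split(";"):
        if not item.strip():
            continue  # skip empty fragments, e.g. after a trailing semicolon
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options


def seconds(duration):
    """Convert a duration like 'PT15S' to seconds (assumption: only PT<n>S is handled)."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(match.group(1))


# Value copied from the startup arguments above.
raw = (
    "evs.keepalive_period = PT1S; evs.inactive_check_period = PT1S; "
    "evs.suspect_timeout = PT5S; evs.inactive_timeout = PT15S; "
    "evs.install_timeout = PT15S; gcache.size=2G"
)

opts = parse_provider_options(raw)
timeouts = {k: seconds(v) for k, v in opts.items() if k.startswith("evs.")}
for name, value in timeouts.items():
    print(f"{name}: {value:.0f}s")
```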


Description

We had been running MariaDB 10.3.27 with Galera cluster in production without any issues.

Less than 26 hours after updating to 10.3.28, one of the three servers stopped responding.

The following errors were logged for more than 800 seconds until I restarted MariaDB:

2021-03-10 15:10:45 0 [Warning] InnoDB: A long semaphore wait:

--Thread 140231647176448 has waited at lock0lock.cc line 3882 for 241.00 seconds the semaphore:

Mutex at 0x5562730bb380, Mutex LOCK_SYS created lock0lock.cc:461, lock var 2

2021-03-10 15:10:45 0 [Warning] InnoDB: A long semaphore wait:

--Thread 140286587229952 has waited at lock0lock.cc line 3882 for 241.00 seconds the semaphore:

Mutex at 0x5562730bb380, Mutex LOCK_SYS created lock0lock.cc:461, lock var 2

[clip]

2021-03-10 15:10:45 0 [Note] InnoDB: A semaphore wait:

--Thread 140232309577472 has waited at row0row.cc line 1133 for 239.00 seconds the semaphore:

X-lock on RW-latch at 0x7f8f0086a270 created in file buf0buf.cc line 1563

a writer (thread id 140232091498240) has reserved it in mode  exclusive

number of readers 0, waiters flag 1, lock_word: 0

Last time write locked in file row0row.cc line 1133

[clip]

2021-03-10 15:10:45 0 [Note] InnoDB: A semaphore wait:

--Thread 140231088736000 has waited at lock0lock.cc line 3882 for 238.00 seconds the semaphore:

Mutex at 0x5562730bb380, Mutex LOCK_SYS created lock0lock.cc:461, lock var 2

2021-03-10 15:10:45 0 [Note] InnoDB: A semaphore wait:

--Thread 140232326362880 has waited at srv0srv.cc line 2026 for 238.00 seconds the semaphore:

X-lock (wait_ex) on RW-latch at 0x55627310f3e0 created in file dict0dict.cc line 920

a writer (thread id 140232326362880) has reserved it in mode  wait exclusive

number of readers 4, waiters flag 1, lock_word: fffffffc

Last time write locked in file srv0srv.cc line 2026

2021-03-10 15:10:45 0 [Note] InnoDB: A semaphore wait:

--Thread 140231088121600 has waited at lock0lock.cc line 3882 for 238.00 seconds the semaphore:

Mutex at 0x5562730bb380, Mutex LOCK_SYS created lock0lock.cc:461, lock var 2

2021-03-10 15:10:45 0 [Note] InnoDB: A semaphore wait:

--Thread 140231087814400 has waited at lock0lock.cc line 3882 for 238.00 seconds the semaphore:

Mutex at 0x5562730bb380, Mutex LOCK_SYS created lock0lock.cc:461, lock var 2

I'll attach the full log from when the problem started. The log also contains InnoDB Monitor output.
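
Since the full log is several megabytes, a small script can help show which source locations the waiting threads pile up on. The following is a minimal Python sketch that only assumes the `--Thread ... has waited at <file> line <n> for <secs> seconds the semaphore:` line format shown in the excerpt above. The function name `summarize_waits` is hypothetical, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches lines such as:
# --Thread 140231647176448 has waited at lock0lock.cc line 3882 for 241.00 seconds the semaphore:
WAIT_RE = re.compile(
    r"--Thread (?P<thread>\d+) has waited at (?P<file>\S+) line (?P<line>\d+) "
    r"for (?P<secs>\d+(?:\.\d+)?) seconds the semaphore:"
)


def summarize_waits(log_text):
    """Count waiting threads per source location and record the longest wait seen."""
    counts = Counter()
    longest = {}
    for match in WAIT_RE.finditer(log_text):
        location = f"{match['file']}:{match['line']}"
        counts[location] += 1
        longest[location] = max(longest.get(location, 0.0), float(match["secs"]))
    return counts, longest


# Sample lines taken from the excerpt in this report.
sample = """\
--Thread 140231647176448 has waited at lock0lock.cc line 3882 for 241.00 seconds the semaphore:
--Thread 140286587229952 has waited at lock0lock.cc line 3882 for 241.00 seconds the semaphore:
--Thread 140232309577472 has waited at row0row.cc line 1133 for 239.00 seconds the semaphore:
--Thread 140232326362880 has waited at srv0srv.cc line 2026 for 238.00 seconds the semaphore:
"""

counts, longest = summarize_waits(sample)
for location, n in counts.most_common():
    print(f"{location}: {n} waiting thread(s), longest {longest[location]:.0f}s")
```

To run it against the attached log instead of the sample, read the file contents into a string and pass that to `summarize_waits`.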

This sounds similar to MDEV-24375, but it seems to be a different issue, since MDEV-24275 should have fixed that in 10.3.28 if I got things right.

Attachments


- mariadb-semaphore-wait.log (4.74 MB, 2021-03-10 18:23)
- mysql-galera.log (366 kB, 2021-03-19 15:21)

Issue Links

is duplicated by

- MDEV-25190 Semaphore wait has lasted > 600 seconds; stuck on bg_wsrep_kill_trx (Closed)

relates to

- MDEV-24275 InnoDB persistent stats analyze forces full scan forcing lock crash (Closed)
- MDEV-24606 InnoDB: Semaphore wait has lasted > 600 second (Closed)
- MDEV-24831 Galera test failure because of safe_mutex error (Closed)
- MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL) (Closed)


People

Assignee: Jan Lindström (Inactive)

Reporter: Ere Maijala

Votes: 1

Watchers: 8

Dates

Created: 2021-03-10 18:23

Updated: 2021-04-29 21:56

Resolved: 2021-04-29 21:56
