[MDEV-17829] MariaDB Hangs randomly (InnoDB long semaphore wait) Created: 2018-11-25  Updated: 2019-04-16  Resolved: 2019-03-21

Status: Closed
Project: MariaDB Server
Component/s: Platform Debian, Storage Engine - InnoDB
Affects Version/s: 10.3.11
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Markus Viitamäki Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: crash, innodb, performance
Environment:

Debian Testing (Buster)
MariaDB 10.3.11 (10.3.11+maria~sid)

2x E5-2620 v3
128Gb RAM
RAID10 SSD Storage


Attachments: HTML File trace    

 Description   

After a reboot of our database machine it started hanging and MariaDB started crashing. (MariaDB was not upgraded just before reboot, 10.3.11 has been running on the machine prior to reboot)
I have attached some logs here from when it happens.

Since the reboot this has happened 4 times now (less than 24h after reboot).

Short log from Crash: https://pastebin.com/yWq36AKb
Full log from before crash (syslog): https://pastebin.com/MEZMNUbg

The server is running the latest patches from debian for buster and latest 10.3 version from sid repo (http://ftp.ddg.lth.se/mariadb/repo/10.3/debian).


Nov 25 19:46:38 db-01 mysqld[10328]: InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
Nov 25 19:46:38 db-01 mysqld[10328]: InnoDB: Pending reads 0, writes 0
Nov 25 19:47:08 db-01 mysqld[10328]: InnoDB: ###### Diagnostic info printed to the standard error stream
Nov 25 19:47:08 db-01 mysqld[10328]: 2018-11-25 19:47:08 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
Nov 25 19:47:08 db-01 mysqld[10328]: 181125 19:47:08 [ERROR] mysqld got signal 6 ;
Nov 25 19:47:08 db-01 mysqld[10328]: This could be because you hit a bug. It is also possible that this binary
Nov 25 19:47:08 db-01 mysqld[10328]: or one of the libraries it was linked against is corrupt, improperly built,
Nov 25 19:47:08 db-01 mysqld[10328]: or misconfigured. This error can also be caused by malfunctioning hardware.
Nov 25 19:47:08 db-01 mysqld[10328]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Nov 25 19:47:08 db-01 mysqld[10328]: We will try our best to scrape up some info that will hopefully help
Nov 25 19:47:08 db-01 mysqld[10328]: diagnose the problem, but since we have already crashed,
Nov 25 19:47:08 db-01 mysqld[10328]: something is definitely wrong and this may fail.
Nov 25 19:47:08 db-01 mysqld[10328]: Server version: 10.3.11-MariaDB-1:10.3.11+maria~sid-log
Nov 25 19:47:08 db-01 mysqld[10328]: key_buffer_size=134217728
Nov 25 19:47:08 db-01 mysqld[10328]: read_buffer_size=2097152
Nov 25 19:47:08 db-01 mysqld[10328]: max_used_connections=527
Nov 25 19:47:08 db-01 mysqld[10328]: max_threads=1002
Nov 25 19:47:08 db-01 mysqld[10328]: thread_count=534
Nov 25 19:47:08 db-01 mysqld[10328]: It is possible that mysqld could use up to
Nov 25 19:47:08 db-01 mysqld[10328]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6309679 K bytes of memory
Nov 25 19:47:08 db-01 mysqld[10328]: Hope that's ok; if not, decrease some variables in the equation.
Nov 25 19:47:08 db-01 mysqld[10328]: Thread pointer: 0x0
Nov 25 19:47:08 db-01 mysqld[10328]: Attempting backtrace. You can use the following information to find out
Nov 25 19:47:08 db-01 mysqld[10328]: where mysqld died. If you see no messages after this, something went
Nov 25 19:47:08 db-01 mysqld[10328]: terribly wrong...
Nov 25 19:47:08 db-01 mysqld[10328]: stack_bottom = 0x0 thread_stack 0x49000
Nov 25 19:47:09 db-01 mysqld[10328]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x556315543dbe]
Nov 25 19:47:09 db-01 mysqld[10328]: /usr/sbin/mysqld(handle_fatal_signal+0x5a5)[0x556314fe3055]
Nov 25 19:47:09 db-01 mysqld[10328]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x128e0)[0x7f24ca1148e0]
Nov 25 19:47:09 db-01 mysqld[10328]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f24c9378f3b]
Nov 25 19:47:09 db-01 mysqld[10328]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x151)[0x7f24c937a2f1]
Nov 25 19:47:09 db-01 mysqld[10328]: /usr/sbin/mysqld(+0xa15bb1)[0x5563152b1bb1]
Nov 25 19:47:09 db-01 mysqld[10328]: /usr/sbin/mysqld(+0x9c9f47)[0x556315265f47]
Nov 25 19:47:09 db-01 mysqld[10328]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7f2a)[0x7f24ca109f2a]
Nov 25 19:47:09 db-01 mysqld[10328]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f24c943aedf]
Nov 25 19:47:09 db-01 mysqld[10328]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Nov 25 19:47:09 db-01 mysqld[10328]: information that should help you find out what is causing the crash.
Nov 25 19:50:29 db-01 systemd[1]: Started Session 52 of user suom1.
Nov 25 19:50:54 db-01 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Nov 25 19:50:54 db-01 systemd[1]: mariadb.service: Failed with result 'signal'.
Nov 25 19:50:59 db-01 systemd[1]: mariadb.service: Service RestartSec=5s expired, scheduling restart.
Nov 25 19:50:59 db-01 systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 2.
Nov 25 19:50:59 db-01 systemd[1]: Stopped MariaDB 10.3.11 database server.
Nov 25 19:50:59 db-01 systemd[1]: Starting MariaDB 10.3.11 database server...
Nov 25 19:51:00 db-01 mysqld[12343]: 2018-11-25 19:51:00 0 [Note] /usr/sbin/mysqld (mysqld 10.3.11-MariaDB-1:10.3.11+maria~sid-log) starting as process 12343 ...
Nov 25 19:51:00 db-01 mysqld[12343]: 2018-11-25 19:51:00 0 [Note] InnoDB: Using Linux native AIO
Nov 25 19:51:00 db-01 mysqld[12343]: 2018-11-25 19:51:00 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
Nov 25 19:51:00 db-01 mysqld[12343]: 2018-11-25 19:51:00 0 [Note] InnoDB: Uses event mutexes
Nov 25 19:51:00 db-01 mysqld[12343]: 2018-11-25 19:51:00 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
Nov 25 19:51:00 db-01 mysqld[12343]: 2018-11-25 19:51:00 0 [Note] InnoDB: Number of pools: 1
Nov 25 19:51:00 db-01 mysqld[12343]: 2018-11-25 19:51:00 0 [Note] InnoDB: Using SSE2 crc32 instructions
Nov 25 19:51:00 db-01 mysqld[12343]: 2018-11-25 19:51:00 0 [Note] InnoDB: Initializing buffer pool, total size = 64G, instances = 64, chunk size = 128M
Nov 25 19:51:02 db-01 mysqld[12343]: 2018-11-25 19:51:02 0 [Note] InnoDB: Completed initialization of buffer pool
Nov 25 19:51:02 db-01 mysqld[12343]: 2018-11-25 19:51:02 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
Nov 25 19:51:02 db-01 mysqld[12343]: 2018-11-25 19:51:02 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=287284775094
Nov 25 19:51:02 db-01 mysqld[12343]: 2018-11-25 19:51:02 0 [Note] InnoDB: Starting final batch to recover 3 pages from redo log.
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] InnoDB: Last binlog file '/var/log/mysql/mariadb-bin.001469', position 450270
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] InnoDB: Creating shared tablespace for temporary tables
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] InnoDB: Waiting for purge to start
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] InnoDB: 10.3.11 started; log sequence number 287284775194; transaction id 316943429
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] Plugin 'FEEDBACK' is disabled.
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] Starting crash recovery...
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] Crash recovery finished.
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] Server socket created on IP: '::'.
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Warning] 'user' entry 'root@db-01' ignored in --skip-name-resolve mode.
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Warning] 'proxies_priv' entry '@% root@db-01' ignored in --skip-name-resolve mode.
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] Reading of all Master_info entries succeded
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] Added new Master_info '' to hash table
Nov 25 19:51:03 db-01 mysqld[12343]: 2018-11-25 19:51:03 0 [Note] /usr/sbin/mysqld: ready for connections.
Nov 25 19:51:03 db-01 mysqld[12343]: Version: '10.3.11-MariaDB-1:10.3.11+maria~sid-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
Nov 25 19:51:03 db-01 /etc/mysql/debian-start[12391]: Upgrading MySQL tables if necessary.
Nov 25 19:51:03 db-01 /etc/mysql/debian-start[12395]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Nov 25 19:51:03 db-01 /etc/mysql/debian-start[12395]: Looking for 'mysql' as: /usr/bin/mysql
Nov 25 19:51:03 db-01 /etc/mysql/debian-start[12395]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Nov 25 19:51:03 db-01 /etc/mysql/debian-start[12395]: This installation of MySQL is already upgraded to 10.3.11-MariaDB, use --force if you still need to run mysql_upgrade
Nov 25 19:51:03 db-01 /etc/mysql/debian-start[12404]: Checking for insecure root accounts.
Nov 25 19:51:03 db-01 /etc/mysql/debian-start[12409]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables
Nov 25 19:51:03 db-01 systemd[1]: Started MariaDB 10.3.11 database server.
Nov 25 19:51:33 db-01 mysqld[12343]: 2018-11-25 19:51:33 0 [Note] InnoDB: Buffer pool(s) load completed at 181125 19:51:33



 Comments   
Comment by Markus Viitamäki [ 2018-11-26 ]

Update to this. After some more studying it could be related to mysql-dump. We run an hourly backup on databases, however the crashes was not hourly.

This is how we dump data: https://pastebin.com/pPeEFbib

I have also downgraded to 10.3.10 for now, and the past 5 hours has been stable.

Comment by Marko Mäkelä [ 2019-01-31 ]

suom1, would it be possible to get the stack traces of all threads when that hang occurs, before InnoDB has committed suicide? You would need the debugging symbols installed, and then execute something like the following:

gdb -ex 'set height 0' -ex 'thread apply all backtrace' -ex detach -ex quit -p $(pgrep -x mysqld)

This should help identify the cause of the hang.

Comment by Maikel Punie [ 2019-04-16 ]

We have exactly the same problem with mariadb 10.3.14 on a centos7
attached is the debug output trace

i let the process continue, after 25 minuts the service was working, so it took a really long time

Comment by Marko Mäkelä [ 2019-04-16 ]

Maikel Punie, in trace I see that a purge thread as well as a replication applier thread are invoking single-page flushes, apparently because all pages in the buffer pool are dirty and cannot be evicted to make room for a new page. That is not really a code bug, especially given that the service was eventually resumed. The situation could be helped by improving the configuration parameters or by avoiding long-running or large transactions that will cause bursts in purge performance.

Generated at Thu Feb 08 08:39:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.