[MDEV-8098] when sysbench stress test, mariadb server hung Created: 2015-05-04  Updated: 2016-05-25  Resolved: 2016-05-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0.16, 10.0.17
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: sysdljr Assignee: Axel Schwenke
Resolution: Not a Bug Votes: 0
Labels: innodb, tests
Environment:

Hardware:
Cpu ,Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
mem: 64GB
storage: PCIE 6.4TB

Software:
OS: centos 6.6
DB: MariaDB Server 10.0.17
tool : sysbench 0.5



 Description   

We used sysbench test performance, test scripts is:
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-tables-count=8 --mysql-host=localhost --mysql-port=3306 --mysql-socket=/ssd/mysql3306/data/mysql.sock --mysql-user=root --mysql-password=123456 --mysql-db=test prepare

sysbench --db-driver=mysql --test=/usr/share/doc/sysbench/tests/db/insert.lua --num-threads=8 --max-requests=0 --mysql-host=localhost --mysql-port=3306 --mysql-socket=/ssd/mysql3306/data/mysql.sock --mysql-user=root --mysql-password=123456 --mysql-db=test --max-time=0 --report-interval=1 --oltp-tables-count=8 run

after run about 50 seconds, mysql stop write and hunged, See below:
Running the test with following options:
Number of threads: 8
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored

Threads started!
[ 1s] threads: 8, tps: 0.00, reads: 0.00, writes: 47510.95, response time: 0.24ms (95%), errors: 0.00, reconnects: 0.00
[ 2s] threads: 8, tps: 0.00, reads: 0.00, writes: 49085.07, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 3s] threads: 8, tps: 0.00, reads: 0.00, writes: 49759.05, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 4s] threads: 8, tps: 0.00, reads: 0.00, writes: 49685.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 5s] threads: 8, tps: 0.00, reads: 0.00, writes: 49274.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 6s] threads: 8, tps: 0.00, reads: 0.00, writes: 49619.04, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 7s] threads: 8, tps: 0.00, reads: 0.00, writes: 49430.88, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 8s] threads: 8, tps: 0.00, reads: 0.00, writes: 49578.98, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 9s] threads: 8, tps: 0.00, reads: 0.00, writes: 49544.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 10s] threads: 8, tps: 0.00, reads: 0.00, writes: 49387.11, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 11s] threads: 8, tps: 0.00, reads: 0.00, writes: 49451.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 12s] threads: 8, tps: 0.00, reads: 0.00, writes: 49237.96, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 13s] threads: 8, tps: 0.00, reads: 0.00, writes: 49800.00, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 14s] threads: 8, tps: 0.00, reads: 0.00, writes: 49296.99, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 15s] threads: 8, tps: 0.00, reads: 0.00, writes: 49360.08, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 16s] threads: 8, tps: 0.00, reads: 0.00, writes: 49480.71, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 17s] threads: 8, tps: 0.00, reads: 0.00, writes: 49598.06, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 18s] threads: 8, tps: 0.00, reads: 0.00, writes: 49374.20, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 19s] threads: 8, tps: 0.00, reads: 0.00, writes: 49658.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 20s] threads: 8, tps: 0.00, reads: 0.00, writes: 49320.93, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 21s] threads: 8, tps: 0.00, reads: 0.00, writes: 49569.03, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 22s] threads: 8, tps: 0.00, reads: 0.00, writes: 49374.03, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 23s] threads: 8, tps: 0.00, reads: 0.00, writes: 49448.00, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 24s] threads: 8, tps: 0.00, reads: 0.00, writes: 49090.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 25s] threads: 8, tps: 0.00, reads: 0.00, writes: 49602.00, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 26s] threads: 8, tps: 0.00, reads: 0.00, writes: 49435.01, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 27s] threads: 8, tps: 0.00, reads: 0.00, writes: 49374.13, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 28s] threads: 8, tps: 0.00, reads: 0.00, writes: 49537.91, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 29s] threads: 8, tps: 0.00, reads: 0.00, writes: 49417.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 30s] threads: 8, tps: 0.00, reads: 0.00, writes: 49680.04, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 31s] threads: 8, tps: 0.00, reads: 0.00, writes: 49535.99, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 32s] threads: 8, tps: 0.00, reads: 0.00, writes: 49341.03, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 33s] threads: 8, tps: 0.00, reads: 0.00, writes: 49661.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 34s] threads: 8, tps: 0.00, reads: 0.00, writes: 49278.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 35s] threads: 8, tps: 0.00, reads: 0.00, writes: 49508.91, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 36s] threads: 8, tps: 0.00, reads: 0.00, writes: 49985.01, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 37s] threads: 8, tps: 0.00, reads: 0.00, writes: 49132.97, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 38s] threads: 8, tps: 0.00, reads: 0.00, writes: 49532.03, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 39s] threads: 8, tps: 0.00, reads: 0.00, writes: 49514.00, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 40s] threads: 8, tps: 0.00, reads: 0.00, writes: 49173.08, response time: 0.23ms (95%), errors: 0.00, reconnects: 0.00
[ 41s] threads: 8, tps: 0.00, reads: 0.00, writes: 49718.98, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 42s] threads: 8, tps: 0.00, reads: 0.00, writes: 49538.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 43s] threads: 8, tps: 0.00, reads: 0.00, writes: 49319.96, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 44s] threads: 8, tps: 0.00, reads: 0.00, writes: 49652.04, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 45s] threads: 8, tps: 0.00, reads: 0.00, writes: 49211.96, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 46s] threads: 8, tps: 0.00, reads: 0.00, writes: 49638.06, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 47s] threads: 8, tps: 0.00, reads: 0.00, writes: 46919.54, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
[ 48s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[ 49s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[ 50s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[ 51s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
...............................................................

*show engine innodb status contents at below :*

BACKGROUND THREAD
-----------------
srv_master_thread loops: 49 srv_active, 0 srv_shutdown, 1084 srv_idle
srv_master_thread log flush and writes: 1132
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 7022
--Thread 139710954862336 has waited at log0log.ic line 358 for 282.00 seconds the semaphore:
Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
waiters flag 1
--Thread 139710954563328 has waited at log0log.ic line 358 for 282.00 seconds the semaphore:
Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
waiters flag 1
--Thread 139710804371200 has waited at log0log.ic line 358 for 282.00 seconds the semaphore:
Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
waiters flag 1
--Thread 139690453731072 has waited at log0log.ic line 461 for 281.00 seconds the semaphore:
Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
waiters flag 1
--Thread 139690506180352 has waited at log0log.cc line 1803 for 281.00 seconds the semaphore:
Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
waiters flag 1
OS WAIT ARRAY INFO: signal count 171294
Mutex spin waits 3491278, rounds 22606058, OS waits 3668
RW-shared spins 48111, rounds 141880, OS waits 1827
RW-excl spins 180519, rounds 780043, OS waits 657
Spin rounds per wait: 6.48 mutex, 2.95 RW-shared, 4.32 RW-excl

------------
TRANSACTIONS
------------
Trx id counter 306500602
Purge done for trx's n:o < 304158823 undo n:o < 0 state: running but idle
History list length 469
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 306500597, not started flushing log
mysql tables in use 1, locked 1
MySQL thread id 11, OS thread handle 0x7f10f4cff700, query id 2320632 localhost root query end
INSERT INTO sbtest3 (id, k, c, pad) VALUES (0, 5024, '59822754713-21504320006-70402977880-52533779811-55443850956-75663501138-32952798122-52984185374-40063414595-30990348181', '00149832870-23643902826-53824533577-44914074721-61304247725')
---TRANSACTION 306500595, not started flushing log
mysql tables in use 1, locked 1
MySQL thread id 10, OS thread handle 0x7f10f4d48700, query id 2320630 localhost root query end
INSERT INTO sbtest3 (id, k, c, pad) VALUES (0, 5240, '76543084107-59746395204-79243311540-11836470234-37353949646-01770990785-75455199727-65275840224-15238662556-35807951565', '50044502986-25226433799-47797793636-03022954115-60278688538')
---TRANSACTION 306500594, not started flushing log
mysql tables in use 1, locked 1
MySQL thread id 9, OS thread handle 0x7f10f4d91700, query id 2320629 localhost root query end
INSERT INTO sbtest5 (id, k, c, pad) VALUES (0, 5058, '37756273936-19470060108-05348482238-36788386100-65707124278-00778599700-58513907809-00382830729-81260231488-24197577440', '96968918924-48962654427-86527086044-70524670701-49888977143')
---TRANSACTION 306500596, not started flushing log
mysql tables in use 1, locked 1
MySQL thread id 6, OS thread handle 0x7f10fdccd700, query id 2320631 localhost root query end
INSERT INTO sbtest6 (id, k, c, pad) VALUES (0, 4958, '42595431418-54160031986-75414205416-45839511846-81285571692-56084379278-23293939469-87089281598-19233348060-94919714254', '30515557351-39440033873-59801946865-60824112594-14420176186')
---TRANSACTION 306500598, not started flushing log
mysql tables in use 1, locked 1
MySQL thread id 7, OS thread handle 0x7f10fdc84700, query id 2320633 localhost root query end
INSERT INTO sbtest2 (id, k, c, pad) VALUES (0, 5010, '62845186706-10817751210-53424088184-51434127394-72460917740-75469062308-83337747236-08014652804-91497207912-11117706272', '86108388613-12443483580-78147342437-57233103465-54986820553')
---TRANSACTION 306500601, ACTIVE 282 sec inserting
mysql tables in use 1, locked 1
1 lock struct(s), heap size 360, 0 row lock(s)
MySQL thread id 8, OS thread handle 0x7f10f4dda700, query id 2320636 localhost root update
INSERT INTO sbtest7 (id, k, c, pad) VALUES (0, 5012, '91772122081-60473838408-64691507610-84945227497-11920568393-31710259244-24773254948-34705940136-89996219460-36245238979', '08842506806-76703742979-33613751407-86512328179-69715058019')
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table `test`.`sbtest7` trx id 306500601 lock mode IX lock hold time 282 wait time before grant 0
---TRANSACTION 306500600, ACTIVE 282 sec inserting
mysql tables in use 1, locked 1
1 lock struct(s), heap size 360, 0 row lock(s)
MySQL thread id 4, OS thread handle 0x7f10fdd5f700, query id 2320635 localhost root update
INSERT INTO sbtest1 (id, k, c, pad) VALUES (0, 4991, '20573335662-30783472294-91392812538-64279544640-03952693010-49516267500-37153587475-29426794802-75715869054-36500529893', '64279858973-94356763024-24019976507-71917495604-46993976375')
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table `test`.`sbtest1` trx id 306500600 lock mode IX lock hold time 282 wait time before grant 0
---TRANSACTION 306500599, ACTIVE 282 sec inserting
mysql tables in use 1, locked 1
1 lock struct(s), heap size 360, 0 row lock(s)
MySQL thread id 5, OS thread handle 0x7f10fdd16700, query id 2320634 localhost root update
INSERT INTO sbtest2 (id, k, c, pad) VALUES (0, 5049, '14034900987-16912674470-19094580238-83057733456-69854491333-00792552128-12159215864-22282265446-84407740021-51783026756', '72866547564-65646294243-67915491544-80988475271-52586539055')
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table `test`.`sbtest2` trx id 306500599 lock mode IX lock hold time 282 wait time before grant 0

MySQL error log at below:
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2015-05-04 19:22:23 7fbe627fd700 InnoDB: Assertion failure in thread 140455673059072 in file srv0srv.cc line 2196
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
150504 19:22:24 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.0.17-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=18
max_threads=4098
thread_count=18
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 9133140 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb743db]
/usr/sbin/mysqld(handle_fatal_signal+0x398)[0x726a38]
/lib64/libpthread.so.0[0x368960f710]
/lib64/libc.so.6(gsignal+0x35)[0x3689232625]
/lib64/libc.so.6(abort+0x175)[0x3689233e05]
/usr/sbin/mysqld[0x9376fc]
/lib64/libpthread.so.0[0x36896079d1]
/lib64/libc.so.6(clone+0x6d)[0x36892e89dd]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
150504 19:22:24 mysqld_safe Number of processes running now: 0
150504 19:22:24 mysqld_safe mysqld restarted



 Comments   
Comment by Elena Stepanova [ 2015-05-04 ]

Hi Axel,

Could you please check it out since you're running sysbench anyway?

Thanks.

Comment by Axel Schwenke [ 2015-09-21 ]

Hi,

I could not reproduce this issue. I used our current mariaDB-10.0.21 and sysbench-0.5 like so:

sysbench-0.5 --test=lua/oltp.lua --oltp-tables-count=8 --mysql-socket=/tmp/mysql.sock --mysql-user=axel --mysql-db=test prepare
sysbench-0.5 --test=lua/insert.lua --num-threads=8 --max-requests=0 --mysql-socket=/tmp/mysql.sock --mysql-user=axel --mysql-db=test --max-time=0 --report-interval=1 --oltp-tables-count=8 run

This was running for more than 10 minutes without showing any anomalies.

Questions:

1. could you reproduce this problem?
2. what my.cnf did you use?
you seem to use some kind of flash storage card
3. what product is it? What driver is it using? What file system?

The reason of the crash is a long semaphore wait. More specifically the semaphore is the one protecting the redo log. So it seems a write request to the redo log was stuck for >600 seconds before MariaDB crashed. This could well be just a symptom of a problem with either hardware or drivers.

BR, Axel

Comment by sysdljr [ 2016-03-19 ]

Sorry, Long time no see this problem.
After that, We replace to HP oem Fusion-IO Card (CentOS 6.6 ext4 system) and upgrade card Driver, sysbench test no occur this problem.
Now, Mainly problem is MDEV-9387 ,According to Jan Lindstrom's advice ,we switch xtradb to innodb_plugin, at the same time , we contact the Fusion-IO manufacturer
help us to check hardware.
Thank you very much .

Comment by Jan Lindström (Inactive) [ 2016-05-25 ]

Confirmation from user that this is CentOS 6.6 bug not a MariaDB bug.

Thank you .
Recently ,We test MairaDB10.1,Percona xtradb cluster 5.6, found write stop question, google groups link:
https://groups.google.com/forum/#!topic/codership-team/Ne6WsTWixH8
Last ,We contact other company's DBA , found it is CentOS 6.6 bug.
receference link:
https://www.infoq.com/news/2015/05/redhat-futex
https://groups.google.com/forum/#!msg/mechanical-sympathy/QbmpZxp6C64/0M4_EbzSLj4J
after we replace to CentOS 6.5, mariadb 10.0, mariadb 10.1 (galera cluster) run normal
Both the bug and Mdev-8098 can closed , thanks you again

Generated at Thu Feb 08 07:24:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.