[MDEV-12413] semaphore wait has lasted > 600 seconds,innodb server is crashed Created: 2017-03-31  Updated: 2020-09-06  Resolved: 2020-09-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0.28
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Jiafu Wang Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: innodb, need_feedback
Environment:

CentOS release 6.6 x64
MariaDB 10.0.28


Attachments: error.log (text file), my.cnf

 Description   

The MySQL server crashed and restarted automatically.
We encountered this error before in MariaDB 10.0.21, so we upgraded to 10.0.28, but the error appeared again. Please help us resolve this problem. Thanks a lot.
The error log and my configuration file are attached.



 Comments   
Comment by Marko Mäkelä [ 2017-04-04 ]

Does this occur only with XtraDB, or also with InnoDB? There seem to be conflicting requests on at least two buffer block rw-locks: a B-tree page and an undo log page.
I wonder what was logged between the server startup and the first message that is included in the attached error.log. Did some error occur? For example, if the XtraDB configuration parameter innodb_corrupt_table_action=salvage took effect, there could be a bug in the error handling that causes a corrupted block to be stuck somehow. Some code could be forgetting mtr_commit() and thus forgetting to release a block->lock. I do not remember seeing this kind of a problem in MySQL 5.6, which the InnoDB and XtraDB of MariaDB 10.0 are based on.

Does innochecksum report any errors for any InnoDB data files?
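
One way to answer the innochecksum question for a whole data directory is sketched below. It is only an illustration: the function name check_innodb_files is made up for this example, the data directory path is an assumption, and it relies on innochecksum returning a non-zero exit status when it finds a problem. innochecksum should only be run while mysqld is shut down.

```shell
#!/bin/sh
# Hypothetical helper (not part of MariaDB): run innochecksum on every
# InnoDB data file below a data directory and list the files that fail.
# Run this only while mysqld is stopped.
check_innodb_files() {
    datadir=$1
    # ibdata* is the system tablespace; *.ibd are file-per-table tablespaces.
    failed=$(
        find "$datadir" -type f \( -name 'ibdata*' -o -name '*.ibd' \) |
        while IFS= read -r f; do
            # Print the file name only if innochecksum exits non-zero.
            innochecksum "$f" >/dev/null 2>&1 </dev/null || printf '%s\n' "$f"
        done
    )
    if [ -n "$failed" ]; then
        printf 'innochecksum reported errors for:\n%s\n' "$failed"
        return 1
    fi
    printf 'no errors reported\n'
}

# Example call (the path is an assumption; use your own datadir):
#   check_innodb_files /var/lib/mysql
```

The function returns 1 if any file fails, so it can be used in a script that stops on the first damaged data directory.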

Comment by Jiafu Wang [ 2017-04-05 ]

Sorry, I cannot confirm whether this error occurs with InnoDB, but it definitely occurs with XtraDB.
I have set the parameter "innodb_adaptive_hash_index=0", and the error has not appeared since. innodb_corrupt_table_action=salvage does not appear to be the cause.

Comment by sysdljr [ 2017-04-05 ]

Hi, @Jiafu Wang
Last year we used CentOS 6.6 with MariaDB 10.0 and had a similar problem.
Later we found that it was a CentOS 6.6 bug. Since upgrading to CentOS 6.7, MariaDB 10.0/10.1 has been running normally.

Reference link, hope it helps:
https://groups.google.com/forum/?hl=zh-Cn#!starred/codership-team/Ne6WsTWixH8

Comment by Marko Mäkelä [ 2017-04-05 ]

sysdljr, thanks for the link! So, it looks like the Linux kernel in CentOS 6.6 has a bug with futexes that can cause hangs.
While the InnoDB or XtraDB in MariaDB 10.0 or 10.1 are not using futex (fast user-space mutex) directly, it is possible that the POSIX threads library code is using them.
In MariaDB 10.2 (and MySQL 5.7), InnoDB can directly use Linux futex, based on build options.

Comment by Marko Mäkelä [ 2017-04-05 ]

Jiafu, by the way, MDEV-12121 introduces in MariaDB 10.2.5 the option to disable the adaptive hash index altogether, at compilation time. It should perform somewhat better than disabling the index at runtime.
Also, in MySQL 5.7 (which the InnoDB in MariaDB 10.2 is based on) the btr_search_latch is split into multiple rw-locks, and the search latch is not being ‘cached’ inside InnoDB across storage engine API calls, but instead released and acquired more frequently. The ‘caching’ behaviour that was there from the beginning of InnoDB (MySQL 3.23) is prone to hangs.

It would greatly help if you could produce a core dump for the hang. You do not need to upload it (after all, it contains the buffer pool, which could be sensitive data), but you should back it up along with the mysqld executable and all *.so files listed by "ldd mysqld". At first, I would only want to see the stack traces of all threads. You can get that as follows:

gdb /path/to/mysqld /path/to/core
set height 0
set logging file threads.txt
set logging on
thread apply all backtrace
quit

You can also use the gdb "attach" command to attach to the running mysqld process before the server is aborted.
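
For the "attach" variant, a non-interactive sketch might look like the following. The function name dump_mysqld_stacks and the default output file name are assumptions made for this example; -p, -batch and -ex are standard gdb command-line options. Because the hang is eventually ended by the 600-second watchdog, the command has to be run while the semaphore wait warnings are still being logged.

```shell
#!/bin/sh
# Hypothetical helper: attach gdb to a running process, write the stack
# traces of all threads to a file, then detach when gdb exits.
dump_mysqld_stacks() {
    pid=$1
    out=${2:-threads.txt}   # default output file name is an assumption
    gdb -p "$pid" -batch \
        -ex 'set height 0' \
        -ex 'thread apply all backtrace' > "$out" 2>&1
}

# Example call, assuming a single mysqld process and that pidof is available:
#   dump_mysqld_stacks "$(pidof mysqld)" threads.txt
```

The server is paused while gdb is attached, so expect a short stall on a busy system.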

Comment by Jiafu Wang [ 2017-04-05 ]

@Marko Mäkelä Thanks for your suggestion. I will produce a core dump if I can. So far the mysqld process has always restarted automatically, so no core could be dumped.

@sysdljr My environment also consists of Haswell-based servers with E5-2630 CPUs, and I plan to upgrade my kernel as soon as possible. Thanks a lot.

Comment by Marko Mäkelä [ 2020-08-07 ]

Based on the version number, this cannot be a duplicate of MDEV-13983.

I remember that there were problems with the multi-threaded flushing code in the XtraDB storage engine of MariaDB Server 10.1, but that code is not present in the 10.0 series.

The MariaDB Server 10.0 has already reached its end of life, and the 10.1 series will follow soon.

Is this hang still reproducible in any version of MariaDB Server?
