[MDEV-15506] Semaphore Wait locking database queries Created: 2018-03-07 Updated: 2020-02-13 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table, Storage Engine - InnoDB |
| Affects Version/s: | 10.2.13 |
| Fix Version/s: | 10.2 |
| Type: | Bug | Priority: | Major |
| Reporter: | Steve Norwood | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 4 |
| Labels: | innodb, recovery, semaphore | ||
| Environment: |
Windows Server 2012 |
||
| Attachments: |
|
| Description |
|
Following on from an initial issue discussed on this stack overflow post (https://stackoverflow.com/questions/49131991/can-you-rollback-a-query-in-state-committing-alter-table-to-storage-engine) where we had an alter table statement that appeared to be stuck, we have killed the query, and had server issues since. We restarted the server, as queries were locking up in the whole server. While normally the queries would time out and die, these were all timing out, and then becoming stuck at a 'Killed' state. Restarting the server removed these hanging queries, and everything seemed to be back up and running. However, the queries built back up again, and it seemed that the rollback was still happening, despite the fact that the table that was initially being altered appeared to be in tact. We found that there was a temporary table still in the database, and thought that the issue was that the temporary table had not been removed after rolling back the alter table statement. We have since used innodb_force_recovery=3 to remove this temporary table, and the issue has persisted. From looking deeper into the error logs, it appears that there a lot of semaphore locks in the database, that may have stemmed from users still trying to query the server while an ALTER TABLE statement was locking the server. After studying the error logs, we have found a lot of semaphore waits in the server. The one that stuck out the most was this one: 2018-03-07 12:27:43 5968 [Note] InnoDB: A semaphore wait: It appears here that the thread is waiting on itself, and could be the reason the whole server is locking up. However, we're not sure about this for certain. It has been discussed a little on this mysql bug report (https://bugs.mysql.com/bug.php?id=73890) but not a lot has been provided in terms of a solution. After attempting various solutions, we have had to grab a backup of the server and run on older data while we update what we can, as this server is currently unusable outside of running in recovery mode. I've posted our error logs our my.ini (before the server went into recovery mode), and the extract of our error logs from when errors started occuring. I've had to remove database names and the like, and replace them with placeholders where necessary to protect our data as much as possible. For a bit of context, this has happened on our production server as we believed with the latest updates to MariaDB, these sort of updates would be possible. The update occured on a database which ibd file of 56.8GB. |
| Comments |
| Comment by Marko Mäkelä [ 2019-09-30 ] | |
|
I am terribly sorry for missing this report for more than a year. Usually, for hangs, I would like to see the stack traces of all threads during the hang:
This command should also display the holders of the InnoDB data dictionary latches. Those often seem to cause hangs or degraded performance. If the ALTER TABLE involved any FULLTEXT INDEX, then there are some known hangs, which we are working on, such as | |
| Comment by Bernardo Perez [ 2019-11-18 ] | |
|
Hello Marko, I know what I am going to report is from MySQL mainline but it is exactly the same behavior as described here and this is the only report I've found. I think it is sharing the same issue. This is my thread on the semaphore section locked to it self: ---------- This is the thread itself: Thread 11 (Thread 0x2b4ff7a02700 (LWP 8773)): I've run the command you requested and attached the file with the state of all the threads at the same time. To give some context of the issue. This is was a restore from a disk snapshot that after start up will always enter in a self deadlock scenario that will result in the process dying. The database was also showing at the same time https://jira.mariadb.org/browse/MDEV-14637 https://bugs.mysql.com/bug.php?id=80919 in particular the 3 purge threads where blocked in gis purge operations. I reduced the number of innodb_purge_threads to 1 (as it can be see in the attached file that only shows 1 purge thread) and still the DB was crashing itself due to this single thread latched on itself. We've seen this in our systems a few times as we run a very large deployment which just increases chances for this to happen. The system is gone so I don't have more info to provide but I am sure that we will, eventually, get another one. If you want to request us more info that we can gather on the next occurrence please let me know, we will be happy to gather it. | |
| Comment by Bernardo Perez [ 2020-02-13 ] | |
|
Regarding my previous comment. That crash is certainly related with this bug https://jira.mariadb.org/browse/MDEV-21512 that is based on this bug and fix https://github.com/mysql/mysql-server/commit/5cdbb22b51cf2b35dbdf5666a251ffbec2f84dec |