Following on from an initial issue discussed in this Stack Overflow post (https://stackoverflow.com/questions/49131991/can-you-rollback-a-query-in-state-committing-alter-table-to-storage-engine), where an ALTER TABLE statement appeared to be stuck, we killed the query and have had server issues ever since.
We restarted the server, as queries were locking up the whole server. Normally those queries would time out and die; instead, they were timing out and then becoming stuck in a 'Killed' state. Restarting the server cleared these hanging queries, and everything seemed to be back up and running. However, the queries built back up again, and it seemed that the rollback was still in progress, even though the table that was originally being altered appeared to be intact. We found a leftover temporary table in the database and suspected that the problem was that this temporary table had not been removed when the ALTER TABLE statement was rolled back.
We have since used innodb_force_recovery=3 to remove this temporary table, but the issue has persisted. Digging deeper into the error logs, we found a large number of semaphore waits, which may have stemmed from users still trying to query the server while the ALTER TABLE statement was holding its locks. The wait that stood out the most was this one:
2018-03-07 12:27:43 5968 [Note] InnoDB: A semaphore wait:
--Thread 4304 has waited at row0mysql.cc line 3383 for 219.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 00000103B9E9D870 created in file dict0dict.cc line 1107
a writer (thread id 4304) has reserved it in mode wait exclusive
number of readers 4, waiters flag 0, lock_word: fffffffffffffffc
Last time read locked in file row0purge.cc line 853
Last time write locked in file fts0fts.cc line 7190
It appears here that the thread is waiting on itself, which could be why the whole server is locking up, although we are not certain of that. The behaviour is discussed briefly in this MySQL bug report (https://bugs.mysql.com/bug.php?id=73890), but little is offered there in the way of a solution.
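For reference, semaphore-wait blocks like the one above can be pulled out of the error log with a simple grep. This is just a sketch; the log path below is a placeholder, not our actual path:

```shell
# Extract InnoDB semaphore-wait blocks from the MariaDB error log.
# Each wait starts with a "--Thread ... has waited at" line; print it
# plus the following lines describing the RW-latch it is blocked on.
LOG="${1:-/var/lib/mysql/hostname.err}"   # placeholder path -- substitute your own
grep -A 5 "has waited at" "$LOG"
```

Each match shows which source file/line is waiting, for how long, and which thread reserved the latch, which is how we spotted the thread-waiting-on-itself pattern.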
After attempting various solutions, we have had to take a backup of the server and run on older data while we repair what we can, as the server is currently unusable outside of recovery mode. I've posted our my.ini (from before the server went into recovery mode) and an extract of our error logs from when the errors started occurring. I've had to replace database names and the like with placeholders where necessary to protect our data as much as possible.
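For completeness, the recovery setting mentioned above was applied in my.ini roughly like this (a minimal sketch; the rest of our my.ini is omitted):

```ini
[mysqld]
# Level 3 starts InnoDB without running transaction rollbacks after
# crash recovery, which is what let us drop the orphaned temporary table.
innodb_force_recovery = 3
```

Note that levels above 0 are intended for data extraction only and should be removed once the server is repaired.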
For a bit of context, this happened on our production server because we believed that, with the latest updates to MariaDB, this sort of ALTER would be possible. The ALTER ran against a database whose ibd file is 56.8 GB.