[MDEV-30864] Crash on innodb_fatal_semaphore_wait_threshold Created: 2023-03-16 Updated: 2023-06-27 Resolved: 2023-06-27 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server, Storage Engine - InnoDB |
| Affects Version/s: | 10.9.5 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Jacob Williams | Assignee: | Marko Mäkelä |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Environment: | Amazon Linux 2 |
| Description |
|
We have had this crash 4 times in the last 3 weeks, since we upgraded to 10.9.5. We were previously on 10.4.x and had not seen this or any other crash recently. We run 12 production servers and have seen it on 3 of them, each time under fairly heavy load. We suspected that the heavy I/O load may have been a factor, so we increased the I/O capacity of our EBS volumes to spend less time at peak capacity. But the error happened again today with the faster I/O configuration, and this time the load was not reaching capacity, although the server was moderately busy.
|
| Comments |
| Comment by Marko Mäkelä [ 2023-03-16 ] |
|
Can you please try to produce full stack traces of all threads of the hung process? Without that information, it is impossible to diagnose hangs. There is a good chance that this is a duplicate of |
| Comment by Jacob Williams [ 2023-03-16 ] |
|
OK, working on a full stack trace for all threads, but it's slow going: only at 33% after about 2 hours. It will also take us a little while to make sure we don't have any customer data in it. In the meantime, I picked out a few interesting threads from what has come out so far and removed all the sensitive strings from them. Maybe they can confirm that it is the same as |
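Removing sensitive strings from a large backtrace by hand is slow, so a small script can do a first pass. The following is only a minimal sketch, not the procedure used in this report: it assumes the sensitive data appears as double-quoted string literals in the frame arguments (as gdb typically prints `char*` values), and the function name `redact_backtrace` and the sample frame are hypothetical. It does not catch data printed in other forms, so a manual review is still needed.

```python
import re

# Matches a double-quoted string literal, allowing backslash-escaped
# characters (such as \" or \n) inside it.
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')


def redact_backtrace(text: str, placeholder: str = '"<redacted>"') -> str:
    """Replace every double-quoted string in a backtrace with a placeholder.

    Hypothetical helper: only quoted literals are touched; addresses,
    function names and source locations are left as they are.
    """
    # A callable replacement avoids any backslash processing of the placeholder.
    return _QUOTED.sub(lambda match: placeholder, text)


if __name__ == "__main__":
    # Made-up frame for illustration; not taken from the actual trace.
    sample = '#1 0x0000 in do_query (query=0x7f00 "SELECT name FROM customers") at example.cc:10'
    print(redact_backtrace(sample))
```

Running the script over the whole trace file before upload leaves the thread and frame structure intact, which is the part needed to compare stacks between reports.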
| Comment by Jacob Williams [ 2023-03-16 ] |
|
I uploaded the full stack trace of all threads to FTP as |
| Comment by Daniel Black [ 2023-04-03 ] |
|
The stack of Thread 215 (Thread 0x7fb2efa6e700 (LWP 30276)) resembles the stack in MDEV-29835: a recursive call to btr_page_split_and_insert |
| Comment by Jacob Williams [ 2023-04-07 ] |
|
We reverted to 10.9.4 and have not seen this crash for the last few weeks. We will wait for 10.9.6. Unfortunately, we have been seeing extraordinary memory usage in 10.9.4, though we are not completely sure that it is absent from 10.9.5. I'll create a ticket for that if I can gather enough information on it. |
| Comment by Daniel Black [ 2023-04-07 ] |
|
> we have been seeing extraordinary memory usage in 10.9.4

I'm fairly sure that was known and fixed. |
| Comment by Marko Mäkelä [ 2023-05-26 ] |
|
MariaDB Server 10.9.6 was released a couple of weeks ago. Would that fix the hang for you? |