[MDEV-16785] MariaDB server is running in 100% on one cpu Created: 2018-07-20 Updated: 2023-04-27 |
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 5.5.57 |
| Fix Version/s: | 10.4, 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Michael Graf | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None |
| Environment: | Windows Server 2012 R2 |
| Description |
MariaDB server regularly enters an endless loop in which one CPU runs at 100%. Analysis of a minidump we captured shows that the thread with ID 3574 consumed most of the CPU time.
| Comments |
| Comment by Elena Stepanova [ 2018-08-04 ] |
Threads from the minidump: threads
However, the contents don't seem to correspond to the described situation. According to the core dump, there is a lot of query-processing activity (numerous updates, deletes, inserts, and selects), so the processlist surely shouldn't be empty. The stack trace for the alleged culprit, thread 3574, is not very impressive either:
wlad, would you like to take a look?
| Comment by Vladislav Vaintroub [ 2018-08-04 ] |
Reassigning to Marko; this looks like an InnoDB deadlock.
| Comment by Marko Mäkelä [ 2019-04-16 ] |
I see a possible bug in the InnoDB sync_array in threads
These threads are operating on unrelated buf_block_t::lock objects, but there appears to be a possible race condition on the sync_array element 0x000000d78d9dccb0 between sync_array_reserve_cell() and sync_array_wait_event(): we could be attempting to allocate a slot that is still in use. I have never worked on the sync array. I think that we should try to eliminate it, but I do not think that we can do so in a GA version.
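To make the suspected failure mode concrete, here is a minimal Python model of the invariant that reservation and waiting must preserve: a cell handed out by a reserve call must not be handed out again until its owner has released it. This is an illustration under stated assumptions only. `WaitArray`, `reserve_cell`, `owner_of`, `free_cell` and `stress` are hypothetical names, not InnoDB's sync_array API, and the model keeps the invariant simply by doing every reservation and release under one mutex. The race described above would correspond to a second thread being given a cell whose previous owner is still waiting in it.

```python
import threading


class WaitArray:
    """Hypothetical, simplified wait array (NOT InnoDB's sync_array_t).

    A cell is either free (None) or owned by exactly one waiting thread.
    Reservation and release both happen under one mutex, so a cell cannot be
    handed out again until its current owner has released it.
    """

    def __init__(self, n_cells):
        self._mutex = threading.Lock()
        self._cells = [None] * n_cells  # None means free; otherwise the owner token

    def reserve_cell(self, owner):
        """Claim a free cell for `owner`; return its index, or None if all are in use."""
        with self._mutex:
            for index, cell in enumerate(self._cells):
                if cell is None:
                    self._cells[index] = owner
                    return index
            return None

    def owner_of(self, index):
        with self._mutex:
            return self._cells[index]

    def free_cell(self, index, owner):
        """Release a cell; freeing a cell the caller does not own is a bug."""
        with self._mutex:
            if self._cells[index] is not owner:
                raise RuntimeError(f"cell {index} is not owned by the caller")
            self._cells[index] = None

    def n_reserved(self):
        with self._mutex:
            return sum(cell is not None for cell in self._cells)


def stress(n_threads=8, n_cells=4, rounds=500):
    """Reserve, check ownership, and free cells from many threads at once.

    Returns the cell indexes observed with a foreign owner while still
    reserved (empty if the invariant held), plus the array itself.
    """
    array = WaitArray(n_cells)
    violations = []

    def worker():
        me = object()  # unique owner token for this thread
        for _ in range(rounds):
            index = array.reserve_cell(me)
            if index is None:
                continue  # all cells busy; a real waiter would retry or back off
            # Stand-in for the wait itself: the cell must still be ours here.
            if array.owner_of(index) is not me:
                violations.append(index)
            array.free_cell(index, me)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return violations, array


violations, array = stress()
print("ownership violations:", len(violations))
print("cells still reserved:", array.n_reserved())
```

Using more threads than cells forces the "array full" path as well as cell reuse; a real wait array would block or retry instead of skipping a round.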
| Comment by Marko Mäkelä [ 2019-04-16 ] |
There is quite a bit of nonzero-overhead abstraction (or obfuscation) going on. Apparently, there is only one sync_array_t object in existence, sync_primary_wait_array, and sync_primary_wait_array->protection is always SYNC_ARRAY_OS_MUTEX. It looks like sync_primary_wait_array->os_mutex is the contention point that adds insult to injury whenever some contention occurs inside InnoDB. wlad points out that srv_error_monitor_thread() continuously holds this mutex while gathering information about InnoDB mutex or rw-lock contention. Unless InnoDB is actually hanging, I am afraid that there is nothing we can do about this in a GA version.
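A rough Python model of this contention point may help. It rests on assumptions: `GlobalWaitArray`, `monitor_scan` and `reserve_latency` are hypothetical names, and the `time.sleep` merely stands in for whatever per-cell work a monitoring pass does. It shows why one global mutex held for an entire monitoring pass hurts: any thread that needs to reserve a wait cell stalls until the pass finishes. The snapshot variant is included only as a conventional point of comparison, not as a proposed fix for InnoDB.

```python
import threading
import time


class GlobalWaitArray:
    """Hypothetical model: one wait array guarded by one mutex, loosely
    mirroring a single sync_primary_wait_array protected by os_mutex.
    This is NOT InnoDB code."""

    def __init__(self, n_cells):
        self.mutex = threading.Lock()
        self.cells = [None] * n_cells

    def reserve_cell(self, owner):
        with self.mutex:
            for index, cell in enumerate(self.cells):
                if cell is None:
                    self.cells[index] = owner
                    return index
            return None


def monitor_scan(array, analysis_seconds, hold_mutex, scanning):
    """Simulated monitoring pass over every cell.

    hold_mutex=True:  analyse while holding the array mutex (the pattern
                      described above for the error monitor thread).
    hold_mutex=False: copy a snapshot under the mutex, analyse outside it.
    """
    if hold_mutex:
        with array.mutex:
            scanning.set()
            time.sleep(analysis_seconds)  # stand-in for slow per-cell analysis
    else:
        with array.mutex:
            snapshot = list(array.cells)  # brief critical section
        scanning.set()
        busy = sum(cell is not None for cell in snapshot)  # analysis without the mutex
        time.sleep(analysis_seconds)


def reserve_latency(hold_mutex, analysis_seconds=0.3):
    """Return how long one reserve_cell() call waits while a monitoring pass runs."""
    array = GlobalWaitArray(n_cells=4)
    scanning = threading.Event()
    monitor = threading.Thread(
        target=monitor_scan, args=(array, analysis_seconds, hold_mutex, scanning))
    monitor.start()
    scanning.wait()  # ensure the monitoring pass has started first
    start = time.monotonic()
    array.reserve_cell(owner=object())
    elapsed = time.monotonic() - start
    monitor.join()
    return elapsed


print(f"monitor holds mutex:   {reserve_latency(True):.3f} s")
print(f"monitor uses snapshot: {reserve_latency(False):.3f} s")
```

In the first case the reserving thread waits for roughly the whole analysis time; in the second it waits only for the list copy. The trade-off is that a snapshot can be stale by the time it is analysed.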
| Comment by Marko Mäkelä [ 2019-09-30 ] |
The code could be refactored as part of