[MDEV-15016] multiple page cleaner threads use a lot of CPU on idle server Created: 2018-01-20 Updated: 2018-01-25 Resolved: 2018-01-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.3.4 |
| Fix Version/s: | 10.2.13, 10.3.5 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Vladislav Vaintroub | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Sprint: | 10.1.31 |
| Description |
|
Issue described in https://lists.launchpad.net/maria-discuss/msg04975.html as follows: "I cannot tell if previous 10.3 versions also did, as I don't have 10.3". Examination of the attached stack traces, taken from the process while it was in this "idle" state, shows 3 CPUs being occupied by something that, most of the time, seems to be a busy loop acquiring an InnoDB mutex. According to the reporter, setting innodb_page_cleaners = 1 fixed the issue. Relevant call stack information:
|
| Comments |
| Comment by Vladislav Vaintroub [ 2018-01-21 ] |
|
Reproducible with any configuration that results in innodb_page_cleaners > 1. By default, the installer sizes the buffer pool at 1/8 of RAM, i.e. 1GB or more when RAM >= 8GB, which nowadays is true even for most laptops.
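The arithmetic behind "1/8 of RAM" can be sketched as below. Only the 1/8 rule comes from this comment; the defaults that turn a 1GB buffer pool into several page cleaners are an assumption here, modeled on MySQL 5.7 (8 buffer pool instances once the pool reaches 1 GiB, otherwise 1; innodb_page_cleaners defaulting to 4 but capped at the number of instances). The function names are hypothetical.

```python
GIB = 1024 ** 3


def installer_buffer_pool_size(ram_bytes: int) -> int:
    """Buffer pool size chosen by the installer: 1/8 of RAM (per the comment)."""
    return ram_bytes // 8


def assumed_default_page_cleaners(buffer_pool_bytes: int) -> int:
    """ASSUMED MySQL 5.7-style defaults, not confirmed by this issue:
    8 buffer pool instances when the pool is at least 1 GiB (else 1),
    and innodb_page_cleaners = 4, capped at the number of instances."""
    instances = 8 if buffer_pool_bytes >= GIB else 1
    return min(4, instances)
```

Under these assumptions, 8 GiB of RAM gives exactly a 1 GiB buffer pool and 4 page cleaners, while 4 GiB of RAM gives 1 page cleaner, the same setting the reporter used as a workaround.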
| Comment by Elena Stepanova [ 2018-01-21 ] |
|
Two commits seem to be involved.
made server hang on startup with innodb-buffer-pool-instances > 1.
fixed the hang, but left the server with this CPU usage problem.
| Comment by Marko Mäkelä [ 2018-01-21 ] |
|
elenst tested the latest 10.3 after reverting the I think that we should revert
| Comment by Jan Lindström (Inactive) [ 2018-01-22 ] |
|
https://github.com/MariaDB/server/commit/6a8e070d62db375500ed856f9ad5034564447eee
| Comment by Marko Mäkelä [ 2018-01-22 ] |
|
I would like to see a complete high-level design that covers the communication between the threads: at startup, normal operation, shutdown, and when the requested number of threads differs from the current number of threads. I am concerned that as a result of the proposed patch, the page cleaner threads might under some circumstances stop doing useful work.
| Comment by Jan Lindström (Inactive) [ 2018-01-23 ] |
|
Startup is the same as increasing the number of page cleaner threads: we create the threads and wait for is_started events until the requested number of threads is reached. When a page cleaner thread starts, it increments the number of running threads, sends the is_started event, and starts waiting for work on the is_requested event. When the number of page cleaner threads decreases, we send the is_requested event; this is similar to shutdown, in that there is "work" to be done. Again, we wait on the is_started event for these threads to exit. Page cleaner threads whose id is less than n_workers reset the is_requested event, as there is no "work" for them to do, while the exiting threads send the is_started event before decreasing the number of running threads and exiting.

After several attempts, I have not found a fix that avoids os_event_reset(): if I remove it, we end up in a busy loop either when the number of threads is decreased or when it is increased. Based on my testing, both of the added os_event_reset() calls are needed, or bad things happen. I do not see a problem with using os_event_reset(), as it is already used, e.g. at buf0flu.cc line 2864, when no real work is available.
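The busy loop follows from manual-reset semantics: an os_event stays signaled until os_event_reset() is called, so a thread that keeps waiting on it returns immediately every time. Python's threading.Event behaves the same way, which allows a minimal single-threaded sketch. The function and variable names are hypothetical, and the short wait timeout exists only so that the demo terminates.

```python
import threading
import time


def count_idle_wakeups(reset_when_idle: bool, duration: float = 0.05) -> int:
    """Simulate one page cleaner worker on an idle server for `duration` seconds.

    Returns how often the worker woke up on is_requested with no work to do.
    All names are illustrative, not InnoDB's.
    """
    is_requested = threading.Event()  # manual-reset: stays set until clear()
    is_requested.set()                # e.g. signaled once for a thread-count change
    pending_work = 0                  # idle server: nothing to flush
    idle_wakeups = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        # Timeout only so the demo terminates; a real worker would block.
        if not is_requested.wait(timeout=0.005):
            continue
        if pending_work == 0:
            idle_wakeups += 1
            if reset_when_idle:
                is_requested.clear()  # the analogue of os_event_reset()
    return idle_wakeups
```

count_idle_wakeups(True) returns 1: the worker wakes once, finds nothing to do, resets the event, and then sleeps. count_idle_wakeups(False) returns a count that grows with the duration, because the still-set event makes every wait return at once. Making the reset depend on `pending_work == 0` is the "reset only when there is no work" condition.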
| Comment by Marko Mäkelä [ 2018-01-23 ] |
|
Thank you for the explanation. I believe that the flow would be a little cleaner if we used condition variables instead of events. Can we make the call of os_event_reset() conditional on "there is no work available", similar to how pc_flush_slot() does it?
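A condition-variable design of the kind suggested here could look roughly like the following Python sketch, with threading.Condition standing in for a condition variable. It is a hypothetical illustration, not the committed patch: the class, its methods, and its fields are invented, and "flushing" is only a counter increment (a real implementation would release the mutex during I/O). It covers startup (the same path as increasing the count), normal operation, decreasing the count, and shutdown. Because every wait re-checks a predicate under one mutex, nothing has to be reset, and idle workers sleep.

```python
import threading


class PageCleanerPool:
    """Hypothetical page cleaner coordination built on condition variables."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._work_cv = threading.Condition(self._mutex)   # workers wait here
        self._state_cv = threading.Condition(self._mutex)  # coordinator waits here
        self._threads = []
        self._stopping = False
        self.target = 0    # requested number of workers
        self.running = 0   # workers currently alive
        self.pending = 0   # outstanding flush requests
        self.flushed = 0   # completed flush requests
        self.wakeups = 0   # times a worker's wait returned

    def _must_exit(self, thread_no: int) -> bool:
        return self._stopping or thread_no >= self.target

    def _worker(self, thread_no: int) -> None:
        with self._mutex:
            self.running += 1
            self._state_cv.notify_all()              # report "started"
            while True:
                # wait_for() re-checks the predicate after every wakeup,
                # so an idle worker sleeps instead of spinning.
                self._work_cv.wait_for(
                    lambda: self._must_exit(thread_no) or self.pending > 0)
                self.wakeups += 1
                if self._must_exit(thread_no):
                    break
                self.pending -= 1
                self.flushed += 1                    # stand-in for flushing one batch
                self._state_cv.notify_all()
            self.running -= 1
            self._state_cv.notify_all()              # report "exited"

    def set_thread_count(self, n: int) -> None:
        """Used at startup and whenever the requested count changes."""
        with self._mutex:
            old, self.target = self.target, n
            if n < old:
                self._work_cv.notify_all()           # surplus workers see thread_no >= n
            for thread_no in range(old, n):          # empty range when shrinking
                t = threading.Thread(target=self._worker, args=(thread_no,))
                self._threads.append(t)
                t.start()                            # blocks on the mutex until we wait
            self._state_cv.wait_for(lambda: self.running == n)

    def request_flush(self, count: int = 1) -> None:
        with self._mutex:
            self.pending += count
            self._work_cv.notify(count)              # wake at most `count` workers

    def wait_until_flushed(self, total: int) -> None:
        with self._mutex:
            self._state_cv.wait_for(lambda: self.flushed >= total)

    def shutdown(self) -> None:
        with self._mutex:
            self._stopping = True
            self._work_cv.notify_all()
            self._state_cv.wait_for(lambda: self.running == 0)
        for t in self._threads:
            t.join()
```

When the count decreases, notify_all() wakes every worker, but those that stay go straight back to sleep inside wait_for(). When it increases, existing workers are not woken at all, in the spirit of avoiding an unnecessary is_requested event.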
Or even better, would the following patch fix this issue?
| Comment by Jan Lindström (Inactive) [ 2018-01-24 ] |
|
Yes, it does. When we increase the number of page cleaner threads, I would also avoid sending an unnecessary is_requested event; see the modified patch in
| Comment by Marko Mäkelä [ 2018-01-24 ] |
|
For the record: The intended usage of multiple page cleaner threads appears to be to ensure that each buffer pool instance will be flushed without too much delay.

I started to wonder whether multiple InnoDB buffer pools actually help with any workloads. Yes, it probably was a good idea to split the buffer pool mutex when Inaam Rana introduced multiple buffer pools in MySQL 5.5.5, but since then, there have been multiple fixes to reduce contention on the buffer pool mutex, such as Inaam’s follow-up fix in MySQL 5.6.2 to use rw-locks instead of mutexes for the buf_pool->page_hash. In MySQL 8.0.0, Shaohua Wang implemented one more thing that MariaDB should copy:

I think that we should seriously consider removing all code to support multiple buffer pools or page cleaners. The description of WL#6642: InnoDB: multiple page_cleaner threads seems to imply that it may have been a mistake to partition the buffer pool.

Note: partitioning or splitting mutexes often seems to be a good idea. But partitioning data structures or threads might not be.
| Comment by Marko Mäkelä [ 2018-01-24 ] |
|
I filed
| Comment by Jan Lindström (Inactive) [ 2018-01-24 ] |
|
Done in: https://github.com/MariaDB/server/commit/1b1e0c0a1e47e49531aaef5b7df76c08c1c9d519 |