[MDEV-23754] Replace buf_pool.flush_list with a priority queue Created: 2020-09-18 Updated: 2020-10-28 Resolved: 2020-10-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | N/A |
| Type: | Task | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | performance | ||
| Issue Links: |
|
||||||||||||||||
| Description |
|
In an effort to speed up crash recovery, InnoDB introduced buf_pool_t::flush_rbt, which would keep the flush_list sorted by oldest_modification during recovery. Such ordering should be beneficial at all times, but maintaining 2 data collections (doubly linked list and binary search tree) is clearly suboptimal. Replacing the doubly-linked buf_pool.flush_list with a priority queue ORDER BY page.oldest_modification ASC, page.id ASC should include the following benefits:
One implementation challenge is that apart from the buf_block_t that are in buf_pool.chunks for ROW_FORMAT=COMPRESSED pages we have some buf_page_t that are allocated with malloc(). The majority of buf_page_t are actually buf_block_t::page. Because of this, the priority queue will probably have to store pointers to buf_page_t. |
| Comments |
| Comment by Marko Mäkelä [ 2020-09-24 ] | ||||||||
|
I suspect that the following O(n) iteration in a critical section of buf_pool.flush_list_mutex in page_cleaner_flush_pages_recommendation() could be replaced with an O(log n) binary search when we switch to a priority queue:
| ||||||||
| Comment by Vladislav Vaintroub [ 2020-09-24 ] | ||||||||
|
STL priority_queue does not have an interface to iterate over it. you can access the top element though. Although, it is probably possible to derive from STL priority_queue and rely on the fact that underlying container is organized as heap structure. then I guess it would be possible to iterate | ||||||||
| Comment by Marko Mäkelä [ 2020-09-24 ] | ||||||||
|
We could ‘manually’ create something similar to std::priority_queue, possibly by making use of std::make_heap, which works with random access iterators. In a perf record trace of the server with some work-in-progress | ||||||||
| Comment by Marko Mäkelä [ 2020-09-25 ] | ||||||||
|
We might solve But, I would still think that we’d better write the pages of temporary tables solely via the LRU eviction mechanism, that is, keep them entirely out of the buf_pool.flush_list or its replacement. So, it might be better to implement | ||||||||
| Comment by Marko Mäkelä [ 2020-10-01 ] | ||||||||
|
It turns out that the buf_pool.flush_list actually is already ORDER BY oldest_modification DESC (most recently dirtied blocks are added first). Blocks are added by mtr_t::commit(), and the LSN would always be incremented between mtr_t::commit(), unless we have several mini-transactions that only modified pages in the temporary tablespace (which per What we can do is the following:
Currently it looks like there is not much that can be done as part of this task. The doubly-linked list should remain an optimal data structure for this purpose. | ||||||||
| Comment by Marko Mäkelä [ 2020-10-28 ] | ||||||||
|
Sorry, I should have done more research before filing this. |