Details
- Bug
- Status: In Progress
- Major
- Resolution: Unresolved
- 11.8.6, 12.3.1
- Related to performance
Description
In buf_flush_ahead(), in the async case, buf_pool.flush_list_mutex is acquired to update the atomic LSN target and to wake up the page flusher thread when it is waiting indefinitely.
This creates contention on that mutex when the LSN age is above the async flushing threshold and many threads compete to notify the page flusher thread.
Since the LSN target buf_flush_async_lsn is atomic, a CAS loop could be used to raise it monotonically to the maximum, so that only the threads that win the CAS loop lock and signal, and only if necessary. This also requires turning the "idle bit" of buf_pool.page_cleaner_status into an Atomic_relaxed<bool> variable. Some touch-ups might also be required in the waiting branch of buf_flush_page_cleaner().
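To illustrate the idea, here is a minimal standalone sketch, not the actual patch. The struct FlushNotifier, its members, and the functions request_async_flush() and wait_for_target() are hypothetical stand-ins for the InnoDB objects named above, built on the C++ standard library instead of InnoDB's own primitives. It also uses the default sequentially consistent ordering for both atomics, as an assumption that keeps the lost-wakeup reasoning simple; whether relaxed ordering is sufficient in the real code would need separate analysis.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the state discussed in this ticket.
struct FlushNotifier {
  std::atomic<uint64_t> async_lsn{0};  // role of buf_flush_async_lsn
  std::atomic<bool> idle{false};       // role of the "idle bit"
  std::mutex flush_list_mutex;         // role of buf_pool.flush_list_mutex
  std::condition_variable do_flush;
  unsigned signals = 0;  // instrumentation only; protected by the mutex

  // Notifier side: raise the target to max(target, lsn) without the mutex.
  // Returns true if this caller raised the target (a CAS-loop "winner").
  bool request_async_flush(uint64_t lsn) {
    uint64_t cur = async_lsn.load();
    while (cur < lsn) {
      // On failure, cur is refreshed with the current value and re-checked.
      if (async_lsn.compare_exchange_weak(cur, lsn)) {
        // Only winners may lock and signal, and only if the flusher is idle.
        if (idle.load()) {
          std::lock_guard<std::mutex> g(flush_list_mutex);
          ++signals;
          do_flush.notify_one();
        }
        return true;
      }
    }
    return false;  // someone else already requested an equal or higher LSN
  }

  // Waiter side: sleep until the target exceeds what was already flushed.
  uint64_t wait_for_target(uint64_t flushed) {
    std::unique_lock<std::mutex> g(flush_list_mutex);
    idle.store(true);  // published before the target is re-checked
    do_flush.wait(g, [&] { return async_lsn.load() > flushed; });
    idle.store(false);
    return async_lsn.load();
  }
};
```

In this sketch a thread that loses the race, or that requests an LSN no higher than the current target, never touches the mutex. The waiter sets the idle flag and re-checks the target while holding the mutex, so a winner either observes the idle flag and signals, or the waiter observes the new target and does not sleep.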
I'm already working on a draft. Hopefully, it will make the notification to proceed with async flushing less contended.
Testing showed a high percentage of buf_flush_ahead() waits/spin loops both in MDEV-39341 at high VU counts (>88) and in the MDEV-37924 128G/64G UAW tests.
Issue Links
- relates to
  - MDEV-35155 Small innodb_log_file_size leads to excessive write amplification (Closed)
  - MDEV-38069 Heavy contention on buf_pool.flush_list_mutex (Closed)
  - MDEV-37924 buf_pool.mutex contention under I/O-limited OLTP workload (In Progress)