[MDEV-18724] Replace buf_block_t::mutex with more std::atomic

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.2.2, 10.3.0, 10.4.0
Fix Version/s: 10.5.4
Component/s: Storage Engine - InnoDB
Labels: performance

Description

InnoDB uses a combination of buffer-fixing and I/O fixing for buffer pool blocks. The buf_page_t::buf_fix_count is used in addition to buf_block_t::lock for user threads that are accessing pages in the buffer pool. The buf_page_t::io_fix was originally used for I/O operations, pinning the block during a read or write.

In MySQL 5.6.4, the fix of Bug#11759044 - 51325: DROPPING AN EMPTY INNODB TABLE TAKES A LONG TIME WITH LARGE BUFFER POOL introduced another pseudo-I/O-fix state BUF_IO_PIN. This state seems to be redundant; we could increment and decrement buf_fix_count instead.
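As a rough illustration of pinning a block through buf_fix_count rather than a dedicated BUF_IO_PIN state, here is a minimal sketch. The names page_stub and buf_fix_guard are hypothetical and do not exist in InnoDB; the real buf_page_t has many more fields, and the actual fix may look quite different.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for buf_page_t.
struct page_stub {
  std::atomic<uint32_t> buf_fix_count{0};
};

// Hypothetical RAII helper: holds one buffer-fix for the lifetime of
// the guard, which is the role BUF_IO_PIN plays as a pseudo-I/O-fix.
class buf_fix_guard {
 public:
  explicit buf_fix_guard(page_stub& page) : page_(page) {
    page_.buf_fix_count.fetch_add(1);  // pin: the block must stay in place
  }
  ~buf_fix_guard() {
    page_.buf_fix_count.fetch_sub(1);  // unpin on scope exit
  }
  buf_fix_guard(const buf_fix_guard&) = delete;
  buf_fix_guard& operator=(const buf_fix_guard&) = delete;

 private:
  page_stub& page_;
};
```

A caller that needs to release a mutex and yield could hold such a guard across the yield, so that the pin is dropped on every exit path.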

Since MariaDB 10.2, the buf_page_t::buf_fix_count is always protected by a combination of atomic memory operations and buf_pool->mutex, while buf_page_t::io_fix uses a combination of buf_pool->mutex and the block mutex.

If we can use buf_fix_count alone instead of io_fix = BUF_IO_PIN, then we could remove a few operations on the block mutex. Furthermore, if we increment buf_fix_count whenever io_fix is set and decrement it whenever io_fix is cleared, then a nonzero io_fix would always imply a nonzero buf_fix_count. The function buf_page_can_relocate() could then be simplified to an atomic read of buf_fix_count, and we could invoke it without holding the block mutex.
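The idea can be sketched as follows. This is an illustration under stated assumptions, not the code that was committed: page_sketch, io_fix_t, set_io_fix() and clear_io_fix() are hypothetical names, and the default sequentially consistent ordering is used only to keep the example simple.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the I/O-fix states; BUF_IO_PIN is gone.
enum class io_fix_t : uint8_t { NONE, READ, WRITE };

// Hypothetical, simplified stand-in for buf_page_t.
struct page_sketch {
  std::atomic<uint32_t> buf_fix_count{0};
  std::atomic<io_fix_t> io_fix{io_fix_t::NONE};

  void fix() { buf_fix_count.fetch_add(1); }

  void unfix() {
    const uint32_t prev = buf_fix_count.fetch_sub(1);
    assert(prev > 0);  // an unfix must match an earlier fix
    (void)prev;
  }

  // Setting an I/O fix also takes a buffer-fix, so that
  // io_fix != NONE always implies buf_fix_count > 0.
  void set_io_fix(io_fix_t state) {
    assert(state != io_fix_t::NONE);
    fix();
    io_fix.store(state);
  }

  // Clearing the I/O fix releases the buffer-fix taken above.
  void clear_io_fix() {
    io_fix.store(io_fix_t::NONE);
    unfix();
  }

  // With the invariant above, one atomic read is enough and
  // no block mutex is needed to evaluate the condition.
  bool can_relocate() const { return buf_fix_count.load() == 0; }
};
```

The result of can_relocate() is only a snapshot. The caller would still need some other mechanism (for example, whatever already prevents new buffer-fixes during relocation) to keep the count at zero while the block is actually being moved.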

One source of flush_list relocation is buf_flush_relocate_on_flush_list(). It does not seem to be a problem. Before buf_page_get_gen() allocates an uncompressed page for a compressed-only ROW_FORMAT=COMPRESSED block, it checks that nobody else has buffer-fixed the block. Other calls are guarded by buf_page_can_relocate().

inaamrana, do you remember why we replaced a combination of BUF_IO_READ and buf_fix_count with the BUF_IO_PIN state? As far as I understand, the purpose was to prevent the block from being moved or added or removed on the flush_list.

It seems that while buf_flush_or_remove_pages() invokes buf_flush_try_yield(), the expectation is that buf_page_io_complete() (and buf_flush_write_complete()) cannot be invoked for the block. What actually guarantees this?

It looks like buf_LRU_flush_or_remove_pages() is covered by exclusive MDL, so there cannot be multiple concurrent calls with the same tablespace ID. The only potential race would seem to be with buf_page_io_complete() or possibly with FLUSH TABLES…FOR EXPORT. I assume that MDL prevents FLUSH TABLES…FOR EXPORT from executing concurrently with any DDL, but maybe concurrent executions of multiple FLUSH TABLES…FOR EXPORT for the same table are allowed. (This needs to be tested.)

Issue Links

blocks: MDEV-27058 Buffer page descriptors are too large (Closed)
duplicates: MDEV-15053 Reduce buf_pool_t::mutex contention (Closed)
is blocked by: MDEV-15528 Avoid writing freed InnoDB pages (Closed)
is blocked by: MDEV-19514 Defer change buffer merge until pages are requested (Closed)
relates to: MDEV-10813 Clean-up InnoDB atomics, memory barriers and mutexes (Closed)

People

Assignee: Marko Mäkelä
Reporter: Marko Mäkelä
Votes: 0
Watchers: 2

Dates

Created: 2019-02-24 21:27
Updated: 2022-09-20 08:29
Resolved: 2020-06-05 09:41

MariaDB Server