[MDEV-31956] SSD based InnoDB buffer pool extension - Jira

Details

Type: Task
Status: Stalled
Priority: Major
Resolution: Unresolved
Fix Version/s: 12.1
Component/s: Storage Engine - InnoDB
Labels:
- performance

Sprint:
Server 12.1 dev sprint

Description

In one practical cloud MariaDB setup, a server node accesses its datadir over the network but also has fast local SSD storage for temporary data. The contents of this temporary storage are lost when the server container is destroyed.

It could make sense to use this ephemeral fast local storage (SSD) as an extension of the portion of InnoDB buffer pool (DRAM) that caches persistent data pages. This cache would be separate from the persistent storage of data files and ib_logfile0.

Such a local cache would avoid page reads and writes on slow network or HDD storage. On the persistent store, only ib_logfile0 would be written, until a write-back is needed for other reasons.

We can simply treat the combination of the buffer pool and the local disk as one unit. Persistent pages could be freely moved between the local disk and the buffer pool. We must keep track of which pages in this combined "virtual buffer pool" are dirty, that is, will have to be written back to the persistent storage. Any durability guarantees would only apply to the persistent storage.

Attachments

Issue Links

relates to

MDEV-26055 Adaptive flushing is still not getting invoked in 10.5.11

Closed

Activity

Max Mether added a comment - 2023-02-01 12:47 - edited

Wouldn't the benefit of this local SSD only be to enlarge the size of the buffer pool?

Another idea would be to have the REDO log on local SSD storage for fast writes but the actual data files on remote storage... ?

Jags (Inactive) added a comment - 2023-02-01 15:18

maxmether, you still cannot treat the local SSD as persistent storage in this "cloud" scenario, where containers are independently moved or stopped. All log writes need to be flushed to the network storage device.

Marko Mäkelä added a comment - 2023-02-01 18:15

Right, it is the write-ahead log that determines what is durable. If we archived all log records from database creation to the current time, the data files could be reconstructed for any past point in time.

Max Mether added a comment - 2023-02-01 20:07

My thinking was that by putting the WAL on the SSD we would significantly increase write throughput (as the log entry of every commit has to be written to disk for the transaction to be durable). Of course, if we cannot guarantee that the disk is durable, that doesn't work.

Vladislav Lesin added a comment - 2024-03-15 16:28

High level design.

External buffer pool file.

A fixed-size external buffer pool file is created on server start and should be ignored during backup and recovery. The file contains plain buffer pool pages; all information about which external buffer pool page corresponds to which space and page number is kept in memory. We might also consider encrypting the external buffer pool file.
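To illustrate the layout described above, here is a minimal Python sketch, not the actual implementation. It assumes a 16 KiB page size, and the class `ExtPoolFile` with its methods `store`, `load` and `forget` are hypothetical names invented for this example. The file holds only raw page images at `slot * PAGE_SIZE`; the (space, page number) to slot mapping exists only in memory, so it is lost on restart and the file can simply be recreated.

```python
import os
import tempfile

PAGE_SIZE = 16384  # assumption: 16 KiB pages


class ExtPoolFile:
    """Hypothetical fixed-size file of raw pages with an in-memory mapping."""

    def __init__(self, path, n_slots):
        # "w+b" truncates any old file: nothing in it is needed after a restart.
        self.f = open(path, "w+b")
        self.f.truncate(n_slots * PAGE_SIZE)  # fixed size, allocated up front
        self.slot_of = {}                     # (space_id, page_no) -> slot, memory only
        self.free_slots = list(range(n_slots))

    def store(self, space_id, page_no, frame):
        """Write a page image; return False if no slot is available."""
        if len(frame) != PAGE_SIZE:
            raise ValueError("frame must be exactly one page")
        key = (space_id, page_no)
        slot = self.slot_of.get(key)
        if slot is None:
            if not self.free_slots:
                return False
            slot = self.free_slots.pop()
            self.slot_of[key] = slot
        self.f.seek(slot * PAGE_SIZE)
        self.f.write(frame)
        return True

    def load(self, space_id, page_no):
        """Return the cached page image, or None if the page is not cached."""
        slot = self.slot_of.get((space_id, page_no))
        if slot is None:
            return None
        self.f.seek(slot * PAGE_SIZE)
        return self.f.read(PAGE_SIZE)

    def forget(self, space_id, page_no):
        """Drop the mapping only; the bytes in the file are left untouched."""
        slot = self.slot_of.pop((space_id, page_no), None)
        if slot is not None:
            self.free_slots.append(slot)

    def close(self):
        self.f.close()
```

Because the file itself carries no metadata, backup and recovery can skip it entirely, which matches the intent stated above.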

Buffer pool integration

Pages that have been flushed to the external buffer pool file are represented by special objects in the buffer pool page hash. Each such object should have a field near its start that distinguishes it from an in-memory block descriptor. Also, either a separate LRU list should be used for evicting pages from the external buffer pool, or the existing LRU list should be segmented. When a page is evicted from the external buffer pool file because the file is full and the page is at the end of the LRU list, its descriptor is simply removed from the page hash; nothing needs to change in the external buffer pool file.
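A rough Python sketch of this bookkeeping, using the "separate LRU list" option. All names here (`InMemoryBlock`, `ExternalPage`, `PageHash`, `is_external`) are hypothetical and chosen for illustration; the real descriptors are C++ structures. The point shown is that eviction from the external pool touches only in-memory state: the victim's descriptor is dropped and its slot is reused.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class InMemoryBlock:
    is_external = False  # discriminating field (plain class attribute)
    frame: bytes


@dataclass
class ExternalPage:
    is_external = True   # marks "this page lives in the external file"
    slot: int


class PageHash:
    """Hypothetical page hash with a separate LRU list for external pages."""

    def __init__(self, n_slots):
        self.free_slots = list(range(n_slots))  # assumes n_slots >= 1
        self.table = {}                 # page_id -> descriptor
        self.ext_lru = OrderedDict()    # page_id -> None, least recent first

    def lookup(self, page_id):
        return self.table.get(page_id)

    def touch(self, page_id):
        """Mark an external page as recently used."""
        if page_id in self.ext_lru:
            self.ext_lru.move_to_end(page_id)

    def add_external(self, page_id):
        """Register a page as stored externally and return its slot."""
        existing = self.table.get(page_id)
        if isinstance(existing, ExternalPage):
            self.touch(page_id)
            return existing.slot
        if self.free_slots:
            slot = self.free_slots.pop()
        else:
            # File is full: forget the least recently used page.
            # Only its descriptor is removed; the file is not modified.
            victim, _ = self.ext_lru.popitem(last=False)
            slot = self.table.pop(victim).slot
        self.table[page_id] = ExternalPage(slot)
        self.ext_lru[page_id] = None
        return slot
```

The stale bytes of the evicted page stay in the file until the slot is overwritten, which is harmless because nothing in memory points at them any more.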

Storing and loading pages in external buffer pool file

A page is stored in the external buffer pool file when it is clean and evicted in the page cleaner thread, or when it is dirty and flushed to its space. Before a page is read from a space, the external buffer pool file is checked: if it contains the page, the page is read from the external buffer pool file; otherwise it is read from the data file.
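The two rules above can be sketched in a few lines of Python. This is only an illustration: the function names, the `"evict"`/`"flush"` event labels, and the use of plain dictionaries as stand-ins for the external file and the data file are all assumptions made for the example.

```python
def should_store_in_ext_pool(is_dirty, event):
    """Decide whether a page image goes to the external buffer pool file.

    event is a hypothetical label: "evict" (page cleaner evicts the page)
    or "flush" (page is written back to its data file).
    """
    clean_eviction = (not is_dirty) and event == "evict"
    dirty_flush = is_dirty and event == "flush"
    return clean_eviction or dirty_flush


def read_page(page_id, ext_pool, data_file):
    """Read a page, preferring the external pool over the data file.

    ext_pool and data_file are dict stand-ins mapping page_id -> bytes.
    Returns the page image and the name of the source it came from.
    """
    frame = ext_pool.get(page_id)
    if frame is not None:
        return frame, "ext_pool"
    return data_file[page_id], "data_file"
```

For example, `read_page((1, 7), {}, {(1, 7): b"page"})` falls through to the data file, while the same call with the page present in `ext_pool` never touches the slow storage.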

Vladislav Lesin added a comment - 2024-03-15 16:40

A short update. I worked on this task during the last feature development sprint. The main question I tried to answer was how to implement flushing pages to the external buffer pool file and reading pages from it. As a result, I implemented the variant in which write I/O requests to the external buffer pool file are submitted along with the write I/O requests to the data files, and the page is unlocked only when both requests have completed. It passed smoke testing, but some mtr tests crash, which I plan to fix during the next feature development sprint. I pushed my work to the ES repository.
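The "unlock only when both requests are completed" rule can be pictured with a small Python sketch. It is an illustration under assumptions, not the pushed code: `PendingPageWrite`, the target labels `"data_file"` and `"ext_pool"`, and the `page_locked` flag are hypothetical names. Each I/O completion callback reports its target, and the page latch is released by whichever completion arrives last.

```python
import threading


class PendingPageWrite:
    """Hypothetical tracker for one page written to two destinations."""

    def __init__(self, targets=("data_file", "ext_pool")):
        self._mutex = threading.Lock()   # completions may come from different threads
        self._pending = set(targets)     # writes that have not completed yet
        self.page_locked = True          # page stays latched while I/O is in flight

    def complete(self, target):
        """Called from an I/O completion callback.

        Returns True if this call released the page or it was already released.
        """
        with self._mutex:
            self._pending.discard(target)
            if not self._pending:
                self.page_locked = False
            return not self.page_locked
```

The order of completion does not matter: if the fast local write finishes first, the page simply remains locked until the slower data file write has also finished.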

People

Assignee: Vladislav Lesin

Reporter: Marko Mäkelä

Votes: 0

Watchers: 10

Dates

Created: 2023-02-01 09:17

Updated: 5 hours ago

MariaDB Server