[MDEV-21808] Adding batch updates to S3 engine Created: 2020-02-24  Updated: 2020-06-08

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - S3
Fix Version/s: None

Type: Task Priority: Major
Reporter: Michael Widenius Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-17841 S3 Storage engine Closed

 Description   

In some cases there is a need to update a S3 table. Currently the only way to do it is to first ALTER it to another storage engine, do the changes and then ALTER it back. If the table is very big and the changes are small this can take a long time and use a lot of disk space.

The purpose of this JIRA entry is to propose a faster way to do it.

The assumptions for doing this new feature are:

  • The table will be write locked during the update.
  • While the update is done, there are no one else accessing the S3 table, either from the same MariaDB server or from another server.
  • The only supported commands will be write, update and delete.

As the blocks of S3 are very big (4M is the default) and the underlying Aria blocks are normally 8K, it's not practical to update the big S3 block when an Aria block is flushed from the page cache.

Instead we propose the following solution:

  • All write locks on the S3 table will be of type WRITE ONLY (Single
    user).
  • When doing the first write, update or delete, we create an on-disk cache for the table.
  • When an updated block is flushed from the page cache, we write it
    to the disk cache. This is also done for the first block of the Aria
    key file that holds system information for the table.
  • We have an internal in memory hash table to remember where each
    block is in the disk cache.
  • On the statement end, or UNLOCK TABLE if LOCK TABLES is used,
    we do:
  • Flush all pages from the page cache to the disk cache.
  • Write out the page information to a temporary file (to protect
    against MariaDB server crashes).
  • Sort blocks on disk cache in page order (separately for index and data file).
  • Loop over all blocks and update the corresponding the S3 block.
  • Update the header block (that holds number of rows, file sizes
    etc).
  • If MariaDB server would crash and any existing disk caches that
    where committed will be applied to the S3 engine as part of
    recovery. All disk caches will then be deleted.

The benefits of the proposed solution

  • Much faster than the ALTER TABLE approach, if the updates only affects a small part of the table.

The disadvantages:

  • The S3 table will not be as 'perfectly aligned' as it was when originally created, which may slightly slow down the S3 table. There will also be somewhat more wasted space not completely full b-tree and not completely full row pages. This can always be fixed later by a separate ALTER TABLE of the S3 table if needed.

Generated at Thu Feb 08 09:09:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.