Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21808

Adding batch updates to S3 engine

    XMLWordPrintable

Details

    Description

      In some cases there is a need to update a S3 table. Currently the only way to do it is to first ALTER it to another storage engine, do the changes and then ALTER it back. If the table is very big and the changes are small this can take a long time and use a lot of disk space.

      The purpose of this JIRA entry is to propose a faster way to do it.

      The assumptions for doing this new feature are:

      • The table will be write locked during the update.
      • While the update is done, there are no one else accessing the S3 table, either from the same MariaDB server or from another server.
      • The only supported commands will be write, update and delete.

      As the blocks of S3 are very big (4M is the default) and the underlying Aria blocks are normally 8K, it's not practical to update the big S3 block when an Aria block is flushed from the page cache.

      Instead we propose the following solution:

      • All write locks on the S3 table will be of type WRITE ONLY (Single
        user).
      • When doing the first write, update or delete, we create an on-disk cache for the table.
      • When an updated block is flushed from the page cache, we write it
        to the disk cache. This is also done for the first block of the Aria
        key file that holds system information for the table.
      • We have an internal in memory hash table to remember where each
        block is in the disk cache.
      • On the statement end, or UNLOCK TABLE if LOCK TABLES is used,
        we do:
      • Flush all pages from the page cache to the disk cache.
      • Write out the page information to a temporary file (to protect
        against MariaDB server crashes).
      • Sort blocks on disk cache in page order (separately for index and data file).
      • Loop over all blocks and update the corresponding the S3 block.
      • Update the header block (that holds number of rows, file sizes
        etc).
      • If MariaDB server would crash and any existing disk caches that
        where committed will be applied to the S3 engine as part of
        recovery. All disk caches will then be deleted.

      The benefits of the proposed solution

      • Much faster than the ALTER TABLE approach, if the updates only affects a small part of the table.

      The disadvantages:

      • The S3 table will not be as 'perfectly aligned' as it was when originally created, which may slightly slow down the S3 table. There will also be somewhat more wasted space not completely full b-tree and not completely full row pages. This can always be fixed later by a separate ALTER TABLE of the S3 table if needed.

      Attachments

        Issue Links

          Activity

            People

              monty Michael Widenius
              monty Michael Widenius
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.