In some cases there is a need to update a S3 table. Currently the only way to do it is to first ALTER it to another storage engine, do the changes and then ALTER it back. If the table is very big and the changes are small this can take a long time and use a lot of disk space.
The purpose of this JIRA entry is to propose a faster way to do it.
The assumptions for doing this new feature are:
- The table will be write locked during the update.
- While the update is done, there are no one else accessing the S3 table, either from the same MariaDB server or from another server.
- The only supported commands will be write, update and delete.
As the blocks of S3 are very big (4M is the default) and the underlying Aria blocks are normally 8K, it's not practical to update the big S3 block when an Aria block is flushed from the page cache.
Instead we propose the following solution:
- All write locks on the S3 table will be of type WRITE ONLY (Single
- When doing the first write, update or delete, we create an on-disk cache for the table.
- When an updated block is flushed from the page cache, we write it
to the disk cache. This is also done for the first block of the Aria
key file that holds system information for the table.
- We have an internal in memory hash table to remember where each
block is in the disk cache.
- On the statement end, or UNLOCK TABLE if LOCK TABLES is used,
- Flush all pages from the page cache to the disk cache.
- Write out the page information to a temporary file (to protect
against MariaDB server crashes).
- Sort blocks on disk cache in page order (separately for index and data file).
- Loop over all blocks and update the corresponding the S3 block.
- Update the header block (that holds number of rows, file sizes
- If MariaDB server would crash and any existing disk caches that
where committed will be applied to the S3 engine as part of
recovery. All disk caches will then be deleted.
The benefits of the proposed solution
- Much faster than the ALTER TABLE approach, if the updates only affects a small part of the table.
- The S3 table will not be as 'perfectly aligned' as it was when originally created, which may slightly slow down the S3 table. There will also be somewhat more wasted space not completely full b-tree and not completely full row pages. This can always be fixed later by a separate ALTER TABLE of the S3 table if needed.