Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17841

S3 Storage engine




      The task is to create a read-only storage engine that will allow one
      to archived MariaDB tables in Amazon S3, or any third-party public or
      private cloud that implements S3 API (of which there are many), but still
      have them accessible for reading in MariaDB.

      The typical use case would be that there exists tables that after some
      time would become fairly inactive, but not so inactive that they could
      be removed. In that case an option is to move such a table to an
      archiving service, which is accessible through an S3 API.

      Notice that S3 means the Cloud Object Storage API defined by Amazon
      AWS. Often the whole of Amazon’s Cloud Object Storage is referred to
      as S3. In the context of the S3 archive storage engine, it refers to
      the API itself that defines how to store objects in a cloud service
      being it Amazon’s or someone else’s. OpenStack for example provides an
      S3 API for storing objects.

      The main benefit of storing things in a S3 compatible storage is that the
      cost of storage is much cheaper than many other alternatives. Many S3
      implementations also provides very reliable long time storage.

      It should be trivial to move an existing table to be stored in S3 (and
      read only) and move it back to local storage.

      The current idea is to use MyISAM or Aria as a base for this storage
      engine, as these are ideally suitable for read-only. Having MyISAM or
      Aria as a base allows the S3 storage engine to support all MariaDB
      types, indexes etc.

      The suggested way to move a table to S3 would be something like:

      (USER 'aws id string', PASSWORD 'aws secret string');

      CONNECTION="S3_SERVER" BUCKET="test.old_table" BUCKET_SIZE=4*1024*1024;

      The ALTER TABLE will first create a local table in the normal MySQL/Aria
      format and then move both index and data to S3 in buckets of BUCKET_SIZE.
      The .frm file will stay on the main server. We will also write an
      external C program that can be used to copy any MyISAM/Aria table to
      S3. Same code will also be used in the server for doing the copy.

      One advantage of this approach is that we can create all indexes one by one
      with sorting and thus get an optimal layout on storage for key accesses.

      If one wants to change the table to a "normal writable table" one can
      do that with another ALTER TABLE;

      ALTER TABLE test.old_table ROW_FORMAT=page;
      ALTER TABLE test.old_table ENGINE=InnoDB;

      One of the properties of many S3 implementations is that it favors big reads.
      It's said that 4M gives the best performance.
      Write performance for S3 is not great, but in this project that is not a

      What makes MyISAM and Aria good candidates for having S3 support is that both
      have their own extendable 'storage modules' (a bit like storage engines
      in MariaDB). It's relatively easy to add a new format to either of them.

      One of the first tasks is decide which one to choose as a base for the new
      storage module.
      There is no planned changes in the storage engine API for this.



      • Supports different concurrent block sizes for the key cache
      • Has support for MERGE tables, which can be very useful with
        with monthly archived data.
      • Has a segmented key cache which gives better concurrent multi-user support


      • No cache for rows (must be developed if we want to add S3 support)
      • Aria gets more development resources (Aria is basically a better version
        of MyISAM with more storage modules and optionally crash-safe).


      • Has row cache for ROW_FORMAT=block and we could re-use that for S3


      • No support for different concurrent block sizes in the Aria page cache
        (we need to support both 8K and 4M in the same cache).
      • May be solved by having a different cache for S3, but in that case
        all S3 tables in one installation must have the same block size.
      • Other option would be to add an upper layer function that would split
        a read block into multiple smaller blocks.
      • No support for MERGE tables
      • This something we should have added a long time ago and not that hard
        to do based on the original MERGE table code.
      • No segmented page cache
      • Would be a nice to have but not critical in a read only scenario and can
        be added later

      Currently I am slight in favour of using Aria as a base as the Aria
      code base is newer. There will be a little more work for adding the
      missing pieces but nothing major.

      S3 Interface
      We are looking at using the AWS interface to S3 from

      Example code to use from:
      Buckets for a table will be stored as

      Special characters in the database and table will be converted to hex.
      'base' we will get from SERVER object.

      The main API to use are:
      s3_client.CreateBucket(request); # For insert
      s3_client.GetObject(object_request); # For read
      s3_client.DeleteBucket(bucket_request); # For drop

      Another option is to use http request directly, but then we would need
      to get or create a library for simplify the interface.

      The code for the above C++ objects can be found at
      aws-cpp-sdk-s3/source/S3Client.cpp in the above git repository.

      Other things

      • Internally we would store the b-tree and data in 8K pages, as this minimizes
        scan for index entries. This also allows us to get better cache allocation
        as we can flush out 8K pieces that are not accessed.
      • Optionally we could compress the table with myisampack/aria_pack before we
        store it in A3. The benefit of this is that we will take 30% less space and
        we get more performance as we get more rows read with each S3 read request.
      • In theory we could also support one to create a 'S3 table' and then
        do one bulk load to it before it's moved to S3 and read only.


        Issue Links



              monty Michael Widenius
              monty Michael Widenius
              0 Vote for this issue
              15 Start watching this issue



                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.