Status: Closed (View Workflow)
The task is to create a read-only storage engine that will allow one
to archived MariaDB tables in Amazon S3, or any third-party public or
private cloud that implements S3 API (of which there are many), but still
have them accessible for reading in MariaDB.
The typical use case would be that there exists tables that after some
time would become fairly inactive, but not so inactive that they could
be removed. In that case an option is to move such a table to an
archiving service, which is accessible through an S3 API.
Notice that S3 means the Cloud Object Storage API defined by Amazon
AWS. Often the whole of Amazon’s Cloud Object Storage is referred to
as S3. In the context of the S3 archive storage engine, it refers to
the API itself that defines how to store objects in a cloud service
being it Amazon’s or someone else’s. OpenStack for example provides an
S3 API for storing objects.
The main benefit of storing things in a S3 compatible storage is that the
cost of storage is much cheaper than many other alternatives. Many S3
implementations also provides very reliable long time storage.
It should be trivial to move an existing table to be stored in S3 (and
read only) and move it back to local storage.
The current idea is to use MyISAM or Aria as a base for this storage
engine, as these are ideally suitable for read-only. Having MyISAM or
Aria as a base allows the S3 storage engine to support all MariaDB
types, indexes etc.
The suggested way to move a table to S3 would be something like:
CREATE SERVER 'S3_SERVER' FOREIGN DATA WRAPPER 'AWS' OPTIONS
(USER 'aws id string', PASSWORD 'aws secret string');
ALTER TABLE test.old_table STORAGE_ENGINE=MyISAM ROW_FORMAT=S3
CONNECTION="S3_SERVER" BUCKET="test.old_table" BUCKET_SIZE=4*1024*1024;
The ALTER TABLE will first create a local table in the normal MySQL/Aria
format and then move both index and data to S3 in buckets of BUCKET_SIZE.
The .frm file will stay on the main server. We will also write an
external C program that can be used to copy any MyISAM/Aria table to
S3. Same code will also be used in the server for doing the copy.
One advantage of this approach is that we can create all indexes one by one
with sorting and thus get an optimal layout on storage for key accesses.
If one wants to change the table to a "normal writable table" one can
do that with another ALTER TABLE;
ALTER TABLE test.old_table ROW_FORMAT=page;
ALTER TABLE test.old_table ENGINE=InnoDB;
One of the properties of many S3 implementations is that it favors big reads.
It's said that 4M gives the best performance.
Write performance for S3 is not great, but in this project that is not a
What makes MyISAM and Aria good candidates for having S3 support is that both
have their own extendable 'storage modules' (a bit like storage engines
in MariaDB). It's relatively easy to add a new format to either of them.
One of the first tasks is decide which one to choose as a base for the new
There is no planned changes in the storage engine API for this.
- Supports different concurrent block sizes for the key cache
- Has support for MERGE tables, which can be very useful with
with monthly archived data.
- Has a segmented key cache which gives better concurrent multi-user support
- No cache for rows (must be developed if we want to add S3 support)
- Aria gets more development resources (Aria is basically a better version
of MyISAM with more storage modules and optionally crash-safe).
- Has row cache for ROW_FORMAT=block and we could re-use that for S3
- No support for different concurrent block sizes in the Aria page cache
(we need to support both 8K and 4M in the same cache).
- May be solved by having a different cache for S3, but in that case
all S3 tables in one installation must have the same block size.
- Other option would be to add an upper layer function that would split
a read block into multiple smaller blocks.
- No support for MERGE tables
- This something we should have added a long time ago and not that hard
to do based on the original MERGE table code.
- No segmented page cache
- Would be a nice to have but not critical in a read only scenario and can
be added later
Currently I am slight in favour of using Aria as a base as the Aria
code base is newer. There will be a little more work for adding the
missing pieces but nothing major.
We are looking at using the AWS interface to S3 from
Example code to use from:
Buckets for a table will be stored as
Special characters in the database and table will be converted to hex.
'base' we will get from SERVER object.
The main API to use are:
s3_client.CreateBucket(request); # For insert
s3_client.GetObject(object_request); # For read
s3_client.DeleteBucket(bucket_request); # For drop
Another option is to use http request directly, but then we would need
to get or create a library for simplify the interface.
The code for the above C++ objects can be found at
aws-cpp-sdk-s3/source/S3Client.cpp in the above git repository.
- Internally we would store the b-tree and data in 8K pages, as this minimizes
scan for index entries. This also allows us to get better cache allocation
as we can flush out 8K pieces that are not accessed.
- Optionally we could compress the table with myisampack/aria_pack before we
store it in A3. The benefit of this is that we will take 30% less space and
we get more performance as we get more rows read with each S3 read request.
- In theory we could also support one to create a 'S3 table' and then
do one bulk load to it before it's moved to S3 and read only.
MDEV-23362 Undefined symbol "uncompress" in s3_get_object()
MDEV-19867 fix mysqldump to by default not copy S3 tables
MDEV-19963 Add MERGE support for S3 (and Aria)
MDEV-19964 S3 replication support
MDEV-19965 Add cache for S3 connections
- relates to
MDEV-20279 Increase Aria index length limit
MDEV-21808 Adding batch updates to S3 engine
MDEV-22606 Include the S3 storage engine in MariaDB Community Server binary packages
MDEV-22980 S3 storage engine fails to load due to plugin_maturity
MDEV-20302 Assertion `page_st != 2' failed in read_big_block or server hangs upon concurrent SELECT from partitioned S3 table
MDEV-20306 Assertion `!(end_of_data > info->scan.dir_end)' failed in _ma_scan_block_record upon converting table from S3 to Aria with encryption enabled
- In Progress
MDEV-22088 S3 partitioning support
MDEV-22925 ALTER TABLE s3_table ENGINE=Aria can cause failure on slave
MDEV-23377 libmarias3 doesn't compile in OpenBSD