Details
-
New Feature
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
Goal:
The purpose of this version of MariaDB Exa is to allow customers to have heterogenous data on some tables. This means that historical data can be removed from the InnoDB tables on the MariaDB side while it still remains intact on the Exasol side. In order to deal and properly manage this discrepancy in data MaxScale as the central has to be aware of this so that the queries can be executed on the right side.
Proposal: Use Partitioning to manage historical data
MariaDB already has a pretty advanced set of partitioning features. This makes it somewhat easy to remove historical data by just deleting whole partitions (ALTER TABLE … DROP PARTITION). Partitioning further has the benefit of being part of the table definition and thus easily viewable by MaxScale or other tools. Furthermore our current CDC solution ignores DDL changes to PARITIONING (all DDL changes at the moment) and changes to the CDC would thus not be required.
Details on the MariaDB Server side:
1. The base table is partitioned by a DATETIME column in some set of chunks (could be months or days depending on table size, insertion rate etc).
2. At a regular interval (say once / month or once / day) these partitions are re-organized so that the oldest partition is removed (with ALTER TABLE … DROP PARTITION) and newer partitions are created (with ALTER TABLE … REORGANIZE PARTITIONS) etc.
3. If LIST partitioning is used this is fine by itself, if RANGE partitioning is used there should also be a constraint that refuses writes in the range of the deleted partitions.
2. Should be handled by a script (or stored procedure) that is launched on a regular basis.
Details on the MaxScale side:
Each time the script is changing the partitioning scheme of the table MaxScale should be made aware of these changes so that SELECT queries potentially touching ranges outside the partitioning RANGE of the table should always only go to Exasol. This can be done by:
1) The script/procedure directly
2) MaxScale looking at the table definition
3) MaxScale looking at the INFORMATION_SCHEMA tables
Is MaxScale able to catch such queries ?
MaxScale can regularly check information_schema to become aware of partitions, but probably better to provide a way for tri