[MDEV-9983] Automatic time-based partitioning Created: 2016-04-25  Updated: 2016-04-26

Status: Open
Project: MariaDB Server
Component/s: Partitioning
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Anders Karlsson Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: None


 Description   

The most-common use-case for Partitioning is to allow data to "expire" using a time-expression on the partitioning expression based on some field in the table, mostly a field with some time-based information on it but sometimes this is a sequence number or similar.

The feature here is based on this notion, but is adds the feature that the data / time or partitioning expression being used doesn't have to be stored in the database itself. This does have some distinct advantages, but it also has disadvantages, but it is a useful future for the future.

The idea is that every time an INSERT into such a partitioned table is executed, the partitioning expression is determined outside the actual data being inserted. An example is the current date and time. For SELECTs the same applies, only the partitions matching the current date and time would be evaluated.

The advantage of the above is that the partitioning data doesn't have to be stored in the table data. To allow data to be processed in a partition not matching the current date and time, the FROM <table> PARTITION <partition> format.

In addition, I think that there is a need to set a session variable that contains the key to use instead of the current date and time, if set. Let's call the session variable PARTITION_KEY for now, and this really should affect both SELECT as well as INSERT/SELECT/UPDATE.

I would suggest this order of precedence:
1) if a FROM .. PARTITION clause is present, use this.
2) If the PARTITION_KEY session variable is set, then use this.
3) Else, use the current date/time

I can think of cases where we one would want to use something else that a time-based value for partitioning, but I leave this to the implementation.

An additional feature here would be to have MariaDB create partitions as needed, and also possibly drop outdated partitions by specifying how many "active" partitions one would want to keep.



 Comments   
Comment by Sergei Golubchik [ 2016-04-25 ]

could you show an example, please? For example, what an sql script using this feature might look like?

Comment by Anders Karlsson [ 2016-04-26 ]

Let me give a concrete example of what this could look like, first we create a table which will reuse partitions every week.

CREATE TABLE orders(order_id INT NOT NULL PRIMARY KEY,
    customer_id INT NOT NULL,
    order_value NUMBER(10,2),
    PRIMARY KEY(order_id))
    PARTITION BY AUTO LIST(WEEKDAY(CURRENT_DATE))(
    PARTITION p0 VALUES IN (0),
    PARTITION p1 VALUES IN (1),
    PARTITION p2 VALUES IN (2),
    PARTITION p3 VALUES IN (3),
    PARTITION p4 VALUES IN (4),
    PARTITION p5 VALUES IN (5),
    PARTITION p6 VALUES IN (6));

Now. let's assume this is on a wednesday, this this insert will end up in the p2 partition:

INSERT INTO orders VALUES(1, 42),(2, 57),(3, 18);

Then we can try to SELECT some data, selecting from the "current" partition, assuming that PARTITION_KEY is not set.

SELECT * FROM orders WHERE order_id = 1;

To SELECT from all partitions you would execute:

SELECT * FROM orders PARTITIONS ALL WHERE customer_id = 42;

Which of the above is the default (i.e. SELECT from all or only the "active partition" is a matter of taste. For my use case the above would be the best).
To SELECT data from a specific partition, one could still use:

SELECT * FROM orders PARTITION p1 WHERE customer_id = 57;

I would also think that this construct would be useful

SELECT * FROM orders PARTITION ALL WHERE PARTITION_KEY = WEEKDAY(CURRENT_DATE())

Here we have a pseudo-column, PARTITION_KEY that returns the value of a specific partition. I guess this SQL construct could be different and in line with FROM ... PARTITION, but I leave that to the implementation.

To get rid of "old" data we would then run, early wednesday:

ALTER TABLE orders TRUNCATE PARTITION p3;

As can be seen from the above, this has the advantage that I don't have to store the partitioning value. On the other hand, there is the issue with the PRIMARY KEY, but when you think about it is less of an issue, all one has to remember is that the "real" primary key is, in the example above, (order_id, PARTITION_KEY),

This covers one of the most common use-cases for partitioning. One feature that might be added is to be able to tell which partitions are "active" be default, and this could be either in the table itself by extending the partitioning definition (but the semantics here could be difficult) or use a session variable:

SET ACTIVE_PARTITIONS = 'p1, p2';
SELECT * FROM orders WHERE customer_id = 42;

Where the above would then look in the tuesday and wednesday partitions.

Generated at Thu Feb 08 07:38:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.