[MDEV-9983] Automatic time-based partitioning Created: 2016-04-25 Updated: 2016-04-26 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Partitioning |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor |
| Reporter: | Anders Karlsson | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | None | ||
| Description |
|
The most-common use-case for Partitioning is to allow data to "expire" using a time-expression on the partitioning expression based on some field in the table, mostly a field with some time-based information on it but sometimes this is a sequence number or similar. The feature here is based on this notion, but is adds the feature that the data / time or partitioning expression being used doesn't have to be stored in the database itself. This does have some distinct advantages, but it also has disadvantages, but it is a useful future for the future. The idea is that every time an INSERT into such a partitioned table is executed, the partitioning expression is determined outside the actual data being inserted. An example is the current date and time. For SELECTs the same applies, only the partitions matching the current date and time would be evaluated. The advantage of the above is that the partitioning data doesn't have to be stored in the table data. To allow data to be processed in a partition not matching the current date and time, the FROM <table> PARTITION <partition> format. In addition, I think that there is a need to set a session variable that contains the key to use instead of the current date and time, if set. Let's call the session variable PARTITION_KEY for now, and this really should affect both SELECT as well as INSERT/SELECT/UPDATE. I would suggest this order of precedence: I can think of cases where we one would want to use something else that a time-based value for partitioning, but I leave this to the implementation. An additional feature here would be to have MariaDB create partitions as needed, and also possibly drop outdated partitions by specifying how many "active" partitions one would want to keep. |
| Comments |
| Comment by Sergei Golubchik [ 2016-04-25 ] | ||||||||||||||||||||
|
could you show an example, please? For example, what an sql script using this feature might look like? | ||||||||||||||||||||
| Comment by Anders Karlsson [ 2016-04-26 ] | ||||||||||||||||||||
|
Let me give a concrete example of what this could look like, first we create a table which will reuse partitions every week.
Now. let's assume this is on a wednesday, this this insert will end up in the p2 partition:
Then we can try to SELECT some data, selecting from the "current" partition, assuming that PARTITION_KEY is not set.
To SELECT from all partitions you would execute:
Which of the above is the default (i.e. SELECT from all or only the "active partition" is a matter of taste. For my use case the above would be the best).
I would also think that this construct would be useful
Here we have a pseudo-column, PARTITION_KEY that returns the value of a specific partition. I guess this SQL construct could be different and in line with FROM ... PARTITION, but I leave that to the implementation. To get rid of "old" data we would then run, early wednesday:
As can be seen from the above, this has the advantage that I don't have to store the partitioning value. On the other hand, there is the issue with the PRIMARY KEY, but when you think about it is less of an issue, all one has to remember is that the "real" primary key is, in the example above, (order_id, PARTITION_KEY), This covers one of the most common use-cases for partitioning. One feature that might be added is to be able to tell which partitions are "active" be default, and this could be either in the table itself by extending the partitioning definition (but the semantics here could be difficult) or use a session variable:
Where the above would then look in the tuesday and wednesday partitions. |