Details
-
Task
-
Status: Closed (View Workflow)
-
Minor
-
Resolution: Duplicate
Description
A lot of popular software powered by MySQL/MariaDB could benefit from a fairly easy optimization that MariaDB could apply when compiling a SQL query.
Given an indexed DATE/DATETIME/TIMESTAMP column used as a WHERE constraint in a SQL query, if one of a limited set of standard date functions is applied to the column prior to comparing it to a value, MariaDB is no longer able to use the index for the query in question and must resort to slower methods of comparison.
As a concrete example, here's a very standard query from a stock WordPress install:
SELECT wp_posts.* |
FROM wp_posts |
WHERE 1=1 |
AND ( YEAR( wp_posts.post_date ) = 2017 ) |
AND wp_posts.post_name = 'tahoma-vs-verdana' |
AND wp_posts.post_type = 'post' |
ORDER BY wp_posts.post_date DESC |
While all the fields post_date, post_name, post_type, and post_date are indexed, the use of the YEAR function on the indexed post_date column prevents MariaDB from using the index, as it must first apply the specified function to the named column before being able to perform the comparison.
This can be trivially rewritten to the following:
SELECT wp_posts.* |
FROM wp_posts |
WHERE 1=1 |
AND wp_posts.post_date >= DATE("2017-01-01") |
AND wp_posts.post_date < DATE("2018-01-01") |
AND wp_posts.post_name = 'tahoma-vs-verdana' |
AND wp_posts.post_type = 'post' |
ORDER BY wp_posts.post_date DESC |
and will then take full advantage of the index of post_date.
I would like to suggest that MariaDB automatically apply this type of optimization when a basic comparison is made against an indexed DATE/DATETIME/TIMESTAMP column, which should be both fairly straightforward and bring immediately measurable returns in terms of performance and scaling. Eliminating non-indexed lookups for such a large percentage of queries for MariaDB users around the web would really be awesome.
As a simple starting point, this class of optimizations could be restricted to this very simple case:
F(X) = Y where F is in YEAR, MONTH, DATE and Y is a string or int type. In all these cases, the query would be transformed to
F(X) >= START AND F(X) < END where START is the minimum value X' that would produce F(X') = Y and END is similarly the maximum value X'' that would produce F(X'') = Y.
This need not be algorithmically determined, as there are only a finite set of such transforms. To whit, for F = YEAR, START = sprintf("DATE(\"%d-01-01\")", Y), END = sprintf(\"DATE(\"%d-01-01\")", Y + 1)
(Obviously for the optimized constraint to make any sense and return valid results, it can only be applied when the entire set of results falls within a single, strongly defined date range. i.e. DAY can only be optimized if both MONTH and YEAR are also specified.)
Attachments
Issue Links
- relates to
-
MDEV-8320 Allow index usage for DATE(datetime_column) = const
- Closed