Details
-
Task
-
Status: Closed (View Workflow)
-
Blocker
-
Resolution: Fixed
-
10.2.2-2, 10.2.2-3, 10.2.2-1, 10.2.2-2, 10.2.2-3, 10.2.2-4, 10.2.3-1, 10.2.3-2
Description
Would it be possible to make DATE() on datetime column sargable in some cases?
Like rewrite "DATE(col) = const" to "col BETWEEN concat(const, ' 00:00:00') AND concat(const, ' 23:59:59')"
Other similar cases:
- "YEAR(col) = const" is almost the same
- "YEAR(col) = c1 AND MONTH(col) = c2" and other such combinations may be too complex to be worth it.
Attachments
Issue Links
- causes
-
MDEV-30901 Index usage for DATE(datetime_column) = const does not work for engine Memory
-
- Closed
-
-
MDEV-30913 Index usage for DATE(datetime_column) = const does not work for engine Connect
-
- Closed
-
-
MDEV-30946 Index usage for DATE(datetime_column) = const does not work for DELETE and UPDATE
-
- Closed
-
-
MDEV-33299 Assertion `(tm->tv_usec % (int) log_10_int[6 - dec]) == 0' failed in void my_timestamp_to_binary(const timeval*, uchar*, uint)
-
- Closed
-
- is duplicated by
-
MDEV-8937 Optimize date functions in the parser
-
- Closed
-
- relates to
-
MDEV-30930 Make more date conditions sargable
-
- Open
-
-
MDEV-14635 Convert date operations on indexed column to sargable equivalents where possible
-
- Closed
-
Activity
I'd look first at add_key_fields(). But perhaps I would've been wrong
add_key_fields() is an auxiliary function that used for constructing ref access plans (here, ref includes ref_or_null, or eq_ref). ref access plans are based on equalities (or other conditions that can be reduced to equalities), such that the records can be read with handler->index_read()/handler->index_next_same() calls.
YEAR(date_col) = const specifies a range. So, the right way would be to use range optimizer to handle this.
As for using range optimizer... I'll debug a similar query and point places where changes are be made.
Let the query be:
explain select * from t100 where a between '2015-10-10' and '2015-10-12';
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
| 1 | SIMPLE | t100 | range | a | a | 4 | NULL | 3 | Using index condition |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
The first place is:
#0 add_key_field (join=0x7fff5c0065a0, key_fields=0x7ffff7eb5b18, and_level=0, cond=0x7fff5c005cf8, field=0x7fff5c010a90, eq_func=false, value=0x7fff5c005e48, num_values=2, usable_tables=18446744073709551615, sargables=0x7ffff7eb5c88) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4482
|
#1 0x0000555555ab03be in add_key_equal_fields (join=0x7fff5c0065a0, key_fields=0x7ffff7eb5b18, and_level=0, cond=0x7fff5c005cf8, field_item=0x7fff5c005aa8, eq_func=false, val=0x7fff5c005e48, num_values=2, usable_tables=18446744073709551615, sargables=0x7ffff7eb5c88) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4568
|
#2 0x0000555555ab0a58 in Item_func_between::add_key_fields (this=0x7fff5c005cf8, join=0x7fff5c0065a0, key_fields=0x7ffff7eb5b18, and_level=0x7ffff7eb5b50, usable_tables=18446744073709551615, sargables=0x7ffff7eb5c88) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4734
|
#3 0x0000555555ab2524 in update_ref_and_keys (thd=0x55555aad3700, keyuse=0x7fff5c0068a0, join_tab=0x7fff5c006e18, tables=1, cond=0x7fff5c005cf8, normal_tables=18446744073709551615, select_lex=0x55555aad77b0, sargables=0x7ffff7eb5c88) at /home/psergey/dev-git/10.1/sql/sql_select.cc:5302
|
#4 0x0000555555aad6a1 in make_join_statistics (join=0x7fff5c0065a0, tables_list=..., keyuse_array=0x7fff5c0068a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:3615
|
#5 0x0000555555aa5836 in JOIN::optimize_inner (this=0x7fff5c0065a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1341
|
#6 0x0000555555aa478c in JOIN::optimize (this=0x7fff5c0065a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1023
|
This is the preparation stage. We collect in TABLE::const_keys a set of indexes for which range optimization is probably useful. Note the frame with Item_func_between::add_key_fields, I guess Item_func_eq will need a similar function that will check if it is comparing a YEAR(key_col) with a constant
Construction of range objects happens here:
#0 SEL_ARG::SEL_ARG (this=0x7fff5c014a60, f=0x7fff5c010a90, min_value_arg=0x7fff5c014a58 "", max_value_arg=0x7fff5c014a58 "") at /home/psergey/dev-git/10.1/sql/opt_range.cc:2479
|
#1 0x0000555555988784 in get_mm_leaf (param=0x7ffff7eb3410, conf_func=0x7fff5c005cf8, field=0x7fff5c010a90, key_part=0x7fff5c0147b8, type=Item_func::GE_FUNC, value=0x7fff5c005bb8) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8598
|
#2 0x00005555559878c9 in get_mm_parts (param=0x7ffff7eb3410, cond_func=0x7fff5c005cf8, field=0x7fff5c010a90, type=Item_func::GE_FUNC, value=0x7fff5c005bb8, cmp_type=TIME_RESULT) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8280
|
#3 0x0000555555985b0b in get_func_mm_tree (param=0x7ffff7eb3410, cond_func=0x7fff5c005cf8, field=0x7fff5c010a90, value=0x0, cmp_type=TIME_RESULT, inv=false) at /home/psergey/dev-git/10.1/sql/opt_range.cc:7604
|
#4 0x000055555598652d in get_full_func_mm_tree (param=0x7ffff7eb3410, cond_func=0x7fff5c005cf8, field_item=0x7fff5c005aa8, value=0x0, inv=false) at /home/psergey/dev-git/10.1/sql/opt_range.cc:7931
|
#5 0x0000555555986de0 in Item_func_between::get_mm_tree (this=0x7fff5c005cf8, param=0x7ffff7eb3410, cond_ptr=0x7fff5c007eb0) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8117
|
#6 0x000055555597c083 in SQL_SELECT::test_quick_select (this=0x7fff5c007ea8, thd=0x55555aad3700, keys_to_use=..., prev_tables=0, limit=18446744073709551615, force_quick_range=false, ordered_output=false, remove_false_parts_of_where=true) at /home/psergey/dev-git/10.1/sql/opt_range.cc:3133
|
#7 0x0000555555aac9a4 in get_quick_record_count (thd=0x55555aad3700, select=0x7fff5c007ea8, table=0x7fff5c00f640, keys=0x7fff5c006f98, limit=18446744073709551615, in_outer_join=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:3362
|
#8 0x0000555555aaed53 in make_join_statistics (join=0x7fff5c0065a0, tables_list=..., keyuse_array=0x7fff5c0068a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4004
|
#9 0x0000555555aa5836 in JOIN::optimize_inner (this=0x7fff5c0065a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1341
|
#10 0x0000555555aa478c in JOIN::optimize (this=0x7fff5c0065a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1023
|
Once SEL_ARG objects are there, the rest of things (getting records_in_range() estimates, handling of AND/OR, multi-component keys) will happen automatically. The tricks is to create the right SEL_ARGs.
Looking at where SEL_ARG object is made for a query with equality:
explain select * from t100 where a= '2015-10-10' ;
|
#0 SEL_ARG::SEL_ARG (this=0x7fff5c015cb0, f=0x7fff5c010a90, min_value_arg=0x7fff5c015ca8 "", max_value_arg=0x7fff5c015ca8 "") at /home/psergey/dev-git/10.1/sql/opt_range.cc:2479
|
#1 0x0000555555988784 in get_mm_leaf (param=0x7ffff7eb3410, conf_func=0x7fff5c006d58, field=0x7fff5c010a90, key_part=0x7fff5c015a08, type=Item_func::EQ_FUNC, value=0x7fff5c005b88) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8598
|
#2 0x00005555559878c9 in get_mm_parts (param=0x7ffff7eb3410, cond_func=0x7fff5c006d58, field=0x7fff5c010a90, type=Item_func::EQ_FUNC, value=0x7fff5c005b88, cmp_type=TIME_RESULT) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8280
|
#3 0x00005555559872b3 in Item_equal::get_mm_tree (this=0x7fff5c006d58, param=0x7ffff7eb3410, cond_ptr=0x7fff5c008078) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8194
|
#4 0x000055555597c083 in SQL_SELECT::test_quick_select (this=0x7fff5c008070, thd=0x55555aad3700, keys_to_use=..., prev_tables=0, limit=18446744073709551615, force_quick_range=false, ordered_output=false, remove_false_parts_of_where=true) at /home/psergey/dev-git/10.1/sql/opt_range.cc:3133
|
#5 0x0000555555aac9a4 in get_quick_record_count (thd=0x55555aad3700, select=0x7fff5c008070, table=0x7fff5c00f640, keys=0x7fff5c007128, limit=18446744073709551615, in_outer_join=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:3362
|
#6 0x0000555555aaed53 in make_join_statistics (join=0x7fff5c0064e0, tables_list=..., keyuse_array=0x7fff5c0067e0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4004
|
#7 0x0000555555aa5836 in JOIN::optimize_inner (this=0x7fff5c0064e0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1341
|
#8 0x0000555555aa478c in JOIN::optimize (this=0x7fff5c0064e0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1023
|
It looks like get_mm_leaf is the function which should have the logic that figures that
YEAR(key_col) = C1 --> " concat(C1, '-01-01') <= key_col <= concat(C1, '-12-31') "
|
danblack, hopefully these are enough? I hope I didn't miss anything.
Thank you all for hints. I tried to implement this in https://github.com/jkavalik/server/commit/2a66fa9bde198bca78625210f632112586c818e7
No idea if it is the right way to do such a thing but it seems to work as I imagined. I did not yet manage to run complete test suite (had to put my laptop to sleep and it then crashed on timeout..) but I ran some (innodb, select, explain, type_date, type_datetime directly and then some 1200 from the entire suite before the sleep) without troubles.
Commented on the patch in github. Summary: found a number of issues which need to be resolved before this can be pushed.
Minor addition, when these optimisations occur it would be good to see an addition text in the 'Extra' of the explain output like '
{funct}(colref) optimised to range;'
Saw a customer case with conditions like this:
DATE(table.datetime_col) BETWEEN '2015-08-12' AND '2015-08-12'
|
|
DATE(table.date_col) BETWEEN '2015-08-12' AND '2015-08-12'
|
The columns have types DATE and DATETIME.
Saw another case where the real-world queries are using conditions in form:
DATE(date_col) = '2016-06-23'
|
Sometimes these conditions are OR-ed together. The comparison is always with a constant.
It could be that date_col is actually a DATETIME.
The patch v0. Makes DATE(date_col) <>= CONST_DATE sargable.
http://lists.askmonty.org/pipermail/commits/2016-September/009874.html
When this is done, please don't forget to update this kb article: https://mariadb.com/kb/en/mariadb/building-the-best-index-for-a-given-select/
This issue is in review for a while now, any status update?
Thanks!
So the patch works when the WHERE clause has just the condition we are
targeting:
explain select * from t1 where date(a) < '2017-01-10' ;
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
| 1 | SIMPLE | t1 | range | a | a | 6 | NULL | 90 | Using index condition |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
but doesn't work with equality propagation:
explain select * from t1 where date(a1) < '2017-01-10' and a1=a;
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10122 | Using where |
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|
Actually, even irrelevant equalities make the patch not to work. Let's add
"b=b1" to the first query and we get:
explain select * from t1 where date(a) < '2017-01-10' and b=b1;
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10122 | Using where |
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|
Another thing: the patch works only if DATE(col) is on the left side of the equality:
explain select * from t1 where date(a) = '2017-01-10';
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
| 1 | SIMPLE | t1 | range | a | a | 6 | NULL | 10 | Using index condition |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
explain select * from t1 where '2017-01-10'=date(a);
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10122 | Using where |
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|
Elaborating on Monty's "do a rewrite at fix_fields" proposal.
Let's consider YEAR(a), where a is a DATETIME column.
introduce functions:
YEAR_START(datetime_col) returns CONCAT(YEAR(datetime_col), '-01-01 00:00:00')
|
YEAR_END(datetime_col) returns CONCAT(YEAR(datetime_col), '-31-12 23:59:59 ...')
|
(add more precision if necessary)
then
YEAR(col) <= val is the same as col <= YEAR_END(val)
|
YEAR(col) < val is the same as col < YEAR_START(val)
|
YEAR(col) >= val is the same as col >= YEAR_START(val)
|
YEAR(col) > val is the same as col > YEAR_END(val)
|
YEAR(col) = val is the same as col>=YEAR_START(val) AND col<=YEAR_END(val)
|
YEAR(col) BETWEEN val1 AND val2
|
is the same as
YEAR(col) >= val2 AND YEAR(col) <= val2
|
which reduces it to the cases already handled above.
If we use the rewrite approach, equality propagation will work automatically.
A patch: http://lists.askmonty.org/pipermail/commits/2017-November/011674.html
Still need to figure out if we need to take measures to account for leap seconds.
Somehow, I am unable to observe this leap second.
I get Ubuntu 16.04.3 LTS, compile mysql-5.7, get the datadir from mysql-test/var/install.db, with non-empty mysql.time_zone* tables, and try to reproduce what is said at: https://dev.mysql.com/doc/refman/5.7/en/time-zone-leap-seconds.html.
The docs say:
mysql> -- values differ internally but display the same
|
mysql> SELECT a, ts, UNIX_TIMESTAMP(ts) FROM t1;
|
+------+---------------------+--------------------+
|
| a | ts | UNIX_TIMESTAMP(ts) |
|
+------+---------------------+--------------------+
|
| 1 | 2008-12-31 23:59:59 | 1230767999 |
|
| 2 | 2008-12-31 23:59:59 | 1230768000 |
|
+------+---------------------+--------------------+
|
2 rows in set (0.00 sec)
|
while I get:
mysql> SELECT a, ts, UNIX_TIMESTAMP(ts) FROM t1;
|
+------+---------------------+--------------------+
|
| a | ts | UNIX_TIMESTAMP(ts) |
|
+------+---------------------+--------------------+
|
| 1 | 2008-12-31 23:59:59 | 1230767999 |
|
| 2 | 2009-01-01 00:00:00 | 1230768000 |
|
+------+---------------------+--------------------+
|
2 rows in set (0.00 sec)
|
|
Also discussed with bar.
Takeaways:
- one needs to have TZ data loaded into the server in order to observe the leap second.
- Check mysql_tzinfo_to_sql, and --leap parameter.
- Note that not all timezone files have leap second information (use "file" to see which do)
- DATE[TIME] type has no leap seconds. only TIMESTAMP does
- TIMESTAMP has open bugs like
MDEV-13995- It's better to not have this patch support TIMESTAMP for now. Let's do it just for DATE TIMEs.
- This means we dont have to support leap seconds.
Adjusted the patch according to the above: http://lists.askmonty.org/pipermail/commits/2017-November/011680.html
A test patch to also make rewrite function a member of Item: http://lists.askmonty.org/pipermail/commits/2017-November/011681.html
There are test failures looking like
main.sargable_date_cond w3 [ fail ]
|
Test ended at 2023-03-16 04:35:43
|
|
CURRENT_TEST: main.sargable_date_cond
|
--- /usr/share/mysql/mysql-test/main/sargable_date_cond.result 2023-03-16 01:36:33.000000000 -0400
|
+++ /dev/shm/var/3/log/sargable_date_cond.reject 2023-03-16 04:35:42.953159711 -0400
|
@@ -1919,6 +1919,8 @@
|
timestampadd(hour, B.a, date_add('2016-01-01', interval A.a*8 day)),
|
date_add('2016-01-01', interval A.a*7 day)
|
from t1 A, t0 B;
|
+Warnings:
|
+Warning 1299 Invalid TIMESTAMP value in column 'a' at row 11
|
that occur only on some machines and not on others. Turns out it depends on whether the system time zone employs daylight saving time.
If it does then calling timestampadd() can lead to execution of the following code
my_time.cc |
/*
|
Fix that if we are in the non existing daylight saving time hour
|
we move the start of the next real hour.
|
|
This code doesn't handle such exotical thing as time-gaps whose length
|
is more than one hour or non-integer (latter can theoretically happen
|
if one of seconds will be removed due leap correction, or because of
|
general time correction like it happened for Africa/Monrovia time zone
|
in year 1972).
|
*/
|
if (loop == 2 && t->hour != (uint) l_time->tm_hour)
|
{
|
int days= t->day - l_time->tm_mday;
|
if (days < -1)
|
days=1; /* Month has wrapped */
|
else if (days > 1)
|
days= -1;
|
diff=(3600L*(long) (days*24+((int) t->hour - (int) l_time->tm_hour))+
|
(long) (60*((int) t->minute - (int) l_time->tm_min)) +
|
(long) ((int) t->second - (int) l_time->tm_sec));
|
if (diff == 3600)
|
tmp+=3600 - t->minute*60 - t->second; /* Move to next hour */
|
else if (diff == -3600)
|
tmp-=t->minute*60 + t->second; /* Move to previous hour */
|
|
*error_code= ER_WARN_INVALID_TIMESTAMP;
|
}
|
At the moment, the task has been tested with cases for the SELECT, a fix of MDEV-30946 is needed to continue testing
any hints where to start looking in the code to implement this appreciated.