[MDEV-8320] Allow index usage for DATE(datetime_column) = const Created: 2015-06-16  Updated: 2024-01-26  Resolved: 2023-04-25

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 11.1.1

Type: Task Priority: Blocker
Reporter: Jiri Kavalik Assignee: Oleg Smirnov
Resolution: Fixed Votes: 5
Labels: Preview_11.1

Issue Links:
Duplicate
is duplicated by MDEV-8937 Optimize date functions in the parser Closed
Problem/Incident
causes MDEV-30901 Index usage for DATE(datetime_column)... Closed
causes MDEV-30913 Index usage for DATE(datetime_column)... Closed
causes MDEV-30946 Index usage for DATE(datetime_column)... Closed
causes MDEV-33299 Assertion `(tm->tv_usec % (int) log_1... Closed
Relates
relates to MDEV-30930 Make more date conditions sargable Open
relates to MDEV-14635 Convert date operations on indexed co... Closed
Sprint: 10.2.2-2, 10.2.2-3, 10.2.2-1, 10.2.2-2, 10.2.2-3, 10.2.2-4, 10.2.3-1, 10.2.3-2

 Description   

Would it be possible to make DATE() on datetime column sargable in some cases?
Like rewrite "DATE(col) = const" to "col BETWEEN concat(const, ' 00:00:00') AND concat(const, ' 23:59:59')"

Other similar cases:

  • "YEAR(col) = const" is almost the same
  • "YEAR(col) = c1 AND MONTH(col) = c2" and other such combinations may be too complex to be worth it.


 Comments   
Comment by Daniel Black [ 2015-06-16 ]

any hints where to start looking in the code to implement this appreciated.

Comment by Sergei Golubchik [ 2015-06-17 ]

I'd look first at add_key_fields(). But perhaps I would've been wrong

Comment by Sergei Petrunia [ 2015-06-17 ]

add_key_fields() is an auxiliary function that used for constructing ref access plans (here, ref includes ref_or_null, or eq_ref). ref access plans are based on equalities (or other conditions that can be reduced to equalities), such that the records can be read with handler->index_read()/handler->index_next_same() calls.

YEAR(date_col) = const specifies a range. So, the right way would be to use range optimizer to handle this.

Comment by Sergei Petrunia [ 2015-06-17 ]

As for using range optimizer... I'll debug a similar query and point places where changes are be made.

Let the query be:

explain select * from t100 where a between '2015-10-10' and '2015-10-12';
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|    1 | SIMPLE      | t100  | range | a             | a    | 4       | NULL |    3 | Using index condition |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

The first place is:

  #0  add_key_field (join=0x7fff5c0065a0, key_fields=0x7ffff7eb5b18, and_level=0, cond=0x7fff5c005cf8, field=0x7fff5c010a90, eq_func=false, value=0x7fff5c005e48, num_values=2, usable_tables=18446744073709551615, sargables=0x7ffff7eb5c88) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4482
  #1  0x0000555555ab03be in add_key_equal_fields (join=0x7fff5c0065a0, key_fields=0x7ffff7eb5b18, and_level=0, cond=0x7fff5c005cf8, field_item=0x7fff5c005aa8, eq_func=false, val=0x7fff5c005e48, num_values=2, usable_tables=18446744073709551615, sargables=0x7ffff7eb5c88) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4568
  #2  0x0000555555ab0a58 in Item_func_between::add_key_fields (this=0x7fff5c005cf8, join=0x7fff5c0065a0, key_fields=0x7ffff7eb5b18, and_level=0x7ffff7eb5b50, usable_tables=18446744073709551615, sargables=0x7ffff7eb5c88) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4734
  #3  0x0000555555ab2524 in update_ref_and_keys (thd=0x55555aad3700, keyuse=0x7fff5c0068a0, join_tab=0x7fff5c006e18, tables=1, cond=0x7fff5c005cf8, normal_tables=18446744073709551615, select_lex=0x55555aad77b0, sargables=0x7ffff7eb5c88) at /home/psergey/dev-git/10.1/sql/sql_select.cc:5302
  #4  0x0000555555aad6a1 in make_join_statistics (join=0x7fff5c0065a0, tables_list=..., keyuse_array=0x7fff5c0068a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:3615
  #5  0x0000555555aa5836 in JOIN::optimize_inner (this=0x7fff5c0065a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1341
  #6  0x0000555555aa478c in JOIN::optimize (this=0x7fff5c0065a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1023

This is the preparation stage. We collect in TABLE::const_keys a set of indexes for which range optimization is probably useful. Note the frame with Item_func_between::add_key_fields, I guess Item_func_eq will need a similar function that will check if it is comparing a YEAR(key_col) with a constant

Comment by Sergei Petrunia [ 2015-06-17 ]

Construction of range objects happens here:

  #0  SEL_ARG::SEL_ARG (this=0x7fff5c014a60, f=0x7fff5c010a90, min_value_arg=0x7fff5c014a58 "", max_value_arg=0x7fff5c014a58 "") at /home/psergey/dev-git/10.1/sql/opt_range.cc:2479
  #1  0x0000555555988784 in get_mm_leaf (param=0x7ffff7eb3410, conf_func=0x7fff5c005cf8, field=0x7fff5c010a90, key_part=0x7fff5c0147b8, type=Item_func::GE_FUNC, value=0x7fff5c005bb8) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8598
  #2  0x00005555559878c9 in get_mm_parts (param=0x7ffff7eb3410, cond_func=0x7fff5c005cf8, field=0x7fff5c010a90, type=Item_func::GE_FUNC, value=0x7fff5c005bb8, cmp_type=TIME_RESULT) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8280
  #3  0x0000555555985b0b in get_func_mm_tree (param=0x7ffff7eb3410, cond_func=0x7fff5c005cf8, field=0x7fff5c010a90, value=0x0, cmp_type=TIME_RESULT, inv=false) at /home/psergey/dev-git/10.1/sql/opt_range.cc:7604
  #4  0x000055555598652d in get_full_func_mm_tree (param=0x7ffff7eb3410, cond_func=0x7fff5c005cf8, field_item=0x7fff5c005aa8, value=0x0, inv=false) at /home/psergey/dev-git/10.1/sql/opt_range.cc:7931
  #5  0x0000555555986de0 in Item_func_between::get_mm_tree (this=0x7fff5c005cf8, param=0x7ffff7eb3410, cond_ptr=0x7fff5c007eb0) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8117
  #6  0x000055555597c083 in SQL_SELECT::test_quick_select (this=0x7fff5c007ea8, thd=0x55555aad3700, keys_to_use=..., prev_tables=0, limit=18446744073709551615, force_quick_range=false, ordered_output=false, remove_false_parts_of_where=true) at /home/psergey/dev-git/10.1/sql/opt_range.cc:3133
  #7  0x0000555555aac9a4 in get_quick_record_count (thd=0x55555aad3700, select=0x7fff5c007ea8, table=0x7fff5c00f640, keys=0x7fff5c006f98, limit=18446744073709551615, in_outer_join=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:3362
  #8  0x0000555555aaed53 in make_join_statistics (join=0x7fff5c0065a0, tables_list=..., keyuse_array=0x7fff5c0068a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4004
  #9  0x0000555555aa5836 in JOIN::optimize_inner (this=0x7fff5c0065a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1341
  #10 0x0000555555aa478c in JOIN::optimize (this=0x7fff5c0065a0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1023

Once SEL_ARG objects are there, the rest of things (getting records_in_range() estimates, handling of AND/OR, multi-component keys) will happen automatically. The tricks is to create the right SEL_ARGs.

Comment by Sergei Petrunia [ 2015-06-17 ]

Looking at where SEL_ARG object is made for a query with equality:

 explain select * from t100 where a= '2015-10-10' ;

  #0  SEL_ARG::SEL_ARG (this=0x7fff5c015cb0, f=0x7fff5c010a90, min_value_arg=0x7fff5c015ca8 "", max_value_arg=0x7fff5c015ca8 "") at /home/psergey/dev-git/10.1/sql/opt_range.cc:2479
  #1  0x0000555555988784 in get_mm_leaf (param=0x7ffff7eb3410, conf_func=0x7fff5c006d58, field=0x7fff5c010a90, key_part=0x7fff5c015a08, type=Item_func::EQ_FUNC, value=0x7fff5c005b88) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8598
  #2  0x00005555559878c9 in get_mm_parts (param=0x7ffff7eb3410, cond_func=0x7fff5c006d58, field=0x7fff5c010a90, type=Item_func::EQ_FUNC, value=0x7fff5c005b88, cmp_type=TIME_RESULT) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8280
  #3  0x00005555559872b3 in Item_equal::get_mm_tree (this=0x7fff5c006d58, param=0x7ffff7eb3410, cond_ptr=0x7fff5c008078) at /home/psergey/dev-git/10.1/sql/opt_range.cc:8194
  #4  0x000055555597c083 in SQL_SELECT::test_quick_select (this=0x7fff5c008070, thd=0x55555aad3700, keys_to_use=..., prev_tables=0, limit=18446744073709551615, force_quick_range=false, ordered_output=false, remove_false_parts_of_where=true) at /home/psergey/dev-git/10.1/sql/opt_range.cc:3133
  #5  0x0000555555aac9a4 in get_quick_record_count (thd=0x55555aad3700, select=0x7fff5c008070, table=0x7fff5c00f640, keys=0x7fff5c007128, limit=18446744073709551615, in_outer_join=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:3362
  #6  0x0000555555aaed53 in make_join_statistics (join=0x7fff5c0064e0, tables_list=..., keyuse_array=0x7fff5c0067e0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:4004
  #7  0x0000555555aa5836 in JOIN::optimize_inner (this=0x7fff5c0064e0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1341
  #8  0x0000555555aa478c in JOIN::optimize (this=0x7fff5c0064e0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:1023

It looks like get_mm_leaf is the function which should have the logic that figures that

YEAR(key_col) = C1  -->   " concat(C1, '-01-01') <=  key_col  <= concat(C1, '-12-31')  " 

Comment by Sergei Petrunia [ 2015-06-17 ]

danblack, hopefully these are enough? I hope I didn't miss anything.

Comment by Daniel Black [ 2015-06-18 ]

great start, thanks psergei!

Comment by Jiri Kavalik [ 2015-07-17 ]

Thank you all for hints. I tried to implement this in https://github.com/jkavalik/server/commit/2a66fa9bde198bca78625210f632112586c818e7

No idea if it is the right way to do such a thing but it seems to work as I imagined. I did not yet manage to run complete test suite (had to put my laptop to sleep and it then crashed on timeout..) but I ran some (innodb, select, explain, type_date, type_datetime directly and then some 1200 from the entire suite before the sleep) without troubles.

Comment by Sergei Petrunia [ 2015-07-20 ]

Commented on the patch in github. Summary: found a number of issues which need to be resolved before this can be pushed.

Comment by Jiri Kavalik [ 2015-07-21 ]

Thank you for the comments. I will try to work on those issues.

Comment by Daniel Black [ 2015-07-28 ]

Minor addition, when these optimisations occur it would be good to see an addition text in the 'Extra' of the explain output like '

{funct}

(colref) optimised to range;'

Comment by Sergei Petrunia [ 2015-08-12 ]

Saw a customer case with conditions like this:

DATE(table.datetime_col) BETWEEN '2015-08-12' AND '2015-08-12'
 
DATE(table.date_col) BETWEEN '2015-08-12' AND '2015-08-12'

The columns have types DATE and DATETIME.

Comment by Sergei Petrunia [ 2016-06-23 ]

Saw another case where the real-world queries are using conditions in form:

DATE(date_col) = '2016-06-23'

Sometimes these conditions are OR-ed together. The comparison is always with a constant.
It could be that date_col is actually a DATETIME.

Comment by Alexey Botchkov [ 2016-09-20 ]

The patch v0. Makes DATE(date_col) <>= CONST_DATE sargable.
http://lists.askmonty.org/pipermail/commits/2016-September/009874.html

Comment by Alexey Botchkov [ 2016-09-20 ]

http://lists.askmonty.org/pipermail/commits/2016-September/009874.html

Comment by Michaël de groot [ 2016-09-22 ]

When this is done, please don't forget to update this kb article: https://mariadb.com/kb/en/mariadb/building-the-best-index-for-a-given-select/

Comment by Alexey Botchkov [ 2016-09-28 ]

Test added.
http://lists.askmonty.org/pipermail/commits/2016-September/009935.html

Comment by Michaël de groot [ 2017-05-30 ]

This issue is in review for a while now, any status update?

Thanks!

Comment by Sergei Petrunia [ 2017-11-16 ]

So the patch works when the WHERE clause has just the condition we are
targeting:

explain select * from t1 where date(a) < '2017-01-10' ;
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|    1 | SIMPLE      | t1    | range | a             | a    | 6       | NULL |   90 | Using index condition |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

but doesn't work with equality propagation:

explain select * from t1 where date(a1) < '2017-01-10' and a1=a;
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 10122 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+

Actually, even irrelevant equalities make the patch not to work. Let's add
"b=b1" to the first query and we get:

explain select * from t1 where date(a) < '2017-01-10' and b=b1;
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 10122 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+

Another thing: the patch works only if DATE(col) is on the left side of the equality:

explain select * from t1 where date(a) = '2017-01-10';
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|    1 | SIMPLE      | t1    | range | a             | a    | 6       | NULL |   10 | Using index condition |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

explain select * from t1 where '2017-01-10'=date(a);
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 10122 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+

Comment by Sergei Petrunia [ 2017-11-16 ]

Elaborating on Monty's "do a rewrite at fix_fields" proposal.

Let's consider YEAR(a), where a is a DATETIME column.

introduce functions:

YEAR_START(datetime_col) returns CONCAT(YEAR(datetime_col), '-01-01 00:00:00')
YEAR_END(datetime_col) returns CONCAT(YEAR(datetime_col), '-31-12 23:59:59 ...')

(add more precision if necessary)

then

YEAR(col) <= val  is the same as  col <= YEAR_END(val)
YEAR(col) < val   is the same as  col < YEAR_START(val)

YEAR(col) >= val  is the same as  col >= YEAR_START(val)
YEAR(col) > val   is the same as  col > YEAR_END(val)

YEAR(col) = val   is the same as  col>=YEAR_START(val) AND col<=YEAR_END(val)

YEAR(col) BETWEEN val1 AND val2 

is the same as

YEAR(col) >= val2 AND YEAR(col) <= val2

which reduces it to the cases already handled above.

If we use the rewrite approach, equality propagation will work automatically.

Comment by Sergei Petrunia [ 2017-11-20 ]

I'm working on a patch to implement the new proposal.

Comment by Sergei Petrunia [ 2017-11-23 ]

A patch: http://lists.askmonty.org/pipermail/commits/2017-November/011674.html
Still need to figure out if we need to take measures to account for leap seconds.

Comment by Sergei Petrunia [ 2017-11-23 ]

Somehow, I am unable to observe this leap second.
I get Ubuntu 16.04.3 LTS, compile mysql-5.7, get the datadir from mysql-test/var/install.db, with non-empty mysql.time_zone* tables, and try to reproduce what is said at: https://dev.mysql.com/doc/refman/5.7/en/time-zone-leap-seconds.html.

The docs say:

mysql> -- values differ internally but display the same
mysql> SELECT a, ts, UNIX_TIMESTAMP(ts) FROM t1;
+------+---------------------+--------------------+
| a    | ts                  | UNIX_TIMESTAMP(ts) |
+------+---------------------+--------------------+
|    1 | 2008-12-31 23:59:59 |         1230767999 |
|    2 | 2008-12-31 23:59:59 |         1230768000 |
+------+---------------------+--------------------+
2 rows in set (0.00 sec)

while I get:

mysql> SELECT a, ts, UNIX_TIMESTAMP(ts) FROM t1;
+------+---------------------+--------------------+
| a    | ts                  | UNIX_TIMESTAMP(ts) |
+------+---------------------+--------------------+
|    1 | 2008-12-31 23:59:59 |         1230767999 |
|    2 | 2009-01-01 00:00:00 |         1230768000 |
+------+---------------------+--------------------+
2 rows in set (0.00 sec)

Comment by Sergei Petrunia [ 2017-11-24 ]

Also discussed with bar.

Takeaways:

  • one needs to have TZ data loaded into the server in order to observe the leap second.
    • Check mysql_tzinfo_to_sql, and --leap parameter.
    • Note that not all timezone files have leap second information (use "file" to see which do)
  • DATE[TIME] type has no leap seconds. only TIMESTAMP does
  • TIMESTAMP has open bugs like MDEV-13995
    • It's better to not have this patch support TIMESTAMP for now. Let's do it just for DATE TIMEs.
    • This means we dont have to support leap seconds.
Comment by Sergei Petrunia [ 2017-11-28 ]

Adjusted the patch according to the above: http://lists.askmonty.org/pipermail/commits/2017-November/011680.html

A test patch to also make rewrite function a member of Item: http://lists.askmonty.org/pipermail/commits/2017-November/011681.html

Comment by Sergei Petrunia [ 2017-11-28 ]

bar, please review.

Comment by Oleg Smirnov [ 2023-03-17 ]

There are test failures looking like

main.sargable_date_cond                  w3 [ fail ]
        Test ended at 2023-03-16 04:35:43
 
CURRENT_TEST: main.sargable_date_cond
--- /usr/share/mysql/mysql-test/main/sargable_date_cond.result	2023-03-16 01:36:33.000000000 -0400
+++ /dev/shm/var/3/log/sargable_date_cond.reject	2023-03-16 04:35:42.953159711 -0400
@@ -1919,6 +1919,8 @@
 timestampadd(hour, B.a, date_add('2016-01-01', interval A.a*8 day)),
 date_add('2016-01-01', interval A.a*7 day)
 from t1 A, t0 B;
+Warnings:
+Warning	1299	Invalid TIMESTAMP value in column 'a' at row 11

that occur only on some machines and not on others. Turns out it depends on whether the system time zone employs daylight saving time.

If it does then calling timestampadd() can lead to execution of the following code

my_time.cc

  /*
    Fix that if we are in the non existing daylight saving time hour
    we move the start of the next real hour.
 
    This code doesn't handle such exotical thing as time-gaps whose length
    is more than one hour or non-integer (latter can theoretically happen
    if one of seconds will be removed due leap correction, or because of
    general time correction like it happened for Africa/Monrovia time zone
    in year 1972).
  */
  if (loop == 2 && t->hour != (uint) l_time->tm_hour)
  {
    int days= t->day - l_time->tm_mday;
    if (days < -1)
      days=1;					/* Month has wrapped */
    else if (days > 1)
      days= -1;
    diff=(3600L*(long) (days*24+((int) t->hour - (int) l_time->tm_hour))+
	  (long) (60*((int) t->minute - (int) l_time->tm_min)) +
          (long) ((int) t->second - (int) l_time->tm_sec));
    if (diff == 3600)
      tmp+=3600 - t->minute*60 - t->second;	/* Move to next hour */
    else if (diff == -3600)
      tmp-=t->minute*60 + t->second;		/* Move to previous hour */
 
    *error_code= ER_WARN_INVALID_TIMESTAMP;
  }

Comment by Sergei Petrunia [ 2023-03-20 ]

https://github.com/MariaDB/server/tree/preview-11.1-mdev-8320

Comment by Lena Startseva [ 2023-04-04 ]

At the moment, the task has been tested with cases for the SELECT, a fix of MDEV-30946 is needed to continue testing

Comment by Lena Startseva [ 2023-04-25 ]

Testing done. Ok to push

Comment by Oleg Smirnov [ 2023-04-25 ]

Pushed to 11.1

Generated at Thu Feb 08 07:26:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.