[MDEV-32148] Inefficient WHERE timestamp_column=datetime_const_expr Created: 2023-09-11  Updated: 2024-01-26  Resolved: 2024-01-12

Status: Closed
Project: MariaDB Server
Component/s: Data types, Optimizer
Affects Version/s: 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 11.3.2, 11.4.1

Type: Bug Priority: Critical
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Blocks
Problem/Incident
causes MDEV-33299 Assertion `(tm->tv_usec % (int) log_1... Closed
Relates
relates to MDEV-14271 Dynamic SQL: TIMESTAMP parameter valu... Open
relates to MDEV-16422 DST and unix_timestamp(now()) Confirmed
relates to MDEV-16423 DST and timestamp comparison Confirmed
relates to MDEV-32113 utf8mb3_key_col=utf8mb4_value cannot ... Closed
relates to MDEV-32152 Wrong results near DST for `WHERE ind... Open
relates to MDEV-32203 Raise notes when an index cannot be u... Closed
relates to MDEV-19121 optimizer fails to optimize query on ... Confirmed
relates to MDEV-22828 Bad results near DST change: UNIX_TIM... Open

 Description   

A query with this condition:

  WHERE timestamp_column=datetime_const_expr

for example

  WHERE timestamp_col = DATE_SUB('2023-09-01 17:17:15',INTERVAL 1 HOUR)

goes through an inefficient execution path. It involves a gmt_sec_to_TIME() call per every row, which in case of the @@time_zone=SYSTEM involves a slow localtime_r() call.

This happens because TIMESTAMP and DATETIME are compared as DATETIME historically.

This could be optimized to use TIMESTAMP comparison when it's possible to avoid TIMESTAMP->DATETIME per-row conversion. This should be faster for big tables.

What can be done

It's generally not possible to compare always TIMESTAMP and DATETIME as TIMESTAMP without behavior changes, because:

  • DATETIME has a wider range.
  • Two different TIMESTAMP values can have the same DATETIME value near the "fall back" DST change, as well as for leap seconds.
  • There are DATETIME gaps during the "spring forward" DST switch, which are adjusted to the start of the gap.

However, if the DATETIME side is a constant inside monotone continuous periods (without DST changes and leap seconds), then we can compare it to TIMESTAMP as TIMESTAMP. The DATETIME argument can be converted once to TIMESTAMP, so no data type conversion will happen per row.



 Comments   
Comment by Alexander Barkov [ 2023-09-12 ]

Testing how the patch affected performance

Summary:

In a CMAKE_BUILD_TYPE=Release build, the timing (in seconds) of the query:

SELECT * FROM t1 WHERE timestamp_column='2001-01-01 00:00:00';

changed:

  • from 0.253 to 0.189 for InnoDB (25% improvement)
  • from 0.109 to 0.058 for MyISAM (47% improvement)
  • from 0.106 to 0.052 for HEAP (51% improvement)

See details below.

A procedure to create and populate a table for benchmarking

DELIMITER $$
CREATE OR REPLACE PROCEDURE populate(eng TEXT)
BEGIN
  EXECUTE IMMEDIATE CONCAT('CREATE OR REPLACE TABLE t1 (a timestamp) ENGINE=', eng);
  INSERT INTO t1 VALUES ('2001-01-01 00:00:00');
  START TRANSACTION;
  FOR i IN 1..1000000
  DO
    INSERT INTO t1 VALUES ('2001-01-01 00:00:01');
  END FOR;
  COMMIT;
END;
$$
DELIMITER ;

Running performance tests before the change

CALL populate('InnoDB');
SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';

+---------------------+
| a                   |
+---------------------+
| 2001-01-01 00:00:00 |
+---------------------+
1 row in set (0.253 sec)

CALL populate('MyISAM');
SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';

+---------------------+
| a                   |
+---------------------+
| 2001-01-01 00:00:00 |
+---------------------+
1 row in set (0.109 sec)

CALL populate('MEMORY');
SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';

+---------------------+
| a                   |
+---------------------+
| 2001-01-01 00:00:00 |
+---------------------+
1 row in set (0.106 sec)

Running performance tests after the change

CALL populate('InnoDB');
SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';

+---------------------+
| a                   |
+---------------------+
| 2001-01-01 00:00:00 |
+---------------------+
1 row in set (0.189 sec)

CALL populate('MyISAM');
SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';

+---------------------+
| a                   |
+---------------------+
| 2001-01-01 00:00:00 |
+---------------------+
1 row in set (0.058 sec)

CALL populate('MEMORY');
SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';

+---------------------+
| a                   |
+---------------------+
| 2001-01-01 00:00:00 |
+---------------------+
1 row in set (0.052 sec)

Comment by Alexander Barkov [ 2023-09-12 ]

Sergei, please review this patch:

https://github.com/MariaDB/server/commit/8eb85762e53f3ab4c4673be44c22876bcafbb072

Thanks.

Comment by Sergei Petrunia [ 2023-09-26 ]

About type_timestamp.test: why does the patch (for 10.6) add:

--echo #
--echo # End of 10.6 tests
--echo #

and then some tests, and then

--echo #
--echo # End of 10.10 tests
--echo #

This will be very confusing for those doing merges...
(continuing to review)

Comment by Sergei Petrunia [ 2023-10-02 ]

bar, why does the patch add this function:

+Type_handler_timestamp_common::Item_const_eq(const Item_const *a,
+                                             const Item_const *b,
+                                             bool binary_cmp) const

I've put DBUG_ASSERT(0); into it, ran the main testsuite and saw no failures. Please add test coverage or explain why it is not there...

Comment by Alexander Barkov [ 2023-10-03 ]

psergei, thanks, good catch about MTR.

The new method was needed to override the DATETIME-based code in Type_handler_temporal_result::Item_const_eq(), which compares two MYSQL_TIME values, and which is not relevant for TIMESTAMP any more because Item_timestamp_literal does not have a MYSQL_TIME value inside. Without the change, Type_handler_temporal_result::Item_const_eq() would crash on elimination of equal conditions having Item_timestamp_literal inside.

But it appeared that trivial condition elimination haven't been covered in MTR at all so far, this is why you did not get crashes.

Please find a new patch:

https://github.com/MariaDB/server/commit/c44b546c0067bdf7aa66706af438b4b024a4578b

Now instead of "return false", it implements detecting equal Item_timestamp_literal's inside this method, and adds MTR tests.

Comment by Sergei Petrunia [ 2023-12-20 ]

An observation: IN is not covered:

MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE a in ('2020-03-03 01:01:01', '2020-03-04 01:01:01');
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
|    1 | SIMPLE      | t1    | range | a             | a    | 4       | NULL | 2    |   100.00 | Using where; Using index |
+------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.002 sec)
 
MariaDB [test]> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                               |
+-------+------+-----------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where `test`.`t1`.`a` in ('2020-03-03 01:01:01','2020-03-04 01:01:01') |
+-------+------+-----------------------------------------------------------------------------------------------------------------------+

Not sure if it can/should be done as part of this patch, yet, continuing with review.

BETWEEN is not handled, either:

explain format=json select * from t1 where a >= '2021-01-01 00:00:00' and  a<='2021-02-02 00:00:00';
"attached_condition": "t1.a >= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-01-01 00:00:00' and t1.a <= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-02-02 00:00:00'"
 
explain format=json select * from t1 where a between '2021-01-01 00:00:00' and '2021-02-02 00:00:00';
"attached_condition": "t1.a between <cache>('2021-01-01 00:00:00') and <cache>('2021-02-02 00:00:00')"

The commit comment for the patch:

    MDEV-32148 Inefficient WHERE timestamp_column=datetime_const_expr
    
    Changing the way how the following condition is evaluated:
    
        WHERE timestamp_column=datetime_const_expr
    
    Before the change it was always performed as DATETIME.
...

this should be changed to indicate that non-equality comparisons are also handled.

For

--let $replace_regex_tsltz6= ...
...
--replace_regex $replace_regex_tsltz6

I would ask to introduce something with less cryptic name and use it in only one location:

--source include/replace_timestamp_w_local_tz_value.inc

Comment by Sergei Petrunia [ 2023-12-22 ]

A suggestion how to get rid of the extra warnings. At the cost of not providing the optimization for the queries that produce such warnings:

--- item_cmpfunc.cc
+++ item_cmpfunc.cc
@@ -460,9 +460,28 @@  get_timestamp_item_for_comparison
       !expr1->type_handler()->can_return_date())
     return NULL;
 
+  struct Count_handler : public Internal_error_handler
+  {
+    bool hit=false;
+    bool handle_condition(THD *thd,
+                          uint sql_errno,
+                          const char *sqlstate,
+                          Sql_condition::enum_warning_level *level,
+                          const char *msg,
+                          Sql_condition **cond_hdl)
+    {
+      hit=true;
+      return *level == Sql_condition::WARN_LEVEL_WARN;
+    }
+  } cnt_handler;
+
+  //Suppress_warnings_error_handler warning_handler;
+  thd->push_internal_handler(&cnt_handler);
+
   Datetime dt(thd, expr1, Timestamp::DatetimeOptions(thd));
 
-  if (!dt.is_valid_datetime())
+  thd->pop_internal_handler();
+  if (cnt_handler.hit || !dt.is_valid_datetime())
     return NULL; // SQL NULL DATETIME, or a DATETIME with zeros in YYYYMMDD
 
   // '0000-00-00 00:00:00' is a special valid MariaDB TIMESTAMP value

Comment by Sergei Petrunia [ 2023-12-30 ]

Note to self: The patch has this call:

expr1->can_eval_in_optimize() 

and it will evaluate ##expr1## if that call returns TRUE. This will happen at Name Resolution (fix_fields()) phase. However, this will not attempt to evaluate "cheap" subqueries: subqueries do not yet have query plans at Name Resolution stage, so they are not considered "cheap".

Comment by Sergei Petrunia [ 2024-01-08 ]

Ok, the patch needs the following:

After that, it's ok to push into 11.3.

Generated at Thu Feb 08 10:29:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.