Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32148

Inefficient WHERE timestamp_column=datetime_const_expr

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
    • 11.3.2, 11.4.1
    • Data types, Optimizer
    • None

    Description

      A query with this condition:

        WHERE timestamp_column=datetime_const_expr
      

      for example

        WHERE timestamp_col = DATE_SUB('2023-09-01 17:17:15',INTERVAL 1 HOUR)
      

      goes through an inefficient execution path. It involves a gmt_sec_to_TIME() call per every row, which in case of the @@time_zone=SYSTEM involves a slow localtime_r() call.

      This happens because TIMESTAMP and DATETIME are compared as DATETIME historically.

      This could be optimized to use TIMESTAMP comparison when it's possible to avoid TIMESTAMP->DATETIME per-row conversion. This should be faster for big tables.

      What can be done

      It's generally not possible to compare always TIMESTAMP and DATETIME as TIMESTAMP without behavior changes, because:

      • DATETIME has a wider range.
      • Two different TIMESTAMP values can have the same DATETIME value near the "fall back" DST change, as well as for leap seconds.
      • There are DATETIME gaps during the "spring forward" DST switch, which are adjusted to the start of the gap.

      However, if the DATETIME side is a constant inside monotone continuous periods (without DST changes and leap seconds), then we can compare it to TIMESTAMP as TIMESTAMP. The DATETIME argument can be converted once to TIMESTAMP, so no data type conversion will happen per row.

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov added a comment - - edited

            psergei, thanks, good catch about MTR.

            The new method was needed to override the DATETIME-based code in Type_handler_temporal_result::Item_const_eq(), which compares two MYSQL_TIME values, and which is not relevant for TIMESTAMP any more because Item_timestamp_literal does not have a MYSQL_TIME value inside. Without the change, Type_handler_temporal_result::Item_const_eq() would crash on elimination of equal conditions having Item_timestamp_literal inside.

            But it appeared that trivial condition elimination haven't been covered in MTR at all so far, this is why you did not get crashes.

            Please find a new patch:

            https://github.com/MariaDB/server/commit/c44b546c0067bdf7aa66706af438b4b024a4578b

            Now instead of "return false", it implements detecting equal Item_timestamp_literal's inside this method, and adds MTR tests.

            bar Alexander Barkov added a comment - - edited psergei , thanks, good catch about MTR. The new method was needed to override the DATETIME-based code in Type_handler_temporal_result::Item_const_eq(), which compares two MYSQL_TIME values, and which is not relevant for TIMESTAMP any more because Item_timestamp_literal does not have a MYSQL_TIME value inside. Without the change, Type_handler_temporal_result::Item_const_eq() would crash on elimination of equal conditions having Item_timestamp_literal inside. But it appeared that trivial condition elimination haven't been covered in MTR at all so far, this is why you did not get crashes. Please find a new patch: https://github.com/MariaDB/server/commit/c44b546c0067bdf7aa66706af438b4b024a4578b Now instead of "return false", it implements detecting equal Item_timestamp_literal's inside this method, and adds MTR tests.
            psergei Sergei Petrunia added a comment - - edited

            An observation: IN is not covered:

            MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE a in ('2020-03-03 01:01:01', '2020-03-04 01:01:01');
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
            | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
            |    1 | SIMPLE      | t1    | range | a             | a    | 4       | NULL | 2    |   100.00 | Using where; Using index |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
            1 row in set, 1 warning (0.002 sec)
             
            MariaDB [test]> show warnings;
            +-------+------+-----------------------------------------------------------------------------------------------------------------------+
            | Level | Code | Message                                                                                                               |
            +-------+------+-----------------------------------------------------------------------------------------------------------------------+
            | Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where `test`.`t1`.`a` in ('2020-03-03 01:01:01','2020-03-04 01:01:01') |
            +-------+------+-----------------------------------------------------------------------------------------------------------------------+
            

            Not sure if it can/should be done as part of this patch, yet, continuing with review.

            BETWEEN is not handled, either:

            explain format=json select * from t1 where a >= '2021-01-01 00:00:00' and  a<='2021-02-02 00:00:00';
            "attached_condition": "t1.a >= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-01-01 00:00:00' and t1.a <= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-02-02 00:00:00'"
             
            explain format=json select * from t1 where a between '2021-01-01 00:00:00' and '2021-02-02 00:00:00';
            "attached_condition": "t1.a between <cache>('2021-01-01 00:00:00') and <cache>('2021-02-02 00:00:00')"
            

            The commit comment for the patch:

                MDEV-32148 Inefficient WHERE timestamp_column=datetime_const_expr
                
                Changing the way how the following condition is evaluated:
                
                    WHERE timestamp_column=datetime_const_expr
                
                Before the change it was always performed as DATETIME.
            ...
            

            this should be changed to indicate that non-equality comparisons are also handled.

            For

            --let $replace_regex_tsltz6= ...
            ...
            --replace_regex $replace_regex_tsltz6
            

            I would ask to introduce something with less cryptic name and use it in only one location:

            --source include/replace_timestamp_w_local_tz_value.inc
            

            psergei Sergei Petrunia added a comment - - edited An observation: IN is not covered: MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE a in ('2020-03-03 01:01:01', '2020-03-04 01:01:01'); +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+ | 1 | SIMPLE | t1 | range | a | a | 4 | NULL | 2 | 100.00 | Using where; Using index | +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+ 1 row in set, 1 warning (0.002 sec)   MariaDB [test]> show warnings; +-------+------+-----------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-----------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where `test`.`t1`.`a` in ('2020-03-03 01:01:01','2020-03-04 01:01:01') | +-------+------+-----------------------------------------------------------------------------------------------------------------------+ Not sure if it can/should be done as part of this patch, yet, continuing with review. BETWEEN is not handled, either: explain format=json select * from t1 where a >= '2021-01-01 00:00:00' and a<='2021-02-02 00:00:00'; "attached_condition": "t1.a >= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-01-01 00:00:00' and t1.a <= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-02-02 00:00:00'"   explain format=json select * from t1 where a between '2021-01-01 00:00:00' and '2021-02-02 00:00:00'; "attached_condition": "t1.a between <cache>('2021-01-01 00:00:00') and <cache>('2021-02-02 00:00:00')" The commit comment for the patch: MDEV-32148 Inefficient WHERE timestamp_column=datetime_const_expr Changing the way how the following condition is evaluated: WHERE timestamp_column=datetime_const_expr Before the change it was always performed as DATETIME. ... this should be changed to indicate that non-equality comparisons are also handled. For --let $replace_regex_tsltz6= ... ... --replace_regex $replace_regex_tsltz6 I would ask to introduce something with less cryptic name and use it in only one location: --source include/replace_timestamp_w_local_tz_value.inc

            A suggestion how to get rid of the extra warnings. At the cost of not providing the optimization for the queries that produce such warnings:

            --- item_cmpfunc.cc
            +++ item_cmpfunc.cc
            @@ -460,9 +460,28 @@  get_timestamp_item_for_comparison
                   !expr1->type_handler()->can_return_date())
                 return NULL;
             
            +  struct Count_handler : public Internal_error_handler
            +  {
            +    bool hit=false;
            +    bool handle_condition(THD *thd,
            +                          uint sql_errno,
            +                          const char *sqlstate,
            +                          Sql_condition::enum_warning_level *level,
            +                          const char *msg,
            +                          Sql_condition **cond_hdl)
            +    {
            +      hit=true;
            +      return *level == Sql_condition::WARN_LEVEL_WARN;
            +    }
            +  } cnt_handler;
            +
            +  //Suppress_warnings_error_handler warning_handler;
            +  thd->push_internal_handler(&cnt_handler);
            +
               Datetime dt(thd, expr1, Timestamp::DatetimeOptions(thd));
             
            -  if (!dt.is_valid_datetime())
            +  thd->pop_internal_handler();
            +  if (cnt_handler.hit || !dt.is_valid_datetime())
                 return NULL; // SQL NULL DATETIME, or a DATETIME with zeros in YYYYMMDD
             
               // '0000-00-00 00:00:00' is a special valid MariaDB TIMESTAMP value
            
            

            psergei Sergei Petrunia added a comment - A suggestion how to get rid of the extra warnings. At the cost of not providing the optimization for the queries that produce such warnings: --- item_cmpfunc.cc +++ item_cmpfunc.cc @@ -460,9 +460,28 @@ get_timestamp_item_for_comparison !expr1->type_handler()->can_return_date()) return NULL; + struct Count_handler : public Internal_error_handler + { + bool hit=false; + bool handle_condition(THD *thd, + uint sql_errno, + const char *sqlstate, + Sql_condition::enum_warning_level *level, + const char *msg, + Sql_condition **cond_hdl) + { + hit=true; + return *level == Sql_condition::WARN_LEVEL_WARN; + } + } cnt_handler; + + //Suppress_warnings_error_handler warning_handler; + thd->push_internal_handler(&cnt_handler); + Datetime dt(thd, expr1, Timestamp::DatetimeOptions(thd)); - if (!dt.is_valid_datetime()) + thd->pop_internal_handler(); + if (cnt_handler.hit || !dt.is_valid_datetime()) return NULL; // SQL NULL DATETIME, or a DATETIME with zeros in YYYYMMDD // '0000-00-00 00:00:00' is a special valid MariaDB TIMESTAMP value

            Note to self: The patch has this call:

            expr1->can_eval_in_optimize() 
            

            and it will evaluate ##expr1## if that call returns TRUE. This will happen at Name Resolution (fix_fields()) phase. However, this will not attempt to evaluate "cheap" subqueries: subqueries do not yet have query plans at Name Resolution stage, so they are not considered "cheap".

            psergei Sergei Petrunia added a comment - Note to self: The patch has this call: expr1->can_eval_in_optimize() and it will evaluate ##expr1## if that call returns TRUE. This will happen at Name Resolution (fix_fields()) phase. However, this will not attempt to evaluate "cheap" subqueries: subqueries do not yet have query plans at Name Resolution stage, so they are not considered "cheap".

            Ok, the patch needs the following:

            After that, it's ok to push into 11.3.

            psergei Sergei Petrunia added a comment - Ok, the patch needs the following: Address the input in https://jira.mariadb.org/browse/MDEV-32148?focusedCommentId=277190&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-277190 Consider applying comment/formatting fixes from here: https://gist.github.com/spetrunia/89531b74dfb7ba8aeaf6c293e31aa909 Change the target version to be 11.3 (Discussed this with Monty) After that, it's ok to push into 11.3.

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.