Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32148

Inefficient WHERE timestamp_column=datetime_const_expr

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
    • 11.3.2, 11.4.1
    • Data types, Optimizer
    • None

    Description

      A query with this condition:

        WHERE timestamp_column=datetime_const_expr
      

      for example

        WHERE timestamp_col = DATE_SUB('2023-09-01 17:17:15',INTERVAL 1 HOUR)
      

      goes through an inefficient execution path. It involves a gmt_sec_to_TIME() call per every row, which in case of the @@time_zone=SYSTEM involves a slow localtime_r() call.

      This happens because TIMESTAMP and DATETIME are compared as DATETIME historically.

      This could be optimized to use TIMESTAMP comparison when it's possible to avoid TIMESTAMP->DATETIME per-row conversion. This should be faster for big tables.

      What can be done

      It's generally not possible to compare always TIMESTAMP and DATETIME as TIMESTAMP without behavior changes, because:

      • DATETIME has a wider range.
      • Two different TIMESTAMP values can have the same DATETIME value near the "fall back" DST change, as well as for leap seconds.
      • There are DATETIME gaps during the "spring forward" DST switch, which are adjusted to the start of the gap.

      However, if the DATETIME side is a constant inside monotone continuous periods (without DST changes and leap seconds), then we can compare it to TIMESTAMP as TIMESTAMP. The DATETIME argument can be converted once to TIMESTAMP, so no data type conversion will happen per row.

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov added a comment - - edited

            Testing how the patch affected performance

            Summary:

            In a CMAKE_BUILD_TYPE=Release build, the timing (in seconds) of the query:

            SELECT * FROM t1 WHERE timestamp_column='2001-01-01 00:00:00';
            

            changed:

            • from 0.253 to 0.189 for InnoDB (25% improvement)
            • from 0.109 to 0.058 for MyISAM (47% improvement)
            • from 0.106 to 0.052 for HEAP (51% improvement)

            See details below.

            A procedure to create and populate a table for benchmarking

            DELIMITER $$
            CREATE OR REPLACE PROCEDURE populate(eng TEXT)
            BEGIN
              EXECUTE IMMEDIATE CONCAT('CREATE OR REPLACE TABLE t1 (a timestamp) ENGINE=', eng);
              INSERT INTO t1 VALUES ('2001-01-01 00:00:00');
              START TRANSACTION;
              FOR i IN 1..1000000
              DO
                INSERT INTO t1 VALUES ('2001-01-01 00:00:01');
              END FOR;
              COMMIT;
            END;
            $$
            DELIMITER ;
            

            Running performance tests before the change

            CALL populate('InnoDB');
            SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';
            

            +---------------------+
            | a                   |
            +---------------------+
            | 2001-01-01 00:00:00 |
            +---------------------+
            1 row in set (0.253 sec)
            

            CALL populate('MyISAM');
            SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';
            

            +---------------------+
            | a                   |
            +---------------------+
            | 2001-01-01 00:00:00 |
            +---------------------+
            1 row in set (0.109 sec)
            

            CALL populate('MEMORY');
            SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';
            

            +---------------------+
            | a                   |
            +---------------------+
            | 2001-01-01 00:00:00 |
            +---------------------+
            1 row in set (0.106 sec)
            

            Running performance tests after the change

            CALL populate('InnoDB');
            SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';
            

            +---------------------+
            | a                   |
            +---------------------+
            | 2001-01-01 00:00:00 |
            +---------------------+
            1 row in set (0.189 sec)
            

            CALL populate('MyISAM');
            SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';
            

            +---------------------+
            | a                   |
            +---------------------+
            | 2001-01-01 00:00:00 |
            +---------------------+
            1 row in set (0.058 sec)
            

            CALL populate('MEMORY');
            SELECT * FROM t1 WHERE a='2001-01-01 00:00:00';
            

            +---------------------+
            | a                   |
            +---------------------+
            | 2001-01-01 00:00:00 |
            +---------------------+
            1 row in set (0.052 sec)
            

            bar Alexander Barkov added a comment - - edited Testing how the patch affected performance Summary: In a CMAKE_BUILD_TYPE=Release build, the timing (in seconds) of the query: SELECT * FROM t1 WHERE timestamp_column= '2001-01-01 00:00:00' ; changed: from 0.253 to 0.189 for InnoDB (25% improvement) from 0.109 to 0.058 for MyISAM (47% improvement) from 0.106 to 0.052 for HEAP (51% improvement) See details below. A procedure to create and populate a table for benchmarking DELIMITER $$ CREATE OR REPLACE PROCEDURE populate(eng TEXT) BEGIN EXECUTE IMMEDIATE CONCAT( 'CREATE OR REPLACE TABLE t1 (a timestamp) ENGINE=' , eng); INSERT INTO t1 VALUES ( '2001-01-01 00:00:00' ); START TRANSACTION ; FOR i IN 1..1000000 DO INSERT INTO t1 VALUES ( '2001-01-01 00:00:01' ); END FOR ; COMMIT ; END ; $$ DELIMITER ; Running performance tests before the change CALL populate( 'InnoDB' ); SELECT * FROM t1 WHERE a= '2001-01-01 00:00:00' ; +---------------------+ | a | +---------------------+ | 2001-01-01 00:00:00 | +---------------------+ 1 row in set (0.253 sec) CALL populate( 'MyISAM' ); SELECT * FROM t1 WHERE a= '2001-01-01 00:00:00' ; +---------------------+ | a | +---------------------+ | 2001-01-01 00:00:00 | +---------------------+ 1 row in set (0.109 sec) CALL populate( 'MEMORY' ); SELECT * FROM t1 WHERE a= '2001-01-01 00:00:00' ; +---------------------+ | a | +---------------------+ | 2001-01-01 00:00:00 | +---------------------+ 1 row in set (0.106 sec) Running performance tests after the change CALL populate( 'InnoDB' ); SELECT * FROM t1 WHERE a= '2001-01-01 00:00:00' ; +---------------------+ | a | +---------------------+ | 2001-01-01 00:00:00 | +---------------------+ 1 row in set (0.189 sec) CALL populate( 'MyISAM' ); SELECT * FROM t1 WHERE a= '2001-01-01 00:00:00' ; +---------------------+ | a | +---------------------+ | 2001-01-01 00:00:00 | +---------------------+ 1 row in set (0.058 sec) CALL populate( 'MEMORY' ); SELECT * FROM t1 WHERE a= '2001-01-01 00:00:00' ; +---------------------+ | a | +---------------------+ | 2001-01-01 00:00:00 | +---------------------+ 1 row in set (0.052 sec)
            bar Alexander Barkov added a comment - - edited Sergei, please review this patch: https://github.com/MariaDB/server/commit/8eb85762e53f3ab4c4673be44c22876bcafbb072 Thanks.
            psergei Sergei Petrunia added a comment - - edited

            About type_timestamp.test: why does the patch (for 10.6) add:

            --echo #
            --echo # End of 10.6 tests
            --echo #
            

            and then some tests, and then

            --echo #
            --echo # End of 10.10 tests
            --echo #
            

            This will be very confusing for those doing merges...
            (continuing to review)

            psergei Sergei Petrunia added a comment - - edited About type_timestamp.test: why does the patch (for 10.6) add: --echo # --echo # End of 10.6 tests --echo # and then some tests, and then --echo # --echo # End of 10.10 tests --echo # This will be very confusing for those doing merges... (continuing to review)

            bar, why does the patch add this function:

            +Type_handler_timestamp_common::Item_const_eq(const Item_const *a,
            +                                             const Item_const *b,
            +                                             bool binary_cmp) const
            

            I've put DBUG_ASSERT(0); into it, ran the main testsuite and saw no failures. Please add test coverage or explain why it is not there...

            psergei Sergei Petrunia added a comment - bar , why does the patch add this function: +Type_handler_timestamp_common::Item_const_eq(const Item_const *a, + const Item_const *b, + bool binary_cmp) const I've put DBUG_ASSERT(0); into it, ran the main testsuite and saw no failures. Please add test coverage or explain why it is not there...
            bar Alexander Barkov added a comment - - edited

            psergei, thanks, good catch about MTR.

            The new method was needed to override the DATETIME-based code in Type_handler_temporal_result::Item_const_eq(), which compares two MYSQL_TIME values, and which is not relevant for TIMESTAMP any more because Item_timestamp_literal does not have a MYSQL_TIME value inside. Without the change, Type_handler_temporal_result::Item_const_eq() would crash on elimination of equal conditions having Item_timestamp_literal inside.

            But it appeared that trivial condition elimination haven't been covered in MTR at all so far, this is why you did not get crashes.

            Please find a new patch:

            https://github.com/MariaDB/server/commit/c44b546c0067bdf7aa66706af438b4b024a4578b

            Now instead of "return false", it implements detecting equal Item_timestamp_literal's inside this method, and adds MTR tests.

            bar Alexander Barkov added a comment - - edited psergei , thanks, good catch about MTR. The new method was needed to override the DATETIME-based code in Type_handler_temporal_result::Item_const_eq(), which compares two MYSQL_TIME values, and which is not relevant for TIMESTAMP any more because Item_timestamp_literal does not have a MYSQL_TIME value inside. Without the change, Type_handler_temporal_result::Item_const_eq() would crash on elimination of equal conditions having Item_timestamp_literal inside. But it appeared that trivial condition elimination haven't been covered in MTR at all so far, this is why you did not get crashes. Please find a new patch: https://github.com/MariaDB/server/commit/c44b546c0067bdf7aa66706af438b4b024a4578b Now instead of "return false", it implements detecting equal Item_timestamp_literal's inside this method, and adds MTR tests.
            psergei Sergei Petrunia added a comment - - edited

            An observation: IN is not covered:

            MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE a in ('2020-03-03 01:01:01', '2020-03-04 01:01:01');
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
            | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
            |    1 | SIMPLE      | t1    | range | a             | a    | 4       | NULL | 2    |   100.00 | Using where; Using index |
            +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
            1 row in set, 1 warning (0.002 sec)
             
            MariaDB [test]> show warnings;
            +-------+------+-----------------------------------------------------------------------------------------------------------------------+
            | Level | Code | Message                                                                                                               |
            +-------+------+-----------------------------------------------------------------------------------------------------------------------+
            | Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where `test`.`t1`.`a` in ('2020-03-03 01:01:01','2020-03-04 01:01:01') |
            +-------+------+-----------------------------------------------------------------------------------------------------------------------+
            

            Not sure if it can/should be done as part of this patch, yet, continuing with review.

            BETWEEN is not handled, either:

            explain format=json select * from t1 where a >= '2021-01-01 00:00:00' and  a<='2021-02-02 00:00:00';
            "attached_condition": "t1.a >= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-01-01 00:00:00' and t1.a <= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-02-02 00:00:00'"
             
            explain format=json select * from t1 where a between '2021-01-01 00:00:00' and '2021-02-02 00:00:00';
            "attached_condition": "t1.a between <cache>('2021-01-01 00:00:00') and <cache>('2021-02-02 00:00:00')"
            

            The commit comment for the patch:

                MDEV-32148 Inefficient WHERE timestamp_column=datetime_const_expr
                
                Changing the way how the following condition is evaluated:
                
                    WHERE timestamp_column=datetime_const_expr
                
                Before the change it was always performed as DATETIME.
            ...
            

            this should be changed to indicate that non-equality comparisons are also handled.

            For

            --let $replace_regex_tsltz6= ...
            ...
            --replace_regex $replace_regex_tsltz6
            

            I would ask to introduce something with less cryptic name and use it in only one location:

            --source include/replace_timestamp_w_local_tz_value.inc
            

            psergei Sergei Petrunia added a comment - - edited An observation: IN is not covered: MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 WHERE a in ('2020-03-03 01:01:01', '2020-03-04 01:01:01'); +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+ | 1 | SIMPLE | t1 | range | a | a | 4 | NULL | 2 | 100.00 | Using where; Using index | +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+ 1 row in set, 1 warning (0.002 sec)   MariaDB [test]> show warnings; +-------+------+-----------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-----------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where `test`.`t1`.`a` in ('2020-03-03 01:01:01','2020-03-04 01:01:01') | +-------+------+-----------------------------------------------------------------------------------------------------------------------+ Not sure if it can/should be done as part of this patch, yet, continuing with review. BETWEEN is not handled, either: explain format=json select * from t1 where a >= '2021-01-01 00:00:00' and a<='2021-02-02 00:00:00'; "attached_condition": "t1.a >= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-01-01 00:00:00' and t1.a <= TIMESTAMP/*WITH LOCAL TIME ZONE*/'2021-02-02 00:00:00'"   explain format=json select * from t1 where a between '2021-01-01 00:00:00' and '2021-02-02 00:00:00'; "attached_condition": "t1.a between <cache>('2021-01-01 00:00:00') and <cache>('2021-02-02 00:00:00')" The commit comment for the patch: MDEV-32148 Inefficient WHERE timestamp_column=datetime_const_expr Changing the way how the following condition is evaluated: WHERE timestamp_column=datetime_const_expr Before the change it was always performed as DATETIME. ... this should be changed to indicate that non-equality comparisons are also handled. For --let $replace_regex_tsltz6= ... ... --replace_regex $replace_regex_tsltz6 I would ask to introduce something with less cryptic name and use it in only one location: --source include/replace_timestamp_w_local_tz_value.inc

            A suggestion how to get rid of the extra warnings. At the cost of not providing the optimization for the queries that produce such warnings:

            --- item_cmpfunc.cc
            +++ item_cmpfunc.cc
            @@ -460,9 +460,28 @@  get_timestamp_item_for_comparison
                   !expr1->type_handler()->can_return_date())
                 return NULL;
             
            +  struct Count_handler : public Internal_error_handler
            +  {
            +    bool hit=false;
            +    bool handle_condition(THD *thd,
            +                          uint sql_errno,
            +                          const char *sqlstate,
            +                          Sql_condition::enum_warning_level *level,
            +                          const char *msg,
            +                          Sql_condition **cond_hdl)
            +    {
            +      hit=true;
            +      return *level == Sql_condition::WARN_LEVEL_WARN;
            +    }
            +  } cnt_handler;
            +
            +  //Suppress_warnings_error_handler warning_handler;
            +  thd->push_internal_handler(&cnt_handler);
            +
               Datetime dt(thd, expr1, Timestamp::DatetimeOptions(thd));
             
            -  if (!dt.is_valid_datetime())
            +  thd->pop_internal_handler();
            +  if (cnt_handler.hit || !dt.is_valid_datetime())
                 return NULL; // SQL NULL DATETIME, or a DATETIME with zeros in YYYYMMDD
             
               // '0000-00-00 00:00:00' is a special valid MariaDB TIMESTAMP value
            
            

            psergei Sergei Petrunia added a comment - A suggestion how to get rid of the extra warnings. At the cost of not providing the optimization for the queries that produce such warnings: --- item_cmpfunc.cc +++ item_cmpfunc.cc @@ -460,9 +460,28 @@ get_timestamp_item_for_comparison !expr1->type_handler()->can_return_date()) return NULL; + struct Count_handler : public Internal_error_handler + { + bool hit=false; + bool handle_condition(THD *thd, + uint sql_errno, + const char *sqlstate, + Sql_condition::enum_warning_level *level, + const char *msg, + Sql_condition **cond_hdl) + { + hit=true; + return *level == Sql_condition::WARN_LEVEL_WARN; + } + } cnt_handler; + + //Suppress_warnings_error_handler warning_handler; + thd->push_internal_handler(&cnt_handler); + Datetime dt(thd, expr1, Timestamp::DatetimeOptions(thd)); - if (!dt.is_valid_datetime()) + thd->pop_internal_handler(); + if (cnt_handler.hit || !dt.is_valid_datetime()) return NULL; // SQL NULL DATETIME, or a DATETIME with zeros in YYYYMMDD // '0000-00-00 00:00:00' is a special valid MariaDB TIMESTAMP value

            Note to self: The patch has this call:

            expr1->can_eval_in_optimize() 
            

            and it will evaluate ##expr1## if that call returns TRUE. This will happen at Name Resolution (fix_fields()) phase. However, this will not attempt to evaluate "cheap" subqueries: subqueries do not yet have query plans at Name Resolution stage, so they are not considered "cheap".

            psergei Sergei Petrunia added a comment - Note to self: The patch has this call: expr1->can_eval_in_optimize() and it will evaluate ##expr1## if that call returns TRUE. This will happen at Name Resolution (fix_fields()) phase. However, this will not attempt to evaluate "cheap" subqueries: subqueries do not yet have query plans at Name Resolution stage, so they are not considered "cheap".

            Ok, the patch needs the following:

            After that, it's ok to push into 11.3.

            psergei Sergei Petrunia added a comment - Ok, the patch needs the following: Address the input in https://jira.mariadb.org/browse/MDEV-32148?focusedCommentId=277190&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-277190 Consider applying comment/formatting fixes from here: https://gist.github.com/spetrunia/89531b74dfb7ba8aeaf6c293e31aa909 Change the target version to be 11.3 (Discussed this with Monty) After that, it's ok to push into 11.3.

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.