Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16426

Optimizer erroneously treats equal constants of different formats as same

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
    • 10.4.0
    • Data types, Optimizer
    • None

    Description

      This script demonstrates that although TIME'00:00:00.0' and TIME'00:00:00.00' are equal to each other for comparison, they are still not equal in other contexts:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a TIME);
      INSERT INTO t1 VALUES ('00:00:00'),('00:00:01'),('00:00:02');
      SELECT LENGTH(COALESCE(TIME'00:00:00.0',a)),LENGTH(COALESCE(TIME'00:00:00.00',a)) FROM t1;
      

      +--------------------------------------+---------------------------------------+
      | LENGTH(COALESCE(TIME'00:00:00.0',a)) | LENGTH(COALESCE(TIME'00:00:00.00',a)) |
      +--------------------------------------+---------------------------------------+
      |                                   10 |                                    11 |
      |                                   10 |                                    11 |
      |                                   10 |                                    11 |
      +--------------------------------------+---------------------------------------+
      

      Notice, the two expressions return a different result (as expected), although TIME literals are equivalent for each other for comparison.

      Now I put the same expressions into a WHERE condition with <=>:

      SELECT * FROM t1 WHERE LENGTH(COALESCE(TIME'00:00:00.0',a))<=>LENGTH(COALESCE(TIME'00:00:00.00',a));
      

      +----------+
      | a        |
      +----------+
      | 00:00:00 |
      | 00:00:01 |
      | 00:00:02 |
      +----------+
      

      It returns all rows. This is wrong. The condition is expected to be false, and the query should return empty set.

      EXPLAIN for the same query tells that the condition was erroneously simplified to true by the optimizer:

      EXPLAIN EXTENDED SELECT * FROM t1 WHERE LENGTH(COALESCE(TIME'00:00:00.0',a))<=>LENGTH(COALESCE(TIME'00:00:00.00',a));
      SHOW WARNINGS;
      

      +-------+------+--------------------------------------------------------+
      | Level | Code | Message                                                |
      +-------+------+--------------------------------------------------------+
      | Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where 1 |
      +-------+------+--------------------------------------------------------+
      

      Attachments

        Issue Links

          Activity

            The same problem is repeatable with DECIMAL:

            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (a DECIMAL(10,3));
            INSERT INTO t1 VALUES (10.0),(10.1);
            SELECT LENGTH(10.0)+a,LENGTH(10.000)+a FROM t1;
            

            +----------------+------------------+
            | LENGTH(10.0)+a | LENGTH(10.000)+a |
            +----------------+------------------+
            |         14.000 |           16.000 |
            |         14.100 |           16.100 |
            +----------------+------------------+
            

            SELECT * FROM t1 WHERE LENGTH(10.0)+a<=>LENGTH(10.000)+a;
            

            +--------+
            | a      |
            +--------+
            | 10.000 |
            | 10.100 |
            +--------+
            

            EXPLAIN EXTENDED SELECT * FROM t1 WHERE LENGTH(10.0)+a<=>LENGTH(10.000)+a;
            SHOW WARNINGS;
            

            +-------+------+--------------------------------------------------------+
            | Level | Code | Message                                                |
            +-------+------+--------------------------------------------------------+
            | Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where 1 |
            +-------+------+--------------------------------------------------------+
            

            Looks wrong. The SELECT query should return empty set, and EXPLAIN should not tell that the condition was simplified to true.

            bar Alexander Barkov added a comment - The same problem is repeatable with DECIMAL: DROP TABLE IF EXISTS t1; CREATE TABLE t1 (a DECIMAL (10,3)); INSERT INTO t1 VALUES (10.0),(10.1); SELECT LENGTH(10.0)+a,LENGTH(10.000)+a FROM t1; +----------------+------------------+ | LENGTH(10.0)+a | LENGTH(10.000)+a | +----------------+------------------+ | 14.000 | 16.000 | | 14.100 | 16.100 | +----------------+------------------+ SELECT * FROM t1 WHERE LENGTH(10.0)+a<=>LENGTH(10.000)+a; +--------+ | a | +--------+ | 10.000 | | 10.100 | +--------+ EXPLAIN EXTENDED SELECT * FROM t1 WHERE LENGTH(10.0)+a<=>LENGTH(10.000)+a; SHOW WARNINGS; +-------+------+--------------------------------------------------------+ | Level | Code | Message | +-------+------+--------------------------------------------------------+ | Note | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where 1 | +-------+------+--------------------------------------------------------+ Looks wrong. The SELECT query should return empty set, and EXPLAIN should not tell that the condition was simplified to true .
            bar Alexander Barkov added a comment - - edited

            The problem happens because Item_bool_func2::remove_eq_conds() correctly calls eq() for the arguments like this:

            if (args[0]->eq(args[1], true))  // Notice true
            

            but further eq() can handle binary vs non-binary comparison only for string constants (and not for temporal or decimal).

            eq() should be fixed to take into account possible different formats when the binary_cmp parameter is true, for all affected data types (such as temporal and DECIMAL).

            bar Alexander Barkov added a comment - - edited The problem happens because Item_bool_func2::remove_eq_conds() correctly calls eq() for the arguments like this: if (args[0]->eq(args[1], true )) // Notice true but further eq() can handle binary vs non-binary comparison only for string constants (and not for temporal or decimal). eq() should be fixed to take into account possible different formats when the binary_cmp parameter is true , for all affected data types (such as temporal and DECIMAL).
            bar Alexander Barkov added a comment - - edited

            In this scenario it works fine (the constants 'a' and 0x61 are not detected as same):

            SET NAMES utf8;
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (a DECIMAL(10,3));
            INSERT INTO t1 VALUES (10.0),(10.1);
            SELECT CHARSET('a'),CHARSET(0x61),LENGTH(CHARSET('a'))+a,LENGTH(CHARSET(0x61))+a FROM t1;
            

            +--------------+---------------+------------------------+-------------------------+
            | CHARSET('a') | CHARSET(0x61) | LENGTH(CHARSET('a'))+a | LENGTH(CHARSET(0x61))+a |
            +--------------+---------------+------------------------+-------------------------+
            | utf8         | binary        |                 14.000 |                  16.000 |
            | utf8         | binary        |                 14.100 |                  16.100 |
            +--------------+---------------+------------------------+-------------------------+
            

            SELECT * FROM t1 WHERE LENGTH(CHARSET('a'))+a<=>LENGTH(CHARSET(0x61))+a;
            

            Empty set (1.80 sec)
            

            EXPLAIN EXTENDED SELECT * FROM t1 WHERE LENGTH(CHARSET('a'))+a<=>LENGTH(CHARSET(0x61))+a;
            SHOW WARNINGS;
            

            +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | Level | Code | Message                                                                                                                                                               |
            +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where <cache>(octet_length(charset('a'))) + `test`.`t1`.`a` <=> <cache>(octet_length(charset(0x61))) + `test`.`t1`.`a` |
            +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            

            because Item_hex_constant::type() and Item_string::type() returns different codes (STRING_ITEM vs VARBIN_ITEM).

            bar Alexander Barkov added a comment - - edited In this scenario it works fine (the constants 'a' and 0x61 are not detected as same): SET NAMES utf8; DROP TABLE IF EXISTS t1; CREATE TABLE t1 (a DECIMAL (10,3)); INSERT INTO t1 VALUES (10.0),(10.1); SELECT CHARSET( 'a' ),CHARSET(0x61),LENGTH(CHARSET( 'a' ))+a,LENGTH(CHARSET(0x61))+a FROM t1; +--------------+---------------+------------------------+-------------------------+ | CHARSET('a') | CHARSET(0x61) | LENGTH(CHARSET('a'))+a | LENGTH(CHARSET(0x61))+a | +--------------+---------------+------------------------+-------------------------+ | utf8 | binary | 14.000 | 16.000 | | utf8 | binary | 14.100 | 16.100 | +--------------+---------------+------------------------+-------------------------+ SELECT * FROM t1 WHERE LENGTH(CHARSET( 'a' ))+a<=>LENGTH(CHARSET(0x61))+a; Empty set (1.80 sec) EXPLAIN EXTENDED SELECT * FROM t1 WHERE LENGTH(CHARSET( 'a' ))+a<=>LENGTH(CHARSET(0x61))+a; SHOW WARNINGS; +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where <cache>(octet_length(charset('a'))) + `test`.`t1`.`a` <=> <cache>(octet_length(charset(0x61))) + `test`.`t1`.`a` | +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ because Item_hex_constant::type() and Item_string::type() returns different codes (STRING_ITEM vs VARBIN_ITEM).

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.