[MDEV-16426] Optimizer erroneously treats equal constants of different formats as same Created: 2018-06-07  Updated: 2018-06-08  Resolved: 2018-06-08

Status: Closed
Project: MariaDB Server
Component/s: Data types, Optimizer
Affects Version/s: 10.0, 10.1, 10.2, 10.3, 10.4
Fix Version/s: 10.4.0

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-4912 Data type plugin API version 1 Closed
Relates
relates to MDEV-11361 Equal condition propagation does not ... Open
relates to MDEV-11362 True condition elimination does not w... Closed

 Description   

This script demonstrates that although TIME'00:00:00.0' and TIME'00:00:00.00' are equal to each other for comparison, they are still not equal in other contexts:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a TIME);
INSERT INTO t1 VALUES ('00:00:00'),('00:00:01'),('00:00:02');
SELECT LENGTH(COALESCE(TIME'00:00:00.0',a)),LENGTH(COALESCE(TIME'00:00:00.00',a)) FROM t1;

+--------------------------------------+---------------------------------------+
| LENGTH(COALESCE(TIME'00:00:00.0',a)) | LENGTH(COALESCE(TIME'00:00:00.00',a)) |
+--------------------------------------+---------------------------------------+
|                                   10 |                                    11 |
|                                   10 |                                    11 |
|                                   10 |                                    11 |
+--------------------------------------+---------------------------------------+

Notice, the two expressions return a different result (as expected), although TIME literals are equivalent for each other for comparison.

Now I put the same expressions into a WHERE condition with <=>:

SELECT * FROM t1 WHERE LENGTH(COALESCE(TIME'00:00:00.0',a))<=>LENGTH(COALESCE(TIME'00:00:00.00',a));

+----------+
| a        |
+----------+
| 00:00:00 |
| 00:00:01 |
| 00:00:02 |
+----------+

It returns all rows. This is wrong. The condition is expected to be false, and the query should return empty set.

EXPLAIN for the same query tells that the condition was erroneously simplified to true by the optimizer:

EXPLAIN EXTENDED SELECT * FROM t1 WHERE LENGTH(COALESCE(TIME'00:00:00.0',a))<=>LENGTH(COALESCE(TIME'00:00:00.00',a));
SHOW WARNINGS;

+-------+------+--------------------------------------------------------+
| Level | Code | Message                                                |
+-------+------+--------------------------------------------------------+
| Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where 1 |
+-------+------+--------------------------------------------------------+



 Comments   
Comment by Alexander Barkov [ 2018-06-08 ]

The same problem is repeatable with DECIMAL:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a DECIMAL(10,3));
INSERT INTO t1 VALUES (10.0),(10.1);
SELECT LENGTH(10.0)+a,LENGTH(10.000)+a FROM t1;

+----------------+------------------+
| LENGTH(10.0)+a | LENGTH(10.000)+a |
+----------------+------------------+
|         14.000 |           16.000 |
|         14.100 |           16.100 |
+----------------+------------------+

SELECT * FROM t1 WHERE LENGTH(10.0)+a<=>LENGTH(10.000)+a;

+--------+
| a      |
+--------+
| 10.000 |
| 10.100 |
+--------+

EXPLAIN EXTENDED SELECT * FROM t1 WHERE LENGTH(10.0)+a<=>LENGTH(10.000)+a;
SHOW WARNINGS;

+-------+------+--------------------------------------------------------+
| Level | Code | Message                                                |
+-------+------+--------------------------------------------------------+
| Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where 1 |
+-------+------+--------------------------------------------------------+

Looks wrong. The SELECT query should return empty set, and EXPLAIN should not tell that the condition was simplified to true.

Comment by Alexander Barkov [ 2018-06-08 ]

The problem happens because Item_bool_func2::remove_eq_conds() correctly calls eq() for the arguments like this:

if (args[0]->eq(args[1], true))  // Notice true

but further eq() can handle binary vs non-binary comparison only for string constants (and not for temporal or decimal).

eq() should be fixed to take into account possible different formats when the binary_cmp parameter is true, for all affected data types (such as temporal and DECIMAL).

Comment by Alexander Barkov [ 2018-06-08 ]

In this scenario it works fine (the constants 'a' and 0x61 are not detected as same):

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a DECIMAL(10,3));
INSERT INTO t1 VALUES (10.0),(10.1);
SELECT CHARSET('a'),CHARSET(0x61),LENGTH(CHARSET('a'))+a,LENGTH(CHARSET(0x61))+a FROM t1;

+--------------+---------------+------------------------+-------------------------+
| CHARSET('a') | CHARSET(0x61) | LENGTH(CHARSET('a'))+a | LENGTH(CHARSET(0x61))+a |
+--------------+---------------+------------------------+-------------------------+
| utf8         | binary        |                 14.000 |                  16.000 |
| utf8         | binary        |                 14.100 |                  16.100 |
+--------------+---------------+------------------------+-------------------------+

SELECT * FROM t1 WHERE LENGTH(CHARSET('a'))+a<=>LENGTH(CHARSET(0x61))+a;

Empty set (1.80 sec)

EXPLAIN EXTENDED SELECT * FROM t1 WHERE LENGTH(CHARSET('a'))+a<=>LENGTH(CHARSET(0x61))+a;
SHOW WARNINGS;

+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                               |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` where <cache>(octet_length(charset('a'))) + `test`.`t1`.`a` <=> <cache>(octet_length(charset(0x61))) + `test`.`t1`.`a` |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

because Item_hex_constant::type() and Item_string::type() returns different codes (STRING_ITEM vs VARBIN_ITEM).

Generated at Thu Feb 08 08:28:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.