[MDEV-6818] Possibly wrong result with a query comparing a date field to a char constant. Created: 2014-10-01  Updated: 2022-12-11  Resolved: 2022-12-09

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1.1
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Cannot Reproduce Votes: 1
Labels: datatype, optimizer


 Description   

After the commit on 10.1 tree, the provided test case started producing a different result.

commit 8bd4716272ef16a4bcd3196ba62f249aa3878998
Merge: 26e048f f8f8a59
Author: Sergei Petrunia <psergey@askmonty.org>
Date:   Tue Sep 9 13:05:28 2014 +0400
 
    Merge ../10.1-orderby-fixes into 10.1

I'm not quite sure which result is correct, but I want to make sure that the change was intentional.

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a INT, b INT, d DATE, KEY(d));
INSERT INTO t1 VALUES (1,7,'1900-01-01'),(4,0,'0000-00-00');
SELECT * FROM t1 WHERE d = 'c' OR b >= a;
DROP TABLE t1;

Old result

a	b	d
1	7	1900-01-01
4	0	0000-00-00

Old EXPLAIN

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t1	ALL	d	NULL	NULL	NULL	2	100.00	Using where
Warnings:
Note	1003	select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t1`.`d` AS `d` from `test`.`t1` where ((`test`.`t1`.`d` = 'c') or (`test`.`t1`.`b` >= `test`.`t1`.`a`))

New result

a	b	d
1	7	1900-01-01

New EXPLAIN

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t1	ALL	d	NULL	NULL	NULL	2	100.00	Using where
Warnings:
Note	1003	select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t1`.`d` AS `d` from `test`.`t1` where ((`test`.`t1`.`b` >= `test`.`t1`.`a`))
DROP TABLE t1;



 Comments   
Comment by Sergei Petrunia [ 2014-10-01 ]

So, the missing row is:

a	b	d
4	0	0000-00-00

  • "b>=a" is false for this row
  • d='c' is interesting.

d ids defined as DATE, and then invalid date is compared with string literal 'c'.

Comment by Sergei Petrunia [ 2014-10-01 ]

Let's check how they compare without indexes:

select *,  d = 'c', b >= a from t1;
+------+------+------------+---------+--------+
| a    | b    | d          | d = 'c' | b >= a |
+------+------+------------+---------+--------+
|    1 |    7 | 1900-01-01 |       0 |      1 |
|    4 |    0 | 0000-00-00 |       1 |      0 |
+------+------+------------+---------+--------+

So, '0000-00-00' = 'c'.

Comment by Sergei Petrunia [ 2014-10-01 ]

However, range optimizer has a different idea. Even in the old code (before the 8bd4716272ef16a4bcd3196ba62f249aa3878998 commit), it thinkd that d='c' can never be true:

MariaDB [test]> SELECT * FROM t1 WHERE d = 'c';
Empty set (0.00 sec)

MariaDB [test]> explain SELECT * FROM t1 WHERE d = 'c';
+------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
+------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+

and this does cause result mismatch:

MariaDB [test]> SELECT * FROM t1 use index () WHERE d = 'c';
+------+------+------------+
| a    | b    | d          |
+------+------+------------+
|    4 |    0 | 0000-00-00 |
+------+------+------------+

Comment by Sergei Petrunia [ 2014-10-01 ]

The query

SELECT * FROM t1 WHERE d = 'c' OR b >= a;

used to return the row with 0000-00-00, because "OR b>=a" prevented use of range optimizer.

But then, we've got "MDEV-6480: Remove conditions for which range optimizer returned SEL_ARG::IMPOSSIBLE". The new code sees range optimizer's judgement that d='c' can never be true, and removes it from the WHERE. This causes the difference in query result (although it was possible to construct examples with inconsistent query results before, too).

The solution is to reconcile range optimizer logic with item evaluation logic.

Comment by Sergei Petrunia [ 2014-10-01 ]

Test result from mysql-5.6.20 and mysql-5.6.21:

MySQL [test]> SELECT * FROM t1 WHERE d = 'c' OR b >= a;
+------+------+------------+
| a    | b    | d          |
+------+------+------------+
|    1 |    7 | 1900-01-01 |
+------+------+------------+

It's the same as the new one that we've got. It seems, they have fixed the comparison to not return true:

MySQL [test]> SELECT * FROM t1 use index () WHERE d = 'c';
Empty set, 2 warnings (0.00 sec)

Debugging the range optimizer, I can see that it produces SEL_TREE(NULL), unlike MariaDB which produces IMPOSSIBLE. This is actually weird - if item comparison was changed to evaluate to FALSE, why change range optimizer to not return IMPOSSIBLE?

Comment by Alice Sherepa [ 2022-12-09 ]

current 10.3-10.10 return the correct result:

10.3 d360fa6fa897d9556dc381

SELECT * FROM t1 WHERE d = 'c' OR b >= a;
a	b	d
1	7	1900-01-01
4	0	0000-00-00

Generated at Thu Feb 08 07:14:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.