[MDEV-17606] Query returns wrong results (while using CHARACTER SET utf8) Created: 2018-11-02  Updated: 2020-06-27  Resolved: 2020-06-27

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3
Fix Version/s: 10.1.46, 10.2.33, 10.3.24, 10.4.14, 10.5.5

Type: Bug Priority: Major
Reporter: Alice Sherepa Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

Query returns an empty set, but the same query with additional strict conditions (adding WHERE ..) returns some results.

CREATE TABLE t1 (i1 int);
INSERT INTO t1 VALUES (NULL),(3),(8),(0),(2);
 
CREATE TABLE t2 (v1 varchar(10) CHARACTER SET utf8, KEY v1 (v1(3)));
INSERT INTO t2 VALUES  ('k'),('rid'),('f'),('x');
 
CREATE TABLE t3 (v1 varchar(10) CHARACTER SET utf8, l1 varchar(10), pk int NOT NULL PRIMARY KEY, i1 int , i2 int, KEY v1 (v1(3)));
INSERT INTO t3 VALUES ('measure','m',2,NULL,2),('o','k',3,1970208768,NULL);
 
CREATE TABLE t4 (v1 varchar(10) CHARACTER SET utf8, l1 varchar(10), pk int NOT NULL PRIMARY KEY, i1 int , i2 int, KEY v1 (v1(3)));
INSERT INTO t4 VALUES ('e','we',1,-1,1),('n','o',2,4,4),('r','o',3,3,2),('o','secure',4,NULL,NULL),('t','e',8,-1,6);
 
SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2));
SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2)) WHERE (t4.i2 <=2);

MariaDB [test2]> SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2));
Empty set (0.001 sec)
 
MariaDB [test2]> SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2)) WHERE (t4.i2 <=2);
+---+
| 1 |
+---+
| 1 |
| 1 |
+---+
2 rows in set (0.001 sec)
 
MariaDB [test2]> explain extended SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2));
+------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+------------------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref         | rows | filtered | Extra                              |
+------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+------------------------------------+
|    1 | PRIMARY     | t3    | index  | NULL          | PRIMARY | 4       | NULL        |    2 |   100.00 | Using index                        |
|    1 | PRIMARY     | t1    | ALL    | NULL          | NULL    | NULL    | NULL        |    5 |   100.00 | Using join buffer (flat, BNL join) |
|    1 | PRIMARY     | t4    | eq_ref | PRIMARY       | PRIMARY | 4       | test2.t1.i1 |    1 |   100.00 | Using where                        |
|    2 | SUBQUERY    | t2    | ALL    | NULL          | NULL    | NULL    | NULL        |    4 |   100.00 |                                    |
+------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+------------------------------------+
4 rows in set, 1 warning (0.001 sec)
 
Note (Code 1003): /* select#1 */ select 1 AS `1` from `test2`.`t3` join `test2`.`t1` left join `test2`.`t4` on(`test2`.`t4`.`pk` = `test2`.`t1`.`i1` and `test2`.`t1`.`i1` is not null) where <nop>(<in_optimizer>(`test2`.`t4`.`l1`,<max>(/* select#2 */ select max(`test2`.`t2`.`v1`) from `test2`.`t2`) > <cache>(convert(`test2`.`t4`.`l1` using utf8))))
 
MariaDB [test2]> explain extended SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2)) WHERE (t4.i2 <=2);
+------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+-------------------------------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref         | rows | filtered | Extra                                           |
+------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+-------------------------------------------------+
|    1 | PRIMARY     | t3    | index  | NULL          | PRIMARY | 4       | NULL        |    2 |   100.00 | Using index                                     |
|    1 | PRIMARY     | t1    | ALL    | NULL          | NULL    | NULL    | NULL        |    5 |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    1 | PRIMARY     | t4    | eq_ref | PRIMARY       | PRIMARY | 4       | test2.t1.i1 |    1 |   100.00 | Using where                                     |
|    2 | SUBQUERY    | t2    | ALL    | NULL          | NULL    | NULL    | NULL        |    4 |   100.00 |                                                 |
+------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+-------------------------------------------------+
4 rows in set, 1 warning (0.000 sec)
 
Note (Code 1003): /* select#1 */ select 1 AS `1` from `test2`.`t3` join `test2`.`t1` join `test2`.`t4` where `test2`.`t4`.`pk` = `test2`.`t1`.`i1` and `test2`.`t4`.`i2` <= 2 and <nop>(<in_optimizer>(`test2`.`t4`.`l1`,<max>(/* select#2 */ select max(`test2`.`t2`.`v1`) from `test2`.`t2`) > <cache>(convert(`test2`.`t4`.`l1` using utf8))))



 Comments   
Comment by Varun Gupta (Inactive) [ 2019-04-23 ]

A simple test case

CREATE TABLE t2 (v1 varchar(10) CHARACTER SET utf8, KEY v1 (v1(3)));
INSERT INTO t2 VALUES  ('k'),('rid'),('f'),('x');
create table t1(l1 varchar(10), i2 int); 
insert into t1 values (NULL, NULL),('o',2),('e',6),('o',4),(NULL, NULL);
 
SELECT * FROM t1 where  ( t1.l1 < ANY (SELECT MAX(t2.v1) FROM t2));

Comment by Varun Gupta (Inactive) [ 2020-05-19 ]

First attempt to a patch was

diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc
index c50f87c8f1c..c72390e05f5 100644
--- a/sql/item_subselect.cc
+++ b/sql/item_subselect.cc
@@ -2068,7 +2068,12 @@ bool Item_allany_subselect::transform_into_max_min(JOIN *join)
     The swap is needed for expressions of type 'f1 < ALL ( SELECT ....)'
     where we want to evaluate the sub query even if f1 would be null.
   */
-  subs= func->create_swap(thd, *(optimizer->get_cache()), subs);
+  subs= func->create_swap(thd,
+                          (new (thd->mem_root)
+                           Item_direct_ref(thd, &select_lex ->context,
+                                           (Item**)optimizer->get_cache(),
+                                            (char *)"<no matter>",
+                                            &in_left_expr_name)), subs);
   thd->change_item_tree(place, subs);
   if (subs->fix_fields(thd, &subs))
     DBUG_RETURN(true);

Comment by Varun Gupta (Inactive) [ 2020-05-20 ]

MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 where  ( t1.l1 < ANY(SELECT MAX(t2.v1) FROM t2));
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|    1 | PRIMARY     | t1    | ALL  | NULL          | NULL | NULL    | NULL | 3    |   100.00 | Using where |
|    2 | SUBQUERY    | t2    | ALL  | NULL          | NULL | NULL    | NULL | 4    |   100.00 |             |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
2 rows in set, 1 warning (0.002 sec)

and then see what the refined query looks like

+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`t1`.`l1` AS `l1`,`test`.`t1`.`i2` AS `i2` from `test`.`t1` where <nop>(<in_optimizer>(`test`.`t1`.`l1`,<max>(/* select#2 */ select max(`test`.`t2`.`v1`) from `test`.`t2`) > <cache>(convert(`test`.`t1`.`l1` using utf8)))) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

So if look at the WHERE clause:

where <nop>(<in_optimizer>(t1.l1, <max>(select max(t2.v1) from t2) > <cache>(convert(t1.l1 using utf8))))

So looking at the right hand side of the IN-optimizer
<max>(subquery#2) > <cache>(convert(t1.l1 using utf8))
we are actually converting t1.l1 to utf8 and caching this value.

The problem is we cache the value of t1.l1 (from the left_expr) in default charset(latin1 in this case). But the cache is caching the converted value to utf8.

Solution to fix this problem would be that we cache the value of t1.l1 in default character set and then convert the cached value.

Comment by Varun Gupta (Inactive) [ 2020-05-25 ]

After having a discussion with sanja, he confirms that caching the converted value to utf8 looks wrong in this case.
Also he mentioned that with the min-max optimization there is no need to use a cache as the subquery will be executed only once.

Comment by Varun Gupta (Inactive) [ 2020-05-26 ]

Disabling the cache for Item_in_optimizer is not possible currently, because whenever we call val_* functions on it, it always stores the left expr in the cache

longlong Item_in_optimizer::val_int()
{
  bool tmp;
  DBUG_ASSERT(fixed == 1);
  cache->store(args[0]);
  cache->cache_value();

Maybe in the future we can extend the functionality to not cache the left expr when we have min-max subqueries.

Comment by Varun Gupta (Inactive) [ 2020-05-26 ]

Patch
https://github.com/MariaDB/server/commit/db1f73b26edaace41345f19bca507a7848aa39a8

Comment by Oleksandr Byelkin [ 2020-06-25 ]

OK, to push! Good catch!

Generated at Thu Feb 08 08:37:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.