Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17606

Query returns wrong results (while using CHARACTER SET utf8)

Details

    Description

      Query returns an empty set, but the same query with additional strict conditions (adding WHERE ..) returns some results.

      CREATE TABLE t1 (i1 int);
      INSERT INTO t1 VALUES (NULL),(3),(8),(0),(2);
       
      CREATE TABLE t2 (v1 varchar(10) CHARACTER SET utf8, KEY v1 (v1(3)));
      INSERT INTO t2 VALUES  ('k'),('rid'),('f'),('x');
       
      CREATE TABLE t3 (v1 varchar(10) CHARACTER SET utf8, l1 varchar(10), pk int NOT NULL PRIMARY KEY, i1 int , i2 int, KEY v1 (v1(3)));
      INSERT INTO t3 VALUES ('measure','m',2,NULL,2),('o','k',3,1970208768,NULL);
       
      CREATE TABLE t4 (v1 varchar(10) CHARACTER SET utf8, l1 varchar(10), pk int NOT NULL PRIMARY KEY, i1 int , i2 int, KEY v1 (v1(3)));
      INSERT INTO t4 VALUES ('e','we',1,-1,1),('n','o',2,4,4),('r','o',3,3,2),('o','secure',4,NULL,NULL),('t','e',8,-1,6);
       
      SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2));
      SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2)) WHERE (t4.i2 <=2);
      

      MariaDB [test2]> SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2));
      Empty set (0.001 sec)
       
      MariaDB [test2]> SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2)) WHERE (t4.i2 <=2);
      +---+
      | 1 |
      +---+
      | 1 |
      | 1 |
      +---+
      2 rows in set (0.001 sec)
       
      MariaDB [test2]> explain extended SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2));
      +------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+------------------------------------+
      | id   | select_type | table | type   | possible_keys | key     | key_len | ref         | rows | filtered | Extra                              |
      +------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+------------------------------------+
      |    1 | PRIMARY     | t3    | index  | NULL          | PRIMARY | 4       | NULL        |    2 |   100.00 | Using index                        |
      |    1 | PRIMARY     | t1    | ALL    | NULL          | NULL    | NULL    | NULL        |    5 |   100.00 | Using join buffer (flat, BNL join) |
      |    1 | PRIMARY     | t4    | eq_ref | PRIMARY       | PRIMARY | 4       | test2.t1.i1 |    1 |   100.00 | Using where                        |
      |    2 | SUBQUERY    | t2    | ALL    | NULL          | NULL    | NULL    | NULL        |    4 |   100.00 |                                    |
      +------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+------------------------------------+
      4 rows in set, 1 warning (0.001 sec)
       
      Note (Code 1003): /* select#1 */ select 1 AS `1` from `test2`.`t3` join `test2`.`t1` left join `test2`.`t4` on(`test2`.`t4`.`pk` = `test2`.`t1`.`i1` and `test2`.`t1`.`i1` is not null) where <nop>(<in_optimizer>(`test2`.`t4`.`l1`,<max>(/* select#2 */ select max(`test2`.`t2`.`v1`) from `test2`.`t2`) > <cache>(convert(`test2`.`t4`.`l1` using utf8))))
       
      MariaDB [test2]> explain extended SELECT 1 FROM t3 JOIN (t1 LEFT JOIN t4 ON t4.pk = t1.i1) ON ( t4.l1 < ANY (SELECT MAX(t2.v1) FROM t2)) WHERE (t4.i2 <=2);
      +------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+-------------------------------------------------+
      | id   | select_type | table | type   | possible_keys | key     | key_len | ref         | rows | filtered | Extra                                           |
      +------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+-------------------------------------------------+
      |    1 | PRIMARY     | t3    | index  | NULL          | PRIMARY | 4       | NULL        |    2 |   100.00 | Using index                                     |
      |    1 | PRIMARY     | t1    | ALL    | NULL          | NULL    | NULL    | NULL        |    5 |   100.00 | Using where; Using join buffer (flat, BNL join) |
      |    1 | PRIMARY     | t4    | eq_ref | PRIMARY       | PRIMARY | 4       | test2.t1.i1 |    1 |   100.00 | Using where                                     |
      |    2 | SUBQUERY    | t2    | ALL    | NULL          | NULL    | NULL    | NULL        |    4 |   100.00 |                                                 |
      +------+-------------+-------+--------+---------------+---------+---------+-------------+------+----------+-------------------------------------------------+
      4 rows in set, 1 warning (0.000 sec)
       
      Note (Code 1003): /* select#1 */ select 1 AS `1` from `test2`.`t3` join `test2`.`t1` join `test2`.`t4` where `test2`.`t4`.`pk` = `test2`.`t1`.`i1` and `test2`.`t4`.`i2` <= 2 and <nop>(<in_optimizer>(`test2`.`t4`.`l1`,<max>(/* select#2 */ select max(`test2`.`t2`.`v1`) from `test2`.`t2`) > <cache>(convert(`test2`.`t4`.`l1` using utf8))))
      

      Attachments

        Activity

          A simple test case

          CREATE TABLE t2 (v1 varchar(10) CHARACTER SET utf8, KEY v1 (v1(3)));
          INSERT INTO t2 VALUES  ('k'),('rid'),('f'),('x');
          create table t1(l1 varchar(10), i2 int); 
          insert into t1 values (NULL, NULL),('o',2),('e',6),('o',4),(NULL, NULL);
           
          SELECT * FROM t1 where  ( t1.l1 < ANY (SELECT MAX(t2.v1) FROM t2));
          

          varun Varun Gupta (Inactive) added a comment - A simple test case CREATE TABLE t2 (v1 varchar (10) CHARACTER SET utf8, KEY v1 (v1(3))); INSERT INTO t2 VALUES ( 'k' ),( 'rid' ),( 'f' ),( 'x' ); create table t1(l1 varchar (10), i2 int ); insert into t1 values ( NULL , NULL ),( 'o' ,2),( 'e' ,6),( 'o' ,4),( NULL , NULL );   SELECT * FROM t1 where ( t1.l1 < ANY ( SELECT MAX (t2.v1) FROM t2));

          First attempt to a patch was

          diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc
          index c50f87c8f1c..c72390e05f5 100644
          --- a/sql/item_subselect.cc
          +++ b/sql/item_subselect.cc
          @@ -2068,7 +2068,12 @@ bool Item_allany_subselect::transform_into_max_min(JOIN *join)
               The swap is needed for expressions of type 'f1 < ALL ( SELECT ....)'
               where we want to evaluate the sub query even if f1 would be null.
             */
          -  subs= func->create_swap(thd, *(optimizer->get_cache()), subs);
          +  subs= func->create_swap(thd,
          +                          (new (thd->mem_root)
          +                           Item_direct_ref(thd, &select_lex ->context,
          +                                           (Item**)optimizer->get_cache(),
          +                                            (char *)"<no matter>",
          +                                            &in_left_expr_name)), subs);
             thd->change_item_tree(place, subs);
             if (subs->fix_fields(thd, &subs))
               DBUG_RETURN(true);
          
          

          varun Varun Gupta (Inactive) added a comment - First attempt to a patch was diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc index c50f87c8f1c..c72390e05f5 100644 --- a/sql/item_subselect.cc +++ b/sql/item_subselect.cc @@ -2068,7 +2068,12 @@ bool Item_allany_subselect::transform_into_max_min(JOIN *join) The swap is needed for expressions of type 'f1 < ALL ( SELECT ....)' where we want to evaluate the sub query even if f1 would be null. */ - subs= func->create_swap(thd, *(optimizer->get_cache()), subs); + subs= func->create_swap(thd, + (new (thd->mem_root) + Item_direct_ref(thd, &select_lex ->context, + (Item**)optimizer->get_cache(), + (char *)"<no matter>", + &in_left_expr_name)), subs); thd->change_item_tree(place, subs); if (subs->fix_fields(thd, &subs)) DBUG_RETURN(true);

          MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 where  ( t1.l1 < ANY(SELECT MAX(t2.v1) FROM t2));
          +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
          +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
          |    1 | PRIMARY     | t1    | ALL  | NULL          | NULL | NULL    | NULL | 3    |   100.00 | Using where |
          |    2 | SUBQUERY    | t2    | ALL  | NULL          | NULL | NULL    | NULL | 4    |   100.00 |             |
          +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
          2 rows in set, 1 warning (0.002 sec)
          

          and then see what the refined query looks like

          +-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
          | Note  | 1003 | /* select#1 */ select `test`.`t1`.`l1` AS `l1`,`test`.`t1`.`i2` AS `i2` from `test`.`t1` where <nop>(<in_optimizer>(`test`.`t1`.`l1`,<max>(/* select#2 */ select max(`test`.`t2`.`v1`) from `test`.`t2`) > <cache>(convert(`test`.`t1`.`l1` using utf8)))) |
          +-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
          

          So if look at the WHERE clause:

          where <nop>(<in_optimizer>(t1.l1, <max>(select max(t2.v1) from t2) > <cache>(convert(t1.l1 using utf8))))
          

          So looking at the right hand side of the IN-optimizer
          <max>(subquery#2) > <cache>(convert(t1.l1 using utf8))
          we are actually converting t1.l1 to utf8 and caching this value.

          The problem is we cache the value of t1.l1 (from the left_expr) in default charset(latin1 in this case). But the cache is caching the converted value to utf8.

          Solution to fix this problem would be that we cache the value of t1.l1 in default character set and then convert the cached value.

          varun Varun Gupta (Inactive) added a comment - MariaDB [test]> EXPLAIN EXTENDED SELECT * FROM t1 where ( t1.l1 < ANY(SELECT MAX(t2.v1) FROM t2)); +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where | | 2 | SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ 2 rows in set, 1 warning (0.002 sec) and then see what the refined query looks like +-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | /* select#1 */ select `test`.`t1`.`l1` AS `l1`,`test`.`t1`.`i2` AS `i2` from `test`.`t1` where <nop>(<in_optimizer>(`test`.`t1`.`l1`,<max>(/* select#2 */ select max(`test`.`t2`.`v1`) from `test`.`t2`) > <cache>(convert(`test`.`t1`.`l1` using utf8)))) | +-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ So if look at the WHERE clause: where <nop>(<in_optimizer>(t1.l1, <max>(select max(t2.v1) from t2) > <cache>(convert(t1.l1 using utf8)))) So looking at the right hand side of the IN-optimizer <max>(subquery#2) > <cache>(convert(t1.l1 using utf8)) we are actually converting t1.l1 to utf8 and caching this value. The problem is we cache the value of t1.l1 (from the left_expr) in default charset(latin1 in this case). But the cache is caching the converted value to utf8. Solution to fix this problem would be that we cache the value of t1.l1 in default character set and then convert the cached value.

          After having a discussion with sanja, he confirms that caching the converted value to utf8 looks wrong in this case.
          Also he mentioned that with the min-max optimization there is no need to use a cache as the subquery will be executed only once.

          varun Varun Gupta (Inactive) added a comment - After having a discussion with sanja , he confirms that caching the converted value to utf8 looks wrong in this case. Also he mentioned that with the min-max optimization there is no need to use a cache as the subquery will be executed only once.
          varun Varun Gupta (Inactive) added a comment - - edited

          Disabling the cache for Item_in_optimizer is not possible currently, because whenever we call val_* functions on it, it always stores the left expr in the cache

          longlong Item_in_optimizer::val_int()
          {
            bool tmp;
            DBUG_ASSERT(fixed == 1);
            cache->store(args[0]);
            cache->cache_value();
          

          Maybe in the future we can extend the functionality to not cache the left expr when we have min-max subqueries.

          varun Varun Gupta (Inactive) added a comment - - edited Disabling the cache for Item_in_optimizer is not possible currently, because whenever we call val_* functions on it, it always stores the left expr in the cache longlong Item_in_optimizer::val_int() { bool tmp; DBUG_ASSERT(fixed == 1); cache->store(args[0]); cache->cache_value(); Maybe in the future we can extend the functionality to not cache the left expr when we have min-max subqueries.
          varun Varun Gupta (Inactive) added a comment - Patch https://github.com/MariaDB/server/commit/db1f73b26edaace41345f19bca507a7848aa39a8

          OK, to push! Good catch!

          sanja Oleksandr Byelkin added a comment - OK, to push! Good catch!

          People

            varun Varun Gupta (Inactive)
            alice Alice Sherepa
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.