Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35859

group by removal in Item_singlerow_subselect causing inconsistent results

Details

    • Bug
    • Status: Stalled (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.5.27
    • 10.5
    • Optimizer
    • None

    Description

      create table t1 (t1a int, t1b int, t1c int) engine=myisam;
      insert into t1 values (1,1,1),(2,2,2);
       
      create table t2 (t2a int, t2b int, t2c int) engine=myisam;
      insert into t2 values (1,1,1),(2,2,2),(3,3,3);
      

      MariaDB [test]> select * from t2 where t2a in ( select t1a from t1 group by (select t1a > t2b) order by t1a );
       
      +------+------+------+
      | t2a  | t2b  | t2c  |
      +------+------+------+
      |    1 |    1 |    1 |
      |    2 |    2 |    2 |
      +------+------+------+
       
      2 rows in set (0.051 sec)
      MariaDB [test]> select * from t2 where t2a in ( select t1a from t1 group by (select t1a > t2b where true) order by t1a );
      +------+------+------+
      | t2a  | t2b  | t2c  |
      +------+------+------+
      |    1 |    1 |    1 |
      +------+------+------+
      1 row in set (0.019 sec)
       
      MariaDB [test]> prepare s from 'select * from t2 where t2a in ( select t1a from t1 group by (select t1a > t2b) order by t1a )';
       
      Query OK, 0 rows affected (0.006 sec)
      Statement prepared
       
      MariaDB [test]> execute s;
      +------+------+------+
      | t2a  | t2b  | t2c  |
      +------+------+------+
      |    1 |    1 |    1 |
      +------+------+------+
      
      

      Attachments

        Issue Links

          Activity

            Johnston Rex Johnston added a comment - - edited

            This is caused by check_and_do_in_subquery_rewrites()/Item_singlerow_subselect::select_transformer()
            If we disable this transformation, we have more consistency, but...

             
            MariaDB [test]> select * from t2 where t2a in ( select t1a from t1 group by (select t1a > t2b));
            +------+------+------+
            | t2a  | t2b  | t2c  |
            +------+------+------+
            |    1 |    1 |    1 |
            +------+------+------+
            1 row in set (0.018 sec)
             
            MariaDB [test]> select * from t2 where t2a in ( select t1a from t1 group by (select t1a > t2b where true));
            +------+------+------+
            | t2a  | t2b  | t2c  |
            +------+------+------+
            |    1 |    1 |    1 |
            +------+------+------+
            1 row in set (0.056 sec)
             
            MariaDB [test]> select * from t2 where t2a in ( select t1a from t1 group by t1a > t2b);
            +------+------+------+
            | t2a  | t2b  | t2c  |
            +------+------+------+
            |    1 |    1 |    1 |
            |    2 |    2 |    2 |
            +------+------+------+
            2 rows in set (0.006 sec)
            
            

            We clearly have another problem elsewhere.

            Johnston Rex Johnston added a comment - - edited This is caused by check_and_do_in_subquery_rewrites()/Item_singlerow_subselect::select_transformer() If we disable this transformation, we have more consistency, but...   MariaDB [test]> select * from t2 where t2a in ( select t1a from t1 group by ( select t1a > t2b)); + ------+------+------+ | t2a | t2b | t2c | + ------+------+------+ | 1 | 1 | 1 | + ------+------+------+ 1 row in set (0.018 sec)   MariaDB [test]> select * from t2 where t2a in ( select t1a from t1 group by ( select t1a > t2b where true )); + ------+------+------+ | t2a | t2b | t2c | + ------+------+------+ | 1 | 1 | 1 | + ------+------+------+ 1 row in set (0.056 sec)   MariaDB [test]> select * from t2 where t2a in ( select t1a from t1 group by t1a > t2b); + ------+------+------+ | t2a | t2b | t2c | + ------+------+------+ | 1 | 1 | 1 | | 2 | 2 | 2 | + ------+------+------+ 2 rows in set (0.006 sec) We clearly have another problem elsewhere.
            Johnston Rex Johnston added a comment - - edited

            The above problem can be distilled down to this wrong result

            MariaDB [test]> select 2 in (select t1a from t1 group by t1a > 2 order by t1a);
            +---------------------------------------------------------+
            | 2 in (select t1a from t1 group by t1a > 2 order by t1a) |
            +---------------------------------------------------------+
            |                                                       1 |
            +---------------------------------------------------------+
            1 row in set (0.008 sec)
             
            MariaDB [test]>  select t1a from t1 group by t1a > 2 order by t1a;
            +------+
            | t1a  |
            +------+
            |    1 |
            +------+
            

            What is happening here is that the group by clause is removed by remove_redundant_subquery_clauses(), then the condition t1a = 2 is pushed into the Item_subquery by Item_in_subselect::inject_in_to_exists_cond, resulting in the select_lex looking like this

            select t1.t1a from t1 where 2 = t1.t1a or t1.t1a is null having t1.t1a is null order by t1.t1a
            

            which is clearly not equivalent to the original subquery. Note that we remove the group by clause prior to injecting the new condition into the Item_subselect.

            Johnston Rex Johnston added a comment - - edited The above problem can be distilled down to this wrong result MariaDB [test]> select 2 in ( select t1a from t1 group by t1a > 2 order by t1a); + ---------------------------------------------------------+ | 2 in ( select t1a from t1 group by t1a > 2 order by t1a) | + ---------------------------------------------------------+ | 1 | + ---------------------------------------------------------+ 1 row in set (0.008 sec)   MariaDB [test]> select t1a from t1 group by t1a > 2 order by t1a; + ------+ | t1a | + ------+ | 1 | + ------+ What is happening here is that the group by clause is removed by remove_redundant_subquery_clauses() , then the condition t1a = 2 is pushed into the Item_subquery by Item_in_subselect::inject_in_to_exists_cond , resulting in the select_lex looking like this select t1.t1a from t1 where 2 = t1.t1a or t1.t1a is null having t1.t1a is null order by t1.t1a which is clearly not equivalent to the original subquery. Note that we remove the group by clause prior to injecting the new condition into the Item_subselect.
            Johnston Rex Johnston added a comment - - edited

            From remove_redundant_subquery_clauses()

                  for (ORDER *ord= subq_select_lex->group_list.first; ord; ord= ord->next)
                  {
                    /*
                      Do not remove the item if it is used in select list and then referred
                      from GROUP BY clause by its name or number. Example:
             
                        select (select ... ) as SUBQ ...  group by SUBQ
             
                      Here SUBQ cannot be removed.
                    */
                    if (!ord->in_field_list)
                    {
                      /*
                      Not necessary due to workaround for MDEV-28621:
                      (*ord->item)->walk(&Item::eliminate_subselect_processor, FALSE, NULL);
                      */
                      /*
                        Remove from the JOIN::all_fields list any reference to the elements
                        of the eliminated GROUP BY list unless it is 'in_field_list'.
                        This is needed in order not to confuse JOIN::make_aggr_tables_info()
                        when it constructs different structure for execution phase.
                      */
                      List_iterator<Item> li(subq_select_lex->join->all_fields);
                      Item *item;
                      while ((item= li++))
                      {
                        if (item == *ord->item)
                          li.remove();
                      }
                    }
            

            the test, ord->in_field_list assumes that the order is an Item_ident, but here it is an expression.
            We need to walk this (potential) ord expression looking for Item_field and abort group by removal if we find any matching Item_field in the select list expressions.

            Johnston Rex Johnston added a comment - - edited From remove_redundant_subquery_clauses() for (ORDER *ord= subq_select_lex->group_list.first; ord; ord= ord->next) { /* Do not remove the item if it is used in select list and then referred from GROUP BY clause by its name or number. Example:   select (select ... ) as SUBQ ... group by SUBQ   Here SUBQ cannot be removed. */ if (!ord->in_field_list) { /* Not necessary due to workaround for MDEV-28621: (*ord->item)->walk(&Item::eliminate_subselect_processor, FALSE, NULL); */ /* Remove from the JOIN::all_fields list any reference to the elements of the eliminated GROUP BY list unless it is 'in_field_list'. This is needed in order not to confuse JOIN::make_aggr_tables_info() when it constructs different structure for execution phase. */ List_iterator<Item> li(subq_select_lex->join->all_fields); Item *item; while ((item= li++)) { if (item == *ord->item) li. remove (); } } the test, ord->in_field_list assumes that the order is an Item_ident , but here it is an expression. We need to walk this (potential) ord expression looking for Item_field and abort group by removal if we find any matching Item_field in the select list expressions.

            select t1a from t1 group by (select t1a > t2b) order by t1a

            monty Michael Widenius added a comment - select t1a from t1 group by (select t1a > t2b) order by t1a

            People

              Johnston Rex Johnston
              Johnston Rex Johnston
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.