[MDEV-25841] group by in subquery returns wrong result Created: 2021-06-02  Updated: 2021-12-21  Resolved: 2021-06-04

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.5.10
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Allen Lee (Inactive) Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Environment:

CentOS, Amazon EC2


Issue Links:
Duplicate
duplicates MDEV-25714 Join using derived with aggregation r... Closed

 Description   

Depending on the existence of "default null" on column definition, it leads to different query result.

  • Case 1

    create table AN_185324_T1 (CHANGE_REQUEST_ID int(11) NOT NULL, GLOBAL_DOC_ID int(11) DEFAULT NULL, PRIMARY KEY (CHANGE_REQUEST_ID)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    create table AN_185324_T2 (DOC_ID int(11) {color:red}DEFAULT NULL{color}, LLE_ID int(11) DEFAULT NULL, KEY (LLE_ID), KEY(DOC_ID)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
     
    insert into AN_185324_T1 VALUES (1,2155),(2,2155);
    insert into AN_185324_T2 VALUES (2154,NULL),(2155,NULL);
     
    MariaDB [CS0312790]> select t1.CHANGE_REQUEST_ID FROM ( AN_185324_T1 t1 join (select * FROM AN_185324_T2 WHERE LLE_ID IS NULL and DOC_ID IS NOT NULL GROUP BY DOC_ID) t2 ON (t1.GLOBAL_DOC_ID = t2.DOC_ID)) WHERE t1.CHANGE_REQUEST_ID=1;
    Empty set (0.001 sec)
     
    MariaDB [CS0312790]> SELECT DOC_ID FROM AN_185324_T2 WHERE LLE_ID IS NULL AND DOC_ID IS NOT NULL GROUP BY DOC_ID;
    +--------+
    | DOC_ID |
    +--------+
    |   2154 |
    |   2155 |
    +--------+
    2 rows in set (0.000 sec)
     
    Once replacing group by with select distinct, it returns OK.
    MariaDB [CS0312790]> select t1.CHANGE_REQUEST_ID FROM ( AN_185324_T1 t1 join (select distinct DOC_ID FROM AN_185324_T2 WHERE LLE_ID IS NULL and DOC_ID IS NOT NULL) t2 ON (t1.GLOBAL_DOC_ID = t2.DOC_ID)) WHERE t1.CHANGE_REQUEST_ID=1;
    +-------------------+
    | CHANGE_REQUEST_ID |
    +-------------------+
    |                 1 |
    +-------------------+
    1 row in set (0.000 sec)
    

  • sql_mode=NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
  • optimizer_switch = index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=off,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
  • Case 2 - removing `default null` and make DOC_ID column into not null, it returns the right result as well.

CREATE TABLE AN_185324_T2 (DOC_ID INT NOT NULL, LLE_ID INT DEFAULT NULL, KEY (LLE_ID), KEY(DOC_ID)) ENGINE=INNODB DEFAULT CHARSET=utf8;
 
MariaDB [CS0312790]> select t1.CHANGE_REQUEST_ID FROM ( AN_185324_T1 t1 join (select * FROM AN_185324_T2 WHERE LLE_ID IS NULL and DOC_ID IS NOT NULL GROUP BY DOC_ID) t2 ON (t1.GLOBAL_DOC_ID = t2.DOC_ID)) WHERE t1.CHANGE_REQUEST_ID=1;
+-------------------+
| CHANGE_REQUEST_ID |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.001 sec)



 Comments   
Comment by Alice Sherepa [ 2021-06-04 ]

Fixed by 663bc849b5a26a5325adf009a8e commit by Igor Babaev

commit 663bc849b5a26a5325adf009a8e8fa9155c6b833 (origin/bb-10.3-igor)
Author: Igor Babaev <igor@askmonty.org>
Date:   Wed May 26 23:41:59 2021 -0700
 
    MDEV-25714 Join using derived with aggregation returns incorrect results
    
    If a join query uses a derived table (view / CTE) with GROUP BY clause then
    the execution plan for such join may employ split optimization. When this
    optimization is employed the derived table is not materialized. Rather only
    some partitions of the derived table are subject to grouping. Split
    optimization can be applied only if:
    - there are some indexes over the tables used in the join specifying the
      derived table whose prefixes partially cover the field items used in the
      GROUP BY list (such indexes are called splitting indexes)
    - the WHERE condition of the join query contains conjunctive equalities
      between columns of the derived table that comprise major parts of
      splitting indexes and columns of the other join tables.
    
    When the optimizer evaluates extending of a partial join by the rows of the
    derived table it always considers a possibility of using split optimization.
    Different splitting indexes can be used depending on the extended partial
    join. At some rare conditions, for example, when there is a non-splitting
    covering index for a table joined in the join specifying the derived table
    usage of a splitting index to produce rows needed for grouping may be still
    less beneficial than usage of such covering index without any splitting
    technique. The function JOIN_TAB::choose_best_splitting() must take this
    into account.
    

Please use optimizer_switch='split_materialized=off' as a temporary workaround:

MariaDB [test]> select t1.CHANGE_REQUEST_ID FROM ( t1 t1 join (select distinct DOC_ID FROM t2 WHERE LLE_ID IS NULL and DOC_ID IS NOT NULL GROUP BY DOC_ID) t2 ON (t1.GLOBAL_DOC_ID = t2.DOC_ID)) WHERE t1.CHANGE_REQUEST_ID=1;
Empty set (0.001 sec)
 
MariaDB [test]> set optimizer_switch='split_materialized=off';
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]> select t1.CHANGE_REQUEST_ID FROM ( t1 t1 join (select distinct DOC_ID FROM t2 WHERE LLE_ID IS NULL and DOC_ID IS NOT NULL GROUP BY DOC_ID) t2 ON (t1.GLOBAL_DOC_ID = t2.DOC_ID)) WHERE t1.CHANGE_REQUEST_ID=1;
+-------------------+
| CHANGE_REQUEST_ID |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.001 sec)

Generated at Thu Feb 08 09:40:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.