[MDEV-10006] optimizer doesn't convert outer join to inner on views with WHERE clause Created: 2016-04-28  Updated: 2017-05-29  Resolved: 2016-05-11

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1.11
Fix Version/s: 10.1.15

Type: Bug Priority: Major
Reporter: Hartmut Holzgraefe Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Sub-Tasks:
Key
Summary
Type
Status
Assignee
MDEV-10031 Testing for MDEV-10006 Technical task Closed Elena Stepanova  

 Description   

The optimizer can't reorder tables for OUTER JOINs, but when a later WHERE condition on a column of an outer table requires a non-NULL value it can convert the OUTER join into an INNER one, and then it can also make use of re-ordering.

The same works when the OUTER JOIN is inside a view definition, and a WHERE condition on a column of the outer table is applied on the view ... but apparently only if the view definition doesn't have a WHERE clause.

How to reproduce:

DROP TABLE IF EXISTS t1;
 
CREATE TABLE t1(i1 int primary key, v1 int, key(v1)) engine=myisam;
 
INSERT INTO t1 VALUES (1, 1);
INSERT INTO t1 VALUES (2, 2);
INSERT INTO t1 VALUES (3, 3);
INSERT INTO t1 VALUES (4, 4);
INSERT INTO t1 VALUES (5, 3);
INSERT INTO t1 VALUES (6, 6);
INSERT INTO t1 VALUES (7, 7);
INSERT INTO t1 VALUES (8, 8);
INSERT INTO t1 VALUES (9, 9);
 
DROP TABLE IF EXISTS t2;
 
CREATE TABLE t2(i2 int primary key, v2 int, key(v2)) engine=myisam;
 
INSERT INTO t2 VALUES (1, 1);
INSERT INTO t2 VALUES (2, 2);
INSERT INTO t2 VALUES (3, 3);
INSERT INTO t2 VALUES (4, 4);
INSERT INTO t2 VALUES (5, 3);
INSERT INTO t2 VALUES (6, 6);
INSERT INTO t2 VALUES (7, 7);
INSERT INTO t2 VALUES (8, 8);
INSERT INTO t2 VALUES (9, 9);
 
DROP TABLE IF EXISTS t3;
 
CREATE TABLE t3(i3 int primary key, v3 int, key(v3)) engine=myisam;
 
INSERT INTO t3 VALUES (2, 2);
INSERT INTO t3 VALUES (4, 4);
INSERT INTO t3 VALUES (6, 6);
INSERT INTO t3 VALUES (8, 8);
 
# view definition without WHERE
 
DROP VIEW IF EXISTS v1;
 
CREATE ALGORITHM=MERGE VIEW v1 AS
                  SELECT t1.i1 as i1, t1.v1 as v1,
                         t2.i2 as i2, t2.v2 as v2,
                         t3.i3 as i3, t3.v3 as v3
                    FROM t1
                    JOIN t2 on t1.i1 = t2.i2
               LEFT JOIN t3 on t2.i2 = t3.i3
                   ;
 
# view definition with WHERE
 
DROP VIEW IF EXISTS v2;
 
CREATE ALGORITHM=MERGE VIEW v2 AS
                  SELECT t1.i1 as i1, t1.v1 as v1,
                         t2.i2 as i2, t2.v2 as v2,
                         t3.i3 as i3, t3.v3 as v3
                    FROM t1 JOIN t2 on t1.i1 = t2.i2
               LEFT JOIN t3 on t2.i2 = t3.i3
                   WHERE t1.i1 = t2.i2
                     AND 1 = 1
                   ;
 
# query plan of plain query
 
EXPLAIN EXTENDED SELECT t1.i1 as i1, t1.v1 as v1,
                        t2.i2 as i2, t2.v2 as v2,
                        t3.i3 as i3, t3.v3 as v3
                   FROM t1
                   JOIN t2 on t1.i1 = t2.i2
              LEFT JOIN t3 on t2.i2 = t3.i3
                  WHERE 1 = 1
                    AND t3.v3 = 4
                  ;
 
# query plan for view without WHERE
 
EXPLAIN EXTENDED SELECT * FROM v1 WHERE v3 = 4;
 
# query plan for view with  WHERE
 
EXPLAIN EXTENDED SELECT * FROM v2 WHERE v3 = 4;

The plans for the plain query, and for the VIEW without a WHERE condition inside the definition, look like this:

Plain query:

id      select_type     table   type    possible_keys   key     key_len ref         rows    filtered    Extra
1       SIMPLE          t3      ref     PRIMARY,v3      v3      5       const       1       100.00  
1       SIMPLE          t1      eq_ref  PRIMARY         PRIMARY 4       test.t3.i3  1       100.00  
1       SIMPLE          t2      eq_ref  PRIMARY         PRIMARY 4       test.t3.i3  1       100.00  

View without WHERE:

id      select_type     table   type    possible_keys   key     key_len ref         rows    filtered    Extra
1       SIMPLE          t3      ref     PRIMARY,v3      v3      5       const       1       100.00  
1       SIMPLE          t1      eq_ref  PRIMARY         PRIMARY 4       test.t3.i3  1       100.00  
1       SIMPLE          t2      eq_ref  PRIMARY         PRIMARY 4       test.t3.i3  1       100.00  

So these two are identical.

For the VIEW with the extra WHERE 1=1 in its definition the plan looks like this though:

id      select_type     table   type    possible_keys   key     key_len ref         rows    filtered    Extra
1       SIMPLE          t1      ALL     PRIMARY         NULL    NULL    NULL        9       100.00  
1       SIMPLE          t2      eq_ref  PRIMARY         PRIMARY 4       test.t1.i1  1       100.00  
1       SIMPLE          t3      eq_ref  PRIMARY,v3      PRIMARY 4       test.t1.i1  1       100.00      Using where

So here table reordering didn't happen, and a "Using where" full scan of the VIEW results has to be done ....



 Comments   
Comment by Sergei Petrunia [ 2016-05-04 ]

Compaing these two queries:

EXPLAIN EXTENDED SELECT * FROM v1 WHERE v3 = 4;
EXPLAIN EXTENDED SELECT * FROM v2 WHERE v3 = 4;

relevant parts of EXPLAIN EXTENDED output:

Message: select ... from t1 join t2 join t3 where ((t3.v3 = 4) and (t1.i1 = t3.i3) and (t2.i2 = t3.i3))
Message: select ... from t1 join t2 join t3 where ((t3.v3 = 4) and (t2.i2 = t1.i1) and (t3.i3 = t1.i1))

So, it looks as if outer-to-inner join conversion worked the same way for both queries.

However, if I put a breakpoint into choose_plan(), before the my_qsort2 call, I see the difference:

EXPLAIN EXTENDED SELECT * FROM v1 WHERE v3 = 4;

(gdb) p join->best_ref[2]->table->alias.Ptr
  $25 = 0x7fff5c02d7d0 "t3"
(gdb) p join->best_ref[2]->dependent
  $26 = 0
(gdb) p join->best_ref[2]->found_records
  $27 = 1
(gdb) p join->best_ref[2]->table->pos_in_table_list.embedding
  $29 = (TABLE_LIST *) 0x0

EXPLAIN EXTENDED SELECT * FROM v2 WHERE v3 = 4;

(gdb) p join->best_ref[2]->table->alias.Ptr
  $24 = 0x7fff58026630 "t3"
(gdb) p join->best_ref[2]->dependent
  $25 = 2
(gdb) p join->best_ref[2]->table->pos_in_table_list.embedding
  $28 = (TABLE_LIST *) 0x0

Comment by Sergei Petrunia [ 2016-05-04 ]

That is, the join order of "t3,t1,t2" will not be picked for "FROM v3" query, because the optimizer receives a constraint that t3 must be after table t2.

Comment by Sergei Petrunia [ 2016-05-05 ]

The dependency comes from t3's TABLE::dep_tables. t3->dep_tables=2 (this is t2). TABLE::dep_tables is filled by simplify_joins.

Comment by Sergei Petrunia [ 2016-05-05 ]

VIEW handling is unified with derived table handling, so the issue can be seen
on a query that uses derived table:

EXPLAIN EXTENDED 
SELECT * FROM 
 (SELECT t1.i1 as i1, t1.v1 as v1,
                 t2.i2 as i2, t2.v2 as v2,
                 t3.i3 as i3, t3.v3 as v3
            FROM t1 JOIN t2 on t1.i1 = t2.i2
       LEFT JOIN t3 on t2.i2 = t3.i3
           WHERE t1.i1 = t2.i2
             AND 1 = 1
 ) as w2 
WHERE v3 = 4;

Comment by Sergei Petrunia [ 2016-05-05 ]

Debugging simplify_joins:

Enter simplify_joins(join_list= join->join_list= {w2}, conds= "t3.v3 = 4")
 
  table= w2, 
  table->on_expr= ((t1.i1 = t2.i2) and (1 = 1))
  table->nested_join= ...
 
  Enter simplify_joins(join_list= w2->nested_join={(nest_last_join)},
                       conds="((t1.i1 = t2.i2) and (1 = 1))"
  {
 
    table= (nest_last_join) // denote as $nest1
    table->on_expr=NULL,
 
    Enter simplify_joins(join_list= $nest1->nested_join={t3, (nest_last_join)},
                         conds="((t1.i1 = t2.i2) and (1 = 1))"
    {
      table= t3
      table->on_expr= (t2.i2 = t3.i3)
      table->outer_join=1
 
      // we assign:
      table->dep_tables |= table_on_expr_used_tables 
      // and it becomes:
      table->dep_tables=6
      // 6 = 4 (for t3 itself) + 2 (for t2)
      
      // interesting:
      table->dep_tables&= ~table->embedding->nested_join->used_tables;
      // here the right part has used_tables=4 
      // although embedding's nested join contains t3 and a join nest 
      // which includes t1 and t2.  I suppose, that used_tables value is incomplete
      // ATM.
      
      // Anyhow, after this, we have
      table->dep_tables=2
      // that is, "t3 must follow t2" which looks correct (outer join is not
      // converted to inner, yet).
 
 
      table= (nest_last_join); // denote as $nest2
      table->on_expr= 0;
      table->nested_join= {t1, t2}
 
 
      Enter simplify_joins(join_list= $nest2->nested_join={t2, t1},
                           conds="((t1.i1 = t2.i2) and (1 = 1))"
      {
        table= t2
        table->on_expr= "(t1.i1 = t2.i2)"
        table->outer_join= 0
        // inject table->on_expr into conds.
 
        table= t1
 
        retval= "((t1.i1 = t2.i2) and (1 = 1) and (t1.i1 = t2.i2))"
 
      } // Leave simplify_joins(join_list=$nest2...)
 
      // The list {t3, (nest_last_join)} is flattened into
      // {t3, t2, t1}
 
      retval= "((t1.i1 = t2.i2) and (1 = 1) and (t1.i1 = t2.i2))"
    } // Leave simplify_joins(join_list=$nest1...)
 
    // the list {nest_last_join(t3, t2, t1)} is flattened into
    // {t3,t2,t1}
 
    retval= "((t1.i1 = t2.i2) and (1 = 1) and (t1.i1 = t2.i2))"
  } // Leave simplify_joins
  
  // Now, the second call to simplify joins, with 'conds' as parameter:
 
  Enter simplify_joins(join_list= w2->nested_join={t3, t2, t1},
                       conds="(t3.v3 = 4)"
  {
 
    table= t3
    used_tables=4, not_null_tables=4
    // we set: table->outer_join= 0;
    // however, table->dep_tables=2 remains uncleaned

same content here: https://gist.github.com/spetrunia/d531a08dc748fb01c694710885ac2bc7

Summary:

  • the optimizer does convert t3's outer join to inner
  • however t3's dep_tables is not cleaned.
Comment by Sergei Petrunia [ 2016-05-05 ]

This patch fixes the issue and passes the test suite:

diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index e83f63d..d9849ce 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -14418,6 +14418,9 @@ static void update_const_equal_items(THD *thd, COND *cond, JOIN_TAB *tab,
       if (table->outer_join && !table->embedding && table->table)
         table->table->maybe_null= FALSE;
       table->outer_join= 0;
+      //psergey:
+      if (!(straight_join || table->straight))
+        table->dep_tables= table->embedding? table->embedding->dep_tables: 0;
       if (table->on_expr)
       {
         /* Add ON expression to the WHERE or upper-level ON condition. */

Comment by Sergei Petrunia [ 2016-05-05 ]

elenst, I need

  • a test of 10.1 vs 10.1 with the above patch
  • we need a grammar with nested LEFT/INNER joins, with various ON expressions
  • ON expression must be simple predicates over null-able as well as NOT NULL columns
  • The primary risk (thing to check for) is different query results. I don't expect crashes.
Comment by Elena Stepanova [ 2016-05-11 ]

I ran comparison tests on the patched version against 10.1 without the patch and transformation tests with ExecuteAsView, haven't got any mismatches.
The testing task MDEV-10031 will remain open until the patch passes the review, in case there are changes that require retesting.

Comment by Sergei Petrunia [ 2016-05-11 ]

No changes were requested during the code review. The patch will be pushed as is.

Generated at Thu Feb 08 07:38:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.