[MDEV-12838] Wrong results with materialization=on (with subquery and joins) Created: 2017-05-18  Updated: 2020-08-25  Resolved: 2017-06-07

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.0.30, 10.1.23, 10.2.5, 10.2
Fix Version/s: 10.1.25, 5.5.57, 10.0.32, 10.2.7

Type: Bug Priority: Critical
Reporter: Chris Calender (Inactive) Assignee: Igor Babaev
Resolution: Fixed Votes: 1
Labels: None

Attachments: File test_case_obfuscated.sql    
Issue Links:
Duplicate
duplicates MDEV-13296 Incorrect result without an index hint Closed
Relates
relates to MDEV-16751 Server crashes in st_join_table::clea... Closed

 Description   

With optimizer_switch materialization=on, the following test case fails with wrong results in MariaDB (tested 10.0.30, 10.1.23, and 10.2.5). This works with MySQL 5.6 and 5.7, and also with MariaDB if you set materialization=off.

Here is sample output from the sql file (attached) when piped into MariaDB 10.2.5:

>mysql -uroot -pxxx -P3317 -e"select @@version\Gselect @@optimizer_switch\G"
*************************** 1. row ***************************
@@version: 10.2.5-MariaDB
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,
firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=
on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on

C:\>mysql -uroot -pxxx -P3317 < test_case_obfuscated.sql
assignment_group
ebb4620037332000158bbfc8bcbe5d89

C:\>mysql -uroot -pxxx -P3317 -e"set @@global.optimizer_switch='materialization=off'"

C:\>mysql -uroot -pxxx -P3317 -e"select @@version\Gselect @@optimizer_switch\G"
*************************** 1. row ***************************
@@version: 10.2.5-MariaDB
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,
firstmatch=on,loosescan=on,materialization=off,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=
on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on

C:\>mysql -uroot -pxxx -P3317 < test_case_obfuscated.sql
assignment_group
ebb4620037332000158bbfc8bcbe5d89
ec70316637232000158bbfc8bcbe5d60
3120fd2637232000158bbfc8bcbe5d42
e08fad2637232000158bbfc8bcbe5d39



 Comments   
Comment by Igor Babaev [ 2017-05-22 ]

First of all it makes sense to simplify the offending query:

  • the ORDER BY clause does not matter here, so it's removed
  • LEFT JOIN is converted INNER JOIN because anyway the optimizer performs this conversion
  • all INNER JOINS are removed.
    Ultimately we have the following query

    SELECT sm_m2m_group_dependency0.assignment_group
    FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1
    WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND
          sm_m2m_group_dependency0.dispatch_group IN
          (SELECT sys_user_grmember0.group
           FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i
           WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND 
                 sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND
                 sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79');
    

    that still returns a wrong result with 'materialization=on'.

    MariaDB [matt_test]> set optimizer_switch='materialization=on';Query OK, 0 rows affected (0.00 sec)
     
    MariaDB [matt_test]> SELECT sm_m2m_group_dependency0.assignment_group
        -> FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1
        -> WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND
        ->       sm_m2m_group_dependency0.dispatch_group IN
        ->       (SELECT sys_user_grmember0.group
        ->        FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i
        ->        WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND 
        ->              sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND
        ->              sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79');
    +----------------------------------+
    | assignment_group                 |
    +----------------------------------+
    | ebb4620037332000158bbfc8bcbe5d89 |
    +----------------------------------+
    MariaDB [matt_test]> set optimizer_switch='materialization=off';
    Query OK, 0 rows affected (0.00 sec)
     
    MariaDB [matt_test]> SELECT sm_m2m_group_dependency0.assignment_group
        -> FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1
        -> WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND
        ->       sm_m2m_group_dependency0.dispatch_group IN
        ->       (SELECT sys_user_grmember0.group
        ->        FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i
        ->        WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND 
        ->              sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND
        ->              sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79');
    +----------------------------------+
    | assignment_group                 |
    +----------------------------------+
    | df50316637232000158bbfc8bcbe5d23 |
    | e08fad2637232000158bbfc8bcbe5d39 |
    | ec70316637232000158bbfc8bcbe5d60 |
    | 7b10fd2637232000158bbfc8bcbe5d30 |
    | 3120fd2637232000158bbfc8bcbe5d42 |
    | ebb4620037332000158bbfc8bcbe5d89 |
    +----------------------------------+
    

    The cause of the problem can be seen already in the output of EXPLAIN for the query:

    MariaDB [matt_test]> set optimizer_switch='materialization=on';Query OK, 0 rows affected (0.00 sec)
     
    MariaDB [matt_test]> EXPLAIN
        -> SELECT sm_m2m_group_dependency0.assignment_group
        -> FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1
        -> WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND
        ->       sm_m2m_group_dependency0.dispatch_group IN
        ->       (SELECT sys_user_grmember0.group
        ->        FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i
        ->        WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND 
        ->              sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND
        ->              sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79');
    +------+--------------+--------------------------+--------+---------------------------------+----------------+---------+-----------------------------------------------------+------+------------------------------------+
    | id   | select_type  | table                    | type   | possible_keys                   | key            | key_len | ref                                                 | rows | Extra                              |
    +------+--------------+--------------------------+--------+---------------------------------+----------------+---------+-----------------------------------------------------+------+------------------------------------+
    |    1 | PRIMARY      | <subquery2>              | ALL    | distinct_key                    | NULL           | NULL    | NULL                                                |    2 |                                    |
    |    1 | PRIMARY      | sm_m2m_group_dependency0 | ref    | dispatch_group,assignment_group | dispatch_group | 99      | matt_test.sys_user_group1_i.sys_id                  |    1 | Using index condition; Using where |
    |    1 | PRIMARY      | sys_user_group1          | eq_ref | PRIMARY                         | PRIMARY        | 96      | matt_test.sm_m2m_group_dependency0.assignment_group |    1 | Using where; Using index           |
    |    2 | MATERIALIZED | sys_user_grmember0       | ref    | group,user                      | user           | 99      | const                                               |    2 | Using index condition; Using where |
    |    2 | MATERIALIZED | sys_user_group1_i        | eq_ref | PRIMARY                         | PRIMARY        | 96      | matt_test.sys_user_grmember0.group                  |    1 | Using where                        |
    +------+--------------+--------------------------+--------+---------------------------------+----------------+---------+-----------------------------------------------------+------+------------------------------------+
    

    We see that the subquery is materialized and the result of materialization is used to join it with the table sm_m2m_group_dependency0 employing sys_user_group1_i.sys_id as the join key. However the result of materialization of the subquery does not have any column sys_id.

Comment by Igor Babaev [ 2017-05-22 ]

Here's the explanation how it happens.
The IN subquery predicate produces an extra equality to the condition

sm_m2m_group_dependency0.dispatch_group=sys_user_grmember0.group

The subquery has the equality condition:

sys_user_grmember0.group = sys_user_group1_i.sys_id

As a result we have these two possible keys to access the table sm_m2m_group_dependency0 from the tables sys_user_grmember0 and sys_user_group1_i.:

sys_user_grmember0.group
sys_user_group1_i.sys_id

When the optimizer evaluates the partial join order:
(sys_user_grmember0, sys_user_group1_i, sm_m2m_group_dependency0)
it makes the estimates of the cost to access table sm_m2m_group_dependency0 by different keys. The second key is evaluated first. After this the optimizer sees that the access by the first key is not better.
These is ok if the optimizer chooses not to materialize the subquery, and it's not ok otherwise.
The optimizer chooses the key before it decides calling the function advance_sj_state() that the subquery should be materialized. The code that is responsible for this decision can be found in the function Sj_materialization_picker::check_qep().
So to choose the materialization we have to first choose the key to access the table
sm_m2m_group_dependency0, but the validity of this key depends on whether we choose materialization or not. A typical 'catch 22' situation.
Fortunately the code of Sj_materialization_picker::check_qep() allows us to reconsider the chosen key after the decision to choose materialization for the subquery have been made.

Comment by Igor Babaev [ 2017-05-23 ]

Here' s an explanation why using a field not from the materialized table as a key to access an outer table does not cause a crash. The fact is that with the current implementation fields from the materialized table are always copied into the buffers for the tables of the materialized subquery. When the access keys are formed they are constructed over the values from these buffers. If a field does not belong to the result of materialization it's not copied, and some old value is read for this field.

Comment by Igor Babaev [ 2017-05-31 ]

The patch with a fix of the problem was sent to Sergey Petrunia for a review.

Comment by Igor Babaev [ 2017-06-07 ]

The fix for this bug was pushed into the 5.5 tree.

Comment by Chris Calender (Inactive) [ 2017-06-08 ]

Why was this only fixed for 5.5.57, an old version most of our users are not using, and marked as closed.

Shouldn't this be fixed for 10.0, 10.1, and 10.2? Or am I missing something?

Comment by Elena Stepanova [ 2017-06-14 ]

It will be certainly merged up to all versions. I've updated the 'Fix version/s' field.

Generated at Thu Feb 08 08:00:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.