Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12838

Wrong results with materialization=on (with subquery and joins)

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.0.30, 10.1.23, 10.2.5, 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL)
    • 10.1.25, 5.5.57, 10.0.32, 10.2.7
    • Optimizer
    • None

    Description

      With optimizer_switch materialization=on, the following test case fails with wrong results in MariaDB (tested 10.0.30, 10.1.23, and 10.2.5). This works with MySQL 5.6 and 5.7, and also with MariaDB if you set materialization=off.

      Here is sample output from the sql file (attached) when piped into MariaDB 10.2.5:

      >mysql -uroot -pxxx -P3317 -e"select @@version\Gselect @@optimizer_switch\G"
      *************************** 1. row ***************************
      @@version: 10.2.5-MariaDB
      *************************** 1. row ***************************
      @@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,
      firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=
      on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
      

      C:\>mysql -uroot -pxxx -P3317 < test_case_obfuscated.sql
      assignment_group
      ebb4620037332000158bbfc8bcbe5d89
      

      C:\>mysql -uroot -pxxx -P3317 -e"set @@global.optimizer_switch='materialization=off'"
      

      C:\>mysql -uroot -pxxx -P3317 -e"select @@version\Gselect @@optimizer_switch\G"
      *************************** 1. row ***************************
      @@version: 10.2.5-MariaDB
      *************************** 1. row ***************************
      @@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,
      firstmatch=on,loosescan=on,materialization=off,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=
      on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
      

      C:\>mysql -uroot -pxxx -P3317 < test_case_obfuscated.sql
      assignment_group
      ebb4620037332000158bbfc8bcbe5d89
      ec70316637232000158bbfc8bcbe5d60
      3120fd2637232000158bbfc8bcbe5d42
      e08fad2637232000158bbfc8bcbe5d39
      

      Attachments

        Issue Links

          Activity

            igor Igor Babaev (Inactive) added a comment - - edited

            First of all it makes sense to simplify the offending query:

            • the ORDER BY clause does not matter here, so it's removed
            • LEFT JOIN is converted INNER JOIN because anyway the optimizer performs this conversion
            • all INNER JOINS are removed.
              Ultimately we have the following query

              SELECT sm_m2m_group_dependency0.assignment_group
              FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1
              WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND
                    sm_m2m_group_dependency0.dispatch_group IN
                    (SELECT sys_user_grmember0.group
                     FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i
                     WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND 
                           sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND
                           sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79');
              

              that still returns a wrong result with 'materialization=on'.

              MariaDB [matt_test]> set optimizer_switch='materialization=on';Query OK, 0 rows affected (0.00 sec)
               
              MariaDB [matt_test]> SELECT sm_m2m_group_dependency0.assignment_group
                  -> FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1
                  -> WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND
                  ->       sm_m2m_group_dependency0.dispatch_group IN
                  ->       (SELECT sys_user_grmember0.group
                  ->        FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i
                  ->        WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND 
                  ->              sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND
                  ->              sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79');
              +----------------------------------+
              | assignment_group                 |
              +----------------------------------+
              | ebb4620037332000158bbfc8bcbe5d89 |
              +----------------------------------+
              MariaDB [matt_test]> set optimizer_switch='materialization=off';
              Query OK, 0 rows affected (0.00 sec)
               
              MariaDB [matt_test]> SELECT sm_m2m_group_dependency0.assignment_group
                  -> FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1
                  -> WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND
                  ->       sm_m2m_group_dependency0.dispatch_group IN
                  ->       (SELECT sys_user_grmember0.group
                  ->        FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i
                  ->        WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND 
                  ->              sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND
                  ->              sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79');
              +----------------------------------+
              | assignment_group                 |
              +----------------------------------+
              | df50316637232000158bbfc8bcbe5d23 |
              | e08fad2637232000158bbfc8bcbe5d39 |
              | ec70316637232000158bbfc8bcbe5d60 |
              | 7b10fd2637232000158bbfc8bcbe5d30 |
              | 3120fd2637232000158bbfc8bcbe5d42 |
              | ebb4620037332000158bbfc8bcbe5d89 |
              +----------------------------------+
              

              The cause of the problem can be seen already in the output of EXPLAIN for the query:

              MariaDB [matt_test]> set optimizer_switch='materialization=on';Query OK, 0 rows affected (0.00 sec)
               
              MariaDB [matt_test]> EXPLAIN
                  -> SELECT sm_m2m_group_dependency0.assignment_group
                  -> FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1
                  -> WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND
                  ->       sm_m2m_group_dependency0.dispatch_group IN
                  ->       (SELECT sys_user_grmember0.group
                  ->        FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i
                  ->        WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND 
                  ->              sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND
                  ->              sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79');
              +------+--------------+--------------------------+--------+---------------------------------+----------------+---------+-----------------------------------------------------+------+------------------------------------+
              | id   | select_type  | table                    | type   | possible_keys                   | key            | key_len | ref                                                 | rows | Extra                              |
              +------+--------------+--------------------------+--------+---------------------------------+----------------+---------+-----------------------------------------------------+------+------------------------------------+
              |    1 | PRIMARY      | <subquery2>              | ALL    | distinct_key                    | NULL           | NULL    | NULL                                                |    2 |                                    |
              |    1 | PRIMARY      | sm_m2m_group_dependency0 | ref    | dispatch_group,assignment_group | dispatch_group | 99      | matt_test.sys_user_group1_i.sys_id                  |    1 | Using index condition; Using where |
              |    1 | PRIMARY      | sys_user_group1          | eq_ref | PRIMARY                         | PRIMARY        | 96      | matt_test.sm_m2m_group_dependency0.assignment_group |    1 | Using where; Using index           |
              |    2 | MATERIALIZED | sys_user_grmember0       | ref    | group,user                      | user           | 99      | const                                               |    2 | Using index condition; Using where |
              |    2 | MATERIALIZED | sys_user_group1_i        | eq_ref | PRIMARY                         | PRIMARY        | 96      | matt_test.sys_user_grmember0.group                  |    1 | Using where                        |
              +------+--------------+--------------------------+--------+---------------------------------+----------------+---------+-----------------------------------------------------+------+------------------------------------+
              

              We see that the subquery is materialized and the result of materialization is used to join it with the table sm_m2m_group_dependency0 employing sys_user_group1_i.sys_id as the join key. However the result of materialization of the subquery does not have any column sys_id.

            igor Igor Babaev (Inactive) added a comment - - edited First of all it makes sense to simplify the offending query: the ORDER BY clause does not matter here, so it's removed LEFT JOIN is converted INNER JOIN because anyway the optimizer performs this conversion all INNER JOINS are removed. Ultimately we have the following query SELECT sm_m2m_group_dependency0.assignment_group FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1 WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND sm_m2m_group_dependency0.dispatch_group IN (SELECT sys_user_grmember0.group FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79'); that still returns a wrong result with 'materialization=on'. MariaDB [matt_test]> set optimizer_switch='materialization=on';Query OK, 0 rows affected (0.00 sec)   MariaDB [matt_test]> SELECT sm_m2m_group_dependency0.assignment_group -> FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1 -> WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND -> sm_m2m_group_dependency0.dispatch_group IN -> (SELECT sys_user_grmember0.group -> FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i -> WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND -> sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND -> sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79'); +----------------------------------+ | assignment_group | +----------------------------------+ | ebb4620037332000158bbfc8bcbe5d89 | +----------------------------------+ MariaDB [matt_test]> set optimizer_switch='materialization=off'; Query OK, 0 rows affected (0.00 sec)   MariaDB [matt_test]> SELECT sm_m2m_group_dependency0.assignment_group -> FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1 -> WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND -> sm_m2m_group_dependency0.dispatch_group IN -> (SELECT sys_user_grmember0.group -> FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i -> WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND -> sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND -> sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79'); +----------------------------------+ | assignment_group | +----------------------------------+ | df50316637232000158bbfc8bcbe5d23 | | e08fad2637232000158bbfc8bcbe5d39 | | ec70316637232000158bbfc8bcbe5d60 | | 7b10fd2637232000158bbfc8bcbe5d30 | | 3120fd2637232000158bbfc8bcbe5d42 | | ebb4620037332000158bbfc8bcbe5d89 | +----------------------------------+ The cause of the problem can be seen already in the output of EXPLAIN for the query: MariaDB [matt_test]> set optimizer_switch='materialization=on';Query OK, 0 rows affected (0.00 sec)   MariaDB [matt_test]> EXPLAIN -> SELECT sm_m2m_group_dependency0.assignment_group -> FROM sm_m2m_group_dependency sm_m2m_group_dependency0, sys_user_group sys_user_group1 -> WHERE sm_m2m_group_dependency0.assignment_group = sys_user_group1.sys_id AND -> sm_m2m_group_dependency0.dispatch_group IN -> (SELECT sys_user_grmember0.group -> FROM sys_user_grmember sys_user_grmember0, sys_user_group sys_user_group1_i -> WHERE sys_user_grmember0.group = sys_user_group1_i.sys_id AND -> sys_user_group1_i.type LIKE '59e22fb137032000158bbfc8bcbe5d52' AND -> sys_user_grmember0.user = '86826bf03710200044e0bfc8bcbe5d79'); +------+--------------+--------------------------+--------+---------------------------------+----------------+---------+-----------------------------------------------------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------+--------------------------+--------+---------------------------------+----------------+---------+-----------------------------------------------------+------+------------------------------------+ | 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 2 | | | 1 | PRIMARY | sm_m2m_group_dependency0 | ref | dispatch_group,assignment_group | dispatch_group | 99 | matt_test.sys_user_group1_i.sys_id | 1 | Using index condition; Using where | | 1 | PRIMARY | sys_user_group1 | eq_ref | PRIMARY | PRIMARY | 96 | matt_test.sm_m2m_group_dependency0.assignment_group | 1 | Using where; Using index | | 2 | MATERIALIZED | sys_user_grmember0 | ref | group,user | user | 99 | const | 2 | Using index condition; Using where | | 2 | MATERIALIZED | sys_user_group1_i | eq_ref | PRIMARY | PRIMARY | 96 | matt_test.sys_user_grmember0.group | 1 | Using where | +------+--------------+--------------------------+--------+---------------------------------+----------------+---------+-----------------------------------------------------+------+------------------------------------+ We see that the subquery is materialized and the result of materialization is used to join it with the table sm_m2m_group_dependency0 employing sys_user_group1_i.sys_id as the join key. However the result of materialization of the subquery does not have any column sys_id.
            igor Igor Babaev (Inactive) added a comment - - edited

            Here's the explanation how it happens.
            The IN subquery predicate produces an extra equality to the condition

            sm_m2m_group_dependency0.dispatch_group=sys_user_grmember0.group
            

            The subquery has the equality condition:

            sys_user_grmember0.group = sys_user_group1_i.sys_id
            

            As a result we have these two possible keys to access the table sm_m2m_group_dependency0 from the tables sys_user_grmember0 and sys_user_group1_i.:

            sys_user_grmember0.group
            sys_user_group1_i.sys_id
            

            When the optimizer evaluates the partial join order:
            (sys_user_grmember0, sys_user_group1_i, sm_m2m_group_dependency0)
            it makes the estimates of the cost to access table sm_m2m_group_dependency0 by different keys. The second key is evaluated first. After this the optimizer sees that the access by the first key is not better.
            These is ok if the optimizer chooses not to materialize the subquery, and it's not ok otherwise.
            The optimizer chooses the key before it decides calling the function advance_sj_state() that the subquery should be materialized. The code that is responsible for this decision can be found in the function Sj_materialization_picker::check_qep().
            So to choose the materialization we have to first choose the key to access the table
            sm_m2m_group_dependency0, but the validity of this key depends on whether we choose materialization or not. A typical 'catch 22' situation.
            Fortunately the code of Sj_materialization_picker::check_qep() allows us to reconsider the chosen key after the decision to choose materialization for the subquery have been made.

            igor Igor Babaev (Inactive) added a comment - - edited Here's the explanation how it happens. The IN subquery predicate produces an extra equality to the condition sm_m2m_group_dependency0.dispatch_group=sys_user_grmember0.group The subquery has the equality condition: sys_user_grmember0.group = sys_user_group1_i.sys_id As a result we have these two possible keys to access the table sm_m2m_group_dependency0 from the tables sys_user_grmember0 and sys_user_group1_i.: sys_user_grmember0.group sys_user_group1_i.sys_id When the optimizer evaluates the partial join order: (sys_user_grmember0, sys_user_group1_i, sm_m2m_group_dependency0) it makes the estimates of the cost to access table sm_m2m_group_dependency0 by different keys. The second key is evaluated first. After this the optimizer sees that the access by the first key is not better. These is ok if the optimizer chooses not to materialize the subquery, and it's not ok otherwise. The optimizer chooses the key before it decides calling the function advance_sj_state() that the subquery should be materialized. The code that is responsible for this decision can be found in the function Sj_materialization_picker::check_qep(). So to choose the materialization we have to first choose the key to access the table sm_m2m_group_dependency0, but the validity of this key depends on whether we choose materialization or not. A typical 'catch 22' situation. Fortunately the code of Sj_materialization_picker::check_qep() allows us to reconsider the chosen key after the decision to choose materialization for the subquery have been made.

            Here' s an explanation why using a field not from the materialized table as a key to access an outer table does not cause a crash. The fact is that with the current implementation fields from the materialized table are always copied into the buffers for the tables of the materialized subquery. When the access keys are formed they are constructed over the values from these buffers. If a field does not belong to the result of materialization it's not copied, and some old value is read for this field.

            igor Igor Babaev (Inactive) added a comment - Here' s an explanation why using a field not from the materialized table as a key to access an outer table does not cause a crash. The fact is that with the current implementation fields from the materialized table are always copied into the buffers for the tables of the materialized subquery. When the access keys are formed they are constructed over the values from these buffers. If a field does not belong to the result of materialization it's not copied, and some old value is read for this field.

            The patch with a fix of the problem was sent to Sergey Petrunia for a review.

            igor Igor Babaev (Inactive) added a comment - The patch with a fix of the problem was sent to Sergey Petrunia for a review.

            The fix for this bug was pushed into the 5.5 tree.

            igor Igor Babaev (Inactive) added a comment - The fix for this bug was pushed into the 5.5 tree.

            Why was this only fixed for 5.5.57, an old version most of our users are not using, and marked as closed.

            Shouldn't this be fixed for 10.0, 10.1, and 10.2? Or am I missing something?

            ccalender Chris Calender (Inactive) added a comment - Why was this only fixed for 5.5.57, an old version most of our users are not using, and marked as closed. Shouldn't this be fixed for 10.0, 10.1, and 10.2? Or am I missing something?

            It will be certainly merged up to all versions. I've updated the 'Fix version/s' field.

            elenst Elena Stepanova added a comment - It will be certainly merged up to all versions. I've updated the 'Fix version/s' field.

            People

              igor Igor Babaev (Inactive)
              ccalender Chris Calender (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.