Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-537

Make multi-column non-top level subqueries to be executed via index (index/unique subquery) instead of single_select_engine

Details

    • Task
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.2
    • None
    • None

    Description

      Multi-column non-top level subqueries can be executed via the unique_subquery/index_subquery methods instead of the general single_select engine.

      If the same queries are transformed into single-column INs, then unique_subquery/index_subquery is chosen. However in some cases the IN-EXISTS transformation for multi-column subqueries adds unnecessary null-rejecting conditions that prevent the use of the index-based subquery access methods. The problem is that the method {Item_in_subselect::create_row_in_to_exists_cond()}} adds Item_is_not_null_test and Item_func_trig_cond without looking at the left IN operand. At the same time, the analogous method for single columns does that, and doesn't add the above conditions if the left argument cannot be NULL.

      The proposed patch is:

      @@ -2290,7 +2303,7 @@ Item_in_subselect::create_row_in_to_exis
                                                ref_pointer_array+i,
                                                (char *)"<no matter>",
                                                (char *)"<list ref>"));
      -      if (!abort_on_null)
      +      if (!abort_on_null && select_lex->ref_pointer_array[i]->maybe_null)
             {
               Item *having_col_item=
                 new Item_is_not_null_test(this,
      @@ -2309,10 +2322,6 @@ Item_in_subselect::create_row_in_to_exis
                                                  (char *)"<no matter>",
                                                  (char *)"<list ref>"));
               item= new Item_cond_or(item, item_isnull);
      -        /* 
      -          TODO: why we create the above for cases where the right part
      -                cant be NULL?
      -        */
               if (left_expr->element_index(i)->maybe_null)
               {
                 if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
      @@ -2323,6 +2332,11 @@ Item_in_subselect::create_row_in_to_exis
               }
               *having_item= and_items(*having_item, having_col_item);
             }
      +      if (!abort_on_null && left_expr->element_index(i)->maybe_null)
      +      {
      +        if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
      +          DBUG_RETURN(true);
      +      }
             *where_item= and_items(*where_item, item);
           }
         }

      The change is proposed for 10.0 because it will change all affected query plans to use new access methods.

      Attachments

        Activity

          timour Timour Katchaounov (Inactive) created issue -
          timour Timour Katchaounov (Inactive) made changes -
          Field Original Value New Value
          Description While working on a bug fix in the subquery code, I noticed that some legacy has remained from the past, that can be removed to improve/simplify the code:

          1. Multi-column non-top level subqueries that could be executed via the unique_subquery/index_subquery methods are executed via the general single_select engine.

          If the same queries are transformed into single-column INs, then unique_subquery/index_subquery is chosen.

          The problem is that the method Item_in_subselect::create_row_in_to_exists_cond()
          adds Item_is_not_null_test and Item_func_trig_cond without looking at the left IN operand. At the same time, the analogous method for single columns does that, and
          doesn't add the above conditions if the left argument cannot be NULL.

          The proposed patch is:


          @@ -2290,7 +2303,7 @@ Item_in_subselect::create_row_in_to_exis
                                                    ref_pointer_array+i,
                                                    (char *)"<no matter>",
                                                    (char *)"<list ref>"));
          - if (!abort_on_null)
          + if (!abort_on_null && select_lex->ref_pointer_array[i]->maybe_null)
                 {
                   Item *having_col_item=
                     new Item_is_not_null_test(this,
          @@ -2309,10 +2322,6 @@ Item_in_subselect::create_row_in_to_exis
                                                      (char *)"<no matter>",
                                                      (char *)"<list ref>"));
                   item= new Item_cond_or(item, item_isnull);
          - /*
          - TODO: why we create the above for cases where the right part
          - cant be NULL?
          - */
                   if (left_expr->element_index(i)->maybe_null)
                   {
                     if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          @@ -2323,6 +2332,11 @@ Item_in_subselect::create_row_in_to_exis
                   }
                   *having_item= and_items(*having_item, having_col_item);
                 }
          + if (!abort_on_null && left_expr->element_index(i)->maybe_null)
          + {
          + if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          + DBUG_RETURN(true);
          + }
                 *where_item= and_items(*where_item, item);
               }
             }


          2. enum store_key_result can be transformed into boolean

          When analyzing subselect_uniquesubquery_engine::copy_ref_key I encountered
          enum store_key_result { STORE_KEY_OK, STORE_KEY_FATAL, STORE_KEY_CONV }
          It turns out that the last value STORE_KEY_CONV is not used anywhere, and
          the above enum is not needed, since two enums can be represented by bool.

          3. Unneeded extra call to engine->exec() in Item_subselect::exec

          In MariaDB 5.3 I introduced early subquery optimization. As a result the logic
          in Item_subselect::exec that calls engine->exec second time if the engine was
          changed is not needed. The reason is that in the past optimization and engine
          change was done lazily during the first call to engine->exec(). From 5.3 this is
          not true, the engine is changed before execution, so even the first execution is
          done with the right engine.

          We cannot simply remove this logic completely, because there are still few
          border cases when optimization is done lazily. However Item_subselect::exec
          should check if the engine was chosen within itself (and then re-execute),
          or if the engine was chosen before execution, and then do not re-execute.
          TODO: the task should be split into two - one for 5.5, one for 10.0

          While working on a bug fix in the subquery code, I noticed that some legacy has remained from the past, that can be removed to improve/simplify the code:

          1. Multi-column non-top level subqueries that could be executed via the unique_subquery/index_subquery methods are executed via the general single_select engine.

          If the same queries are transformed into single-column INs, then unique_subquery/index_subquery is chosen.

          The problem is that the method Item_in_subselect::create_row_in_to_exists_cond()
          adds Item_is_not_null_test and Item_func_trig_cond without looking at the left IN operand. At the same time, the analogous method for single columns does that, and
          doesn't add the above conditions if the left argument cannot be NULL.

          The proposed patch is:


          @@ -2290,7 +2303,7 @@ Item_in_subselect::create_row_in_to_exis
                                                    ref_pointer_array+i,
                                                    (char *)"<no matter>",
                                                    (char *)"<list ref>"));
          - if (!abort_on_null)
          + if (!abort_on_null && select_lex->ref_pointer_array[i]->maybe_null)
                 {
                   Item *having_col_item=
                     new Item_is_not_null_test(this,
          @@ -2309,10 +2322,6 @@ Item_in_subselect::create_row_in_to_exis
                                                      (char *)"<no matter>",
                                                      (char *)"<list ref>"));
                   item= new Item_cond_or(item, item_isnull);
          - /*
          - TODO: why we create the above for cases where the right part
          - cant be NULL?
          - */
                   if (left_expr->element_index(i)->maybe_null)
                   {
                     if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          @@ -2323,6 +2332,11 @@ Item_in_subselect::create_row_in_to_exis
                   }
                   *having_item= and_items(*having_item, having_col_item);
                 }
          + if (!abort_on_null && left_expr->element_index(i)->maybe_null)
          + {
          + if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          + DBUG_RETURN(true);
          + }
                 *where_item= and_items(*where_item, item);
               }
             }


          2. enum store_key_result can be transformed into boolean

          When analyzing subselect_uniquesubquery_engine::copy_ref_key I encountered
          enum store_key_result { STORE_KEY_OK, STORE_KEY_FATAL, STORE_KEY_CONV }
          It turns out that the last value STORE_KEY_CONV is not used anywhere, and
          the above enum is not needed, since two enums can be represented by bool.

          3. Unneeded extra call to engine->exec() in Item_subselect::exec

          In MariaDB 5.3 I introduced early subquery optimization. As a result the logic
          in Item_subselect::exec that calls engine->exec second time if the engine was
          changed is not needed. The reason is that in the past optimization and engine
          change was done lazily during the first call to engine->exec(). From 5.3 this is
          not true, the engine is changed before execution, so even the first execution is
          done with the right engine.

          We cannot simply remove this logic completely, because there are still few
          border cases when optimization is done lazily. However Item_subselect::exec
          should check if the engine was chosen within itself (and then re-execute),
          or if the engine was chosen before execution, and then do not re-execute.
          Due Date 2012-09-28 2012-10-12
          serg Sergei Golubchik made changes -
          Description TODO: the task should be split into two - one for 5.5, one for 10.0

          While working on a bug fix in the subquery code, I noticed that some legacy has remained from the past, that can be removed to improve/simplify the code:

          1. Multi-column non-top level subqueries that could be executed via the unique_subquery/index_subquery methods are executed via the general single_select engine.

          If the same queries are transformed into single-column INs, then unique_subquery/index_subquery is chosen.

          The problem is that the method Item_in_subselect::create_row_in_to_exists_cond()
          adds Item_is_not_null_test and Item_func_trig_cond without looking at the left IN operand. At the same time, the analogous method for single columns does that, and
          doesn't add the above conditions if the left argument cannot be NULL.

          The proposed patch is:


          @@ -2290,7 +2303,7 @@ Item_in_subselect::create_row_in_to_exis
                                                    ref_pointer_array+i,
                                                    (char *)"<no matter>",
                                                    (char *)"<list ref>"));
          - if (!abort_on_null)
          + if (!abort_on_null && select_lex->ref_pointer_array[i]->maybe_null)
                 {
                   Item *having_col_item=
                     new Item_is_not_null_test(this,
          @@ -2309,10 +2322,6 @@ Item_in_subselect::create_row_in_to_exis
                                                      (char *)"<no matter>",
                                                      (char *)"<list ref>"));
                   item= new Item_cond_or(item, item_isnull);
          - /*
          - TODO: why we create the above for cases where the right part
          - cant be NULL?
          - */
                   if (left_expr->element_index(i)->maybe_null)
                   {
                     if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          @@ -2323,6 +2332,11 @@ Item_in_subselect::create_row_in_to_exis
                   }
                   *having_item= and_items(*having_item, having_col_item);
                 }
          + if (!abort_on_null && left_expr->element_index(i)->maybe_null)
          + {
          + if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          + DBUG_RETURN(true);
          + }
                 *where_item= and_items(*where_item, item);
               }
             }


          2. enum store_key_result can be transformed into boolean

          When analyzing subselect_uniquesubquery_engine::copy_ref_key I encountered
          enum store_key_result { STORE_KEY_OK, STORE_KEY_FATAL, STORE_KEY_CONV }
          It turns out that the last value STORE_KEY_CONV is not used anywhere, and
          the above enum is not needed, since two enums can be represented by bool.

          3. Unneeded extra call to engine->exec() in Item_subselect::exec

          In MariaDB 5.3 I introduced early subquery optimization. As a result the logic
          in Item_subselect::exec that calls engine->exec second time if the engine was
          changed is not needed. The reason is that in the past optimization and engine
          change was done lazily during the first call to engine->exec(). From 5.3 this is
          not true, the engine is changed before execution, so even the first execution is
          done with the right engine.

          We cannot simply remove this logic completely, because there are still few
          border cases when optimization is done lazily. However Item_subselect::exec
          should check if the engine was chosen within itself (and then re-execute),
          or if the engine was chosen before execution, and then do not re-execute.
          *TODO:* the task should be split into two - one for 5.5, one for 10.0

          While working on a bug fix in the subquery code, I noticed that some legacy has remained from the past, that can be removed to improve/simplify the code:

          1. Multi-column non-top level subqueries that could be executed via the unique_subquery/index_subquery methods are executed via the general single_select engine.

          If the same queries are transformed into single-column INs, then unique_subquery/index_subquery is chosen.

          The problem is that the method {{Item_in_subselect::create_row_in_to_exists_cond()}}
          adds {{Item_is_not_null_test}} and {{Item_func_trig_cond}} without looking at the left IN operand. At the same time, the analogous method for single columns does that, and
          doesn't add the above conditions if the left argument cannot be NULL.

          The proposed patch is:


          {noformat}
          @@ -2290,7 +2303,7 @@ Item_in_subselect::create_row_in_to_exis
                                                    ref_pointer_array+i,
                                                    (char *)"<no matter>",
                                                    (char *)"<list ref>"));
          - if (!abort_on_null)
          + if (!abort_on_null && select_lex->ref_pointer_array[i]->maybe_null)
                 {
                   Item *having_col_item=
                     new Item_is_not_null_test(this,
          @@ -2309,10 +2322,6 @@ Item_in_subselect::create_row_in_to_exis
                                                      (char *)"<no matter>",
                                                      (char *)"<list ref>"));
                   item= new Item_cond_or(item, item_isnull);
          - /*
          - TODO: why we create the above for cases where the right part
          - cant be NULL?
          - */
                   if (left_expr->element_index(i)->maybe_null)
                   {
                     if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          @@ -2323,6 +2332,11 @@ Item_in_subselect::create_row_in_to_exis
                   }
                   *having_item= and_items(*having_item, having_col_item);
                 }
          + if (!abort_on_null && left_expr->element_index(i)->maybe_null)
          + {
          + if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          + DBUG_RETURN(true);
          + }
                 *where_item= and_items(*where_item, item);
               }
             }
          {noformat}

          2. {{enum store_key_result}} can be transformed into boolean

          When analyzing {{subselect_uniquesubquery_engine::copy_ref_key}} I encountered
          {noformat}
          enum store_key_result { STORE_KEY_OK, STORE_KEY_FATAL, STORE_KEY_CONV }
          {noformat}
          It turns out that the last value {{STORE_KEY_CONV}} is not used anywhere, and
          the above {{enum}} is not needed, since two {{enum}}'s can be represented by {{bool}}.

          3. Unneeded extra call to {{engine->exec()}} in {{Item_subselect::exec}}

          In MariaDB 5.3 I introduced early subquery optimization. As a result the logic
          in {{Item_subselect::exec}} that calls {{engine->exec}} second time if the engine was
          changed is not needed. The reason is that in the past optimization and engine
          change was done lazily during the first call to {{engine->exec()}}. From 5.3 this is
          not true, the engine is changed before execution, so even the first execution is
          done with the right engine.

          We cannot simply remove this logic completely, because there are still few
          border cases when optimization is done lazily. However {{Item_subselect::exec}}
          should check if the engine was chosen within itself (and then re-execute),
          or if the engine was chosen before execution, and then do not re-execute.
          serg Sergei Golubchik made changes -
          Fix Version/s 5.5.29 [ 11701 ]
          Fix Version/s 5.5.28 [ 11200 ]
          timour Timour Katchaounov (Inactive) made changes -
          Summary Minor non-semijoin subquery improvements Make multi-column non-top level subqueries to be executed via index (index/unique subquery) instead of single_select_engine
          timour Timour Katchaounov (Inactive) made changes -
          Description *TODO:* the task should be split into two - one for 5.5, one for 10.0

          While working on a bug fix in the subquery code, I noticed that some legacy has remained from the past, that can be removed to improve/simplify the code:

          1. Multi-column non-top level subqueries that could be executed via the unique_subquery/index_subquery methods are executed via the general single_select engine.

          If the same queries are transformed into single-column INs, then unique_subquery/index_subquery is chosen.

          The problem is that the method {{Item_in_subselect::create_row_in_to_exists_cond()}}
          adds {{Item_is_not_null_test}} and {{Item_func_trig_cond}} without looking at the left IN operand. At the same time, the analogous method for single columns does that, and
          doesn't add the above conditions if the left argument cannot be NULL.

          The proposed patch is:


          {noformat}
          @@ -2290,7 +2303,7 @@ Item_in_subselect::create_row_in_to_exis
                                                    ref_pointer_array+i,
                                                    (char *)"<no matter>",
                                                    (char *)"<list ref>"));
          - if (!abort_on_null)
          + if (!abort_on_null && select_lex->ref_pointer_array[i]->maybe_null)
                 {
                   Item *having_col_item=
                     new Item_is_not_null_test(this,
          @@ -2309,10 +2322,6 @@ Item_in_subselect::create_row_in_to_exis
                                                      (char *)"<no matter>",
                                                      (char *)"<list ref>"));
                   item= new Item_cond_or(item, item_isnull);
          - /*
          - TODO: why we create the above for cases where the right part
          - cant be NULL?
          - */
                   if (left_expr->element_index(i)->maybe_null)
                   {
                     if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          @@ -2323,6 +2332,11 @@ Item_in_subselect::create_row_in_to_exis
                   }
                   *having_item= and_items(*having_item, having_col_item);
                 }
          + if (!abort_on_null && left_expr->element_index(i)->maybe_null)
          + {
          + if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          + DBUG_RETURN(true);
          + }
                 *where_item= and_items(*where_item, item);
               }
             }
          {noformat}

          2. {{enum store_key_result}} can be transformed into boolean

          When analyzing {{subselect_uniquesubquery_engine::copy_ref_key}} I encountered
          {noformat}
          enum store_key_result { STORE_KEY_OK, STORE_KEY_FATAL, STORE_KEY_CONV }
          {noformat}
          It turns out that the last value {{STORE_KEY_CONV}} is not used anywhere, and
          the above {{enum}} is not needed, since two {{enum}}'s can be represented by {{bool}}.

          3. Unneeded extra call to {{engine->exec()}} in {{Item_subselect::exec}}

          In MariaDB 5.3 I introduced early subquery optimization. As a result the logic
          in {{Item_subselect::exec}} that calls {{engine->exec}} second time if the engine was
          changed is not needed. The reason is that in the past optimization and engine
          change was done lazily during the first call to {{engine->exec()}}. From 5.3 this is
          not true, the engine is changed before execution, so even the first execution is
          done with the right engine.

          We cannot simply remove this logic completely, because there are still few
          border cases when optimization is done lazily. However {{Item_subselect::exec}}
          should check if the engine was chosen within itself (and then re-execute),
          or if the engine was chosen before execution, and then do not re-execute.
          Multi-column non-top level subqueries can be executed via the unique_subquery/index_subquery methods instead of the general single_select engine.

          If the same queries are transformed into single-column INs, then unique_subquery/index_subquery is chosen. However in some cases the IN-EXISTS transformation for multi-column subqueries adds unnecessary null-rejecting conditions that prevent the use of the index-based subquery access methods. The problem is that the method {Item_in_subselect::create_row_in_to_exists_cond()}} adds {{Item_is_not_null_test}} and {{Item_func_trig_cond}} without looking at the left IN operand. At the same time, the analogous method for single columns does that, and doesn't add the above conditions if the left argument cannot be NULL.

          The proposed patch is:

          {noformat}
          @@ -2290,7 +2303,7 @@ Item_in_subselect::create_row_in_to_exis
                                                    ref_pointer_array+i,
                                                    (char *)"<no matter>",
                                                    (char *)"<list ref>"));
          - if (!abort_on_null)
          + if (!abort_on_null && select_lex->ref_pointer_array[i]->maybe_null)
                 {
                   Item *having_col_item=
                     new Item_is_not_null_test(this,
          @@ -2309,10 +2322,6 @@ Item_in_subselect::create_row_in_to_exis
                                                      (char *)"<no matter>",
                                                      (char *)"<list ref>"));
                   item= new Item_cond_or(item, item_isnull);
          - /*
          - TODO: why we create the above for cases where the right part
          - cant be NULL?
          - */
                   if (left_expr->element_index(i)->maybe_null)
                   {
                     if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          @@ -2323,6 +2332,11 @@ Item_in_subselect::create_row_in_to_exis
                   }
                   *having_item= and_items(*having_item, having_col_item);
                 }
          + if (!abort_on_null && left_expr->element_index(i)->maybe_null)
          + {
          + if (!(item= new Item_func_trig_cond(item, get_cond_guard(i))))
          + DBUG_RETURN(true);
          + }
                 *where_item= and_items(*where_item, item);
               }
             }
          {noformat}

          The change is proposed for 10.0 because it will change all affected query plans to use new access methods.
          timour Timour Katchaounov (Inactive) made changes -
          Fix Version/s 10.0.1 [ 11400 ]
          Fix Version/s 5.5.29 [ 11701 ]
          Due Date 2012-10-12 2012-10-26
          timour Timour Katchaounov (Inactive) made changes -
          Due Date 2012-10-26 2012-11-02
          timour Timour Katchaounov (Inactive) made changes -
          Due Date 2012-11-02 2012-11-16
          serg Sergei Golubchik made changes -
          Fix Version/s 10.0.2 [ 11900 ]
          Fix Version/s 10.0.1 [ 11400 ]
          timour Timour Katchaounov (Inactive) made changes -
          Due Date 2012-11-16 2012-12-21
          timour Timour Katchaounov (Inactive) made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          timour Timour Katchaounov (Inactive) made changes -
          Status In Progress [ 3 ] Open [ 1 ]
          serg Sergei Golubchik made changes -
          Due Date 2012-12-21

          The implementation has been tested by Elena, and is waiting for 10.0.1 to be released in order to be pushed to 10.0.2.

          timour Timour Katchaounov (Inactive) added a comment - The implementation has been tested by Elena, and is waiting for 10.0.1 to be released in order to be pushed to 10.0.2.

          merged & tested with latest 10.0, pushed to 10.0.02

          timour Timour Katchaounov (Inactive) added a comment - merged & tested with latest 10.0, pushed to 10.0.02
          timour Timour Katchaounov (Inactive) made changes -
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow defaullt [ 14301 ] MariaDB v2 [ 46381 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow MariaDB v2 [ 46381 ] MariaDB v3 [ 64382 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 64382 ] MariaDB v4 [ 131980 ]

          People

            timour Timour Katchaounov (Inactive)
            timour Timour Katchaounov (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.