Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29395

Review input for 10.7-selectivity tree

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.10
    • N/A
    • Optimizer
    • None

    Description

      Review comments for the 10.7-selectivity tree.

      Request to fix the trace

      commit 7b93762bb8a7ab9fbc89158db4025060531cd66e
      Author: Monty <monty@mariadb.org>
      Date:   Sat Oct 9 16:16:10 2021 +0300
       
          Update cost for hash and cached joins
      

      This piece:

      --- a/sql/opt_trace.cc
      +++ b/sql/opt_trace.cc
      @@ -677,6 +677,11 @@ void print_final_join_order(JOIN *join)
         for (j= join->join_tab,i=0 ; i < join->top_join_tab_count;
              i++, j++)
           best_order.add_table_name(j);
      +  best_order.end();
      +
      +  /* Write information about the resulting join */
      +  Json_writer_object obj(join->thd, "best_access_method");
      +  obj.add("cost", join->best_read);
       }
      

      makes the trace have this:

                {
                  "best_join_order": ["<derived2>"],
                  "best_access_method": {
                    "cost": 2.799
                  }
                },
      

      Why do we need "best_access_method" ? I think just "cost" would be ok.

      Also, rename "found_matching_rows_cost" to cond_check_cost?

      Wrong code in distinct keys patch:

      commit 5997f57a9d2d3a3bc30a38ccb815aceff1ce94dc
      Author: Monty <monty@mariadb.org>
      Date:   Wed May 4 17:26:43 2022 +0300
       
          Derived tables and union can now create distinct keys
          
          The idea is that instead of marking all select_lex's with DISTINCT, we
          only mark those that really need distinct result.
      

      has

      @@ -1677,6 +1678,13 @@ Item_in_subselect::Item_in_subselect(THD *thd, Item * left_exp,
             Item_row(thd, static_cast<Item_row*>(left_exp));
         func= &eq_creator;
         init(select_lex, new (thd->mem_root) select_exists_subselect(thd, this));
      +  select_lex->distinct= 1;
      +  /*
      +    If this is is 'xxx IN (SELECT ...) mark that the we are only interested in
      +    unique values for the select
      +  */
      +  if (select_lex->first_inner_unit())
      +    select_lex->first_inner_unit()->distinct= 1;
         max_columns= UINT_MAX;
         set_maybe_null();
         reset();
      

      touching select_lex->first_inner_unit() is incorrect!

      Consider an example:

       
      create table t10 (a int, b int, c int);
      insert into t10 select seq, seq, seq from seq_1_to_10;
       
      create table t11 as select * from t10;
      create table t12 as select * from t11;
       
      explain 
      select 
        a in (select count(*)
              from 
                t10,
                (select a,b from t11) T
              where
                T.b=t10.b
              group by
                t10.c
              )
      from
       t10;
      

      When we get to the select_lex->first_inner_unit()->distinct= 1; line, we have:

      (gdb) p select_lex->master_unit()
        $50 = (st_select_lex_unit *) 0x7fffa0019df0
      (gdb) p select_lex->first_inner_unit()
        $51 = (st_select_lex_unit *) 0x7fffa00188c0
       
      (gdb) p select_lex->first_inner_unit()->first_select()->select_number
        $53 = 3
       
      (gdb) p select_lex->master_unit()->first_select()->select_number
        $54 = 2
       
      (gdb) p select_lex->first_inner_unit()->first_select()->join_list->elem(0)->alias
        $62 = {str = 0x7fffa00181b0 "t11", length = 3}
      

      That is, we're setting distinct=1 for the "(select a,b from t11)" !

      maybe instead of

      +    select_lex->first_inner_unit()->distinct= 1;
      

      we should do

      +    select_lex->master_unit()->distinct= 1;
      

      If I remove the above line (in Item_in_subselect and Item_allany_subselect), I get test result changes in:
      main.derived_cond_pushdown main.opt_tvc main.derived_opt main.myisam_explain_non_select_all main.table_value_constr main.cte_nonrecursive.

      ... Actually Item_exists_subselect::Item_exists subselect has the variant of the code I'm suggesting:

        select_lex->distinct= 1;
        select_lex->master_unit()->distinct= 1;
      

      It was added by Monty'e earlier patch:

      commit 5997f57a9d2d3a3bc30a38ccb815aceff1ce94dc
      Author:	Monty <monty@mariadb.org>  Wed May  4 17:26:43 2022
      Committer:	Monty <monty@mariadb.org>  Sun Jul 10 15:09:22 2022
       
      Derived tables and union can now create distinct keys
      

      A testcase showing new code causes wrong results: See attached _derived_unique.test.

      A fix in convert_subq_to_sj won't work

      Looking at the changed testcases, one could imagine that it would be reasonable to try setting select_lex->distinct=1 for the derived tables inside semi-join subqueries... alas, convert_subq_to_sj is called after the derived table was created.

      (Why convert_subq_to_sj? because for mergeable semi-joins, we do know that one duplicate row of each table will be sufficient...)

      Comment#2: what about UNIONs?

      following the above, should we set distinct=1 for other UNION members?

      Odd code in ha_partition::rndpos_time

      commit 07ffb3abc1004a102c2c605c7c280913741d5d87
      Author: Monty <monty@mariadb.org>
      Date:   Thu Aug 11 13:05:23 2022 +0300
       
          TEMPORARY PUSH: Changing all cost calculation to be given in ms
      

      +IO_AND_CPU_COST ha_partition::rndpos_time(ha_rows rows)
      +    IO_AND_CPU_COST cost= m_file[i]->keyread_time(inx, ranges, rows_per_part,
      +                                                  blocks);
      +    read_time.io+= cost.io;
      +    read_time.cpu= cost.cpu;
      +  }
      

      why is io added while cpu is not?

      Attachments

        Issue Links

          Activity

            People

              monty Michael Widenius
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.