MariaDB Server / MDEV-20740

Odd computations in calculate_cond_selectivity_for_table

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.2 (EOL), 10.3 (EOL), 10.4 (EOL)
    • Fix Version/s: 10.11
    • Component/s: Optimizer

    Description

      calculate_cond_selectivity_for_table() has code to compute table->cond_selectivity from the selectivities of potential range accesses.

      Selectivities of range accesses are multiplied together, but there is code to handle the situation where range accesses use restrictions on the same column. In that case, the code applies an additional multiplier to account for this fact (and avoid counting the same selectivity twice).

      This bug is about that code not working for a fairly basic example where I think it should.
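
      As a sketch of the intended arithmetic (made-up selectivity numbers, not anything from the server code): if two range accesses both include a 0.5-selective restriction on a shared column, a plain product counts that 0.5 twice, and the adjustment should divide one copy back out:

      ```cpp
      #include <cassert>
      #include <cmath>

      int main() {
        // Hypothetical numbers: both range accesses include a 0.5-selective
        // restriction on the same shared column.
        double sel_range1 = 0.5 * 0.4;  // shared column * this access's own column
        double sel_range2 = 0.5;        // shared column only
        double naive = sel_range1 * sel_range2;  // 0.10: counts the 0.5 twice
        double shared_col_sel = 0.5;
        double corrected = naive / shared_col_sel;  // 0.20: one copy divided out
        assert(std::fabs(naive - 0.10) < 1e-12);
        assert(std::fabs(corrected - 0.20) < 1e-12);
        return 0;
      }
      ```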

      The testcase (not necessarily minimal):

      create table ten(a int primary key);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int primary key);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t1 (
        col1 int not null,
        col2 int not null,
        useless_1 int not null, 
        useless_2 int not null,
        index i0(useless_1, useless_2, col2),
        INDEX i1(col1, col2)
      ) engine=myisam;
       
      insert into t1 
      select 
        mod(A.a+1000*B.a, 100),
        A.a+1000*B.a,
        1,
        2
      from
        one_k A,
        one_k B;
       
      # Just MyISAM's ANALYZE, we don't need EITS data.
      analyze table t1;
      

      The query:

      explain extended 
      select * from t1  
      where  
        col2 >= 500000 and  col1 >= 50 and useless_1=1 and useless_2=2;
      

      Checking the optimizer trace, I can see that the optimizer considers these potential range accesses:

        "range_scan_alternatives": [
          {
            "index": "i0",
            "ranges": ["(1,2,500000) <= (useless_1,useless_2,col2)"],
            "rowid_ordered": false,
            "using_mrr": false,
            "index_only": false,
            "rows": 538797,
            "cost": 695451,
            "chosen": false,
            "cause": "cost"
          },
          {
            "index": "i1",
            "ranges": ["(50,500000) <= (col1,col2)"],
            "rowid_ordered": false,
            "using_mrr": false,
            "index_only": false,
            "rows": 507454,
            "cost": 650303,
            "chosen": false,
            "cause": "cost"
          }
        ],
      

      Both range accesses use column col2, and both have a selectivity of around 0.5.
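
      (A quick sanity check of those figures: each scan's rows estimate from the trace, divided by the 1,000,000 rows in t1.)

      ```cpp
      #include <cassert>
      #include <cmath>

      int main() {
        const double table_rows = 1000000.0;  // one_k x one_k rows in t1
        double sel_i0 = 538797 / table_rows;  // "rows" for index i0 in the trace
        double sel_i1 = 507454 / table_rows;  // "rows" for index i1 in the trace
        assert(std::fabs(sel_i0 - 0.538797) < 1e-12);
        assert(std::fabs(sel_i1 - 0.507454) < 1e-12);
        return 0;
      }
      ```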

      But when I step through the calculate_cond_selectivity_for_table() function, I can see that:

      • table->cond_selectivity is assigned the value of 1
      • table->cond_selectivity is multiplied by first range access selectivity
      • table->cond_selectivity is multiplied by second range access selectivity

      I don't see any adjustments to the selectivity due to the fact that both potential range accesses use the same column. I think this is not what was intended.

      Activity

            EXPLAIN output, just in case. filtered shows the product of quick selects' selectivities.

            mysql> explain extended select * from t1  where  col2 >= 500000 and  col1 >= 50 and useless_1=1 and    useless_2=2;
             +------+-------------+-------+------+---------------+------+---------+------+---------+----------+-------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
            +------+-------------+-------+------+---------------+------+---------+------+---------+----------+-------------+
            |    1 | SIMPLE      | t1    | ALL  | i0,i1         | NULL | NULL    | NULL | 1000000 |    27.34 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+---------+----------+-------------+
             

            psergei Sergei Petrunia added a comment

            (gdb) p table->quick_key_parts
              $37 = {3, 1, 0 <repeats 62 times>}
            

            and when we reach this:

                      if (i != used_key_parts)
                      {
                        /*
                          Range access got us estimate for #used_key_parts.
                          We need estimate for #(i-1) key parts.
                        */
                        double f1= key_info->actual_rec_per_key(i-1);
                        double f2= key_info->actual_rec_per_key(i);
            

            we have used_key_parts=1, i=1

            I think the issue is somewhere around here.

            psergei Sergei Petrunia added a comment

            Wait, why is this:

            (gdb) p table->quick_key_parts
              $37 = {3, 1, 0 <repeats 62 times>}
            

            The trace excerpt shows that i0 uses 3 key parts (which matches the gdb output), while i1 uses 2 key parts (which does not).

            psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment (edited)

            The opt_range code doesn't set max_key_part correctly for certain kinds of range scans.

            Setting the value to be correct causes this change:

            --- /home/psergey/dev-git/10.4-cl/mysql-test/main/index_intersect_innodb.result 2019-02-26 14:08:24.665835501 +0200
            +++ /home/psergey/dev-git/10.4-cl/mysql-test/main/index_intersect_innodb.reject 2019-10-04 19:24:15.590832947 +0300
            @@ -470,17 +470,17 @@
             SELECT * FROM City
             WHERE ID BETWEEN 501 AND 1000 AND Population > 700000 AND Country LIKE 'C%';
             id     select_type     table   type    possible_keys   key     key_len ref     rows    Extra
            -1      SIMPLE  City    index_merge     PRIMARY,Population,Country      PRIMARY,Country,Population      4,7,4   NULL    #       Using sort_intersect(PRIMARY,Country,Population); Using where
            +1      SIMPLE  City    index_merge     PRIMARY,Population,Country      PRIMARY,Population,Country      4,4,7   NULL    #       Using sort_intersect(PRIMARY,Population,Country); Using where
             EXPLAIN
             SELECT * FROM City
             WHERE ID BETWEEN 1 AND 500 AND Population > 700000 AND Country LIKE 'C%';
             id     select_type     table   type    possible_keys   key     key_len ref     rows    Extra
            -1      SIMPLE  City    index_merge     PRIMARY,Population,Country      PRIMARY,Country,Population      4,7,4   NULL    #       Using sort_intersect(PRIMARY,Country,Population); Using where
            +1      SIMPLE  City    index_merge     PRIMARY,Population,Country      PRIMARY,Population,Country      4,4,7   NULL    #       Using sort_intersect(PRIMARY,Population,Country); Using where
             EXPLAIN
             SELECT * FROM City 
             WHERE ID BETWEEN 2001 AND 2500 AND Population > 300000 AND Country LIKE 'H%';
             id     select_type     table   type    possible_keys   key     key_len ref     rows    Extra
            -1      SIMPLE  City    index_merge     PRIMARY,Population,Country      PRIMARY,Country 4,7     NULL    #       Using sort_intersect(PRIMARY,Country); Using where
            +1      SIMPLE  City    range   PRIMARY,Population,Country      Country 7       NULL    #       Using index condition; Using where
             EXPLAIN
             SELECT * FROM City 
             WHERE ID BETWEEN 3701 AND 4000 AND Population > 1000000
            @@ -724,7 +724,7 @@
             SELECT * FROM City
             WHERE ID BETWEEN 1 AND 500 AND Population > 700000 AND Country LIKE 'C%';
             id     select_type     table   type    possible_keys   key     key_len ref     rows    Extra
            -1      SIMPLE  City    index_merge     PRIMARY,Population,Country      PRIMARY,Country,Population      4,7,4   NULL    #       Using sort_intersect(PRIMARY,Country,Population); Using where
            +1      SIMPLE  City    index_merge     PRIMARY,Population,Country      PRIMARY,Population,Country      4,4,7   NULL    #       Using sort_intersect(PRIMARY,Population,Country); Using where
            

            Debugging the range vs. index merge difference (the third query in the diff above).

            Relevant part of the trace:

                                "analyzing_sort_intersect": {
                                  "potential_index_scans": [
                                    {
                                      "index": "PRIMARY",
                                      "chosen": "false",
                                      "cause": "clustered index used for filtering"
                                    },
                                    {
                                      "index": "Population",
                                      "cost": 55.454,
                                      "chosen": false,
                                      "cause": "cost"
                                    },
                                    {
                                      "index": "Country",
                                      "cost": 2.3519,
                                      "chosen": true,
                                      "cause": "first occurence of index prefix"
                                    }
                                  ],
             
            -                     "selected_index_scans": [
            -                       {
            -                         "index": "Country",
            -                         "keyparts": ["Country"],
            -                         "records": 22,
            -                         "filtered_records": 20
            -                       }
            -                     ],
            -                     "rows": 2,
            -                     "cost": 4.4102,
            -                     "chosen": true
            -                   },
            -                   "analyzing_index_merge_union": []
            -                 },
            -                 "chosen_range_access_summary": {
            -                   "range_access_plan": {
            -                     "type": "index_sort_intersect",
            -                     "index_sort_intersect_of": [
            -                       {
            -                         "type": "range_scan",
            -                         "index": "PRIMARY",
            -                         "rows": 500,
            -                         "ranges": ["(2001) <= (ID) <= (2500)"]
            -                       },
            -                       {
            -                         "type": "range_scan",
            -                         "index": "Country",
            -                         "rows": 22,
            -                         "ranges": [
            -                           "(H\0\0,2001) <= (Country,ID) <= (Hÿÿ,2500)"
            -                         ]
            -                       }
            -                     ]
            -                   },
            +                     "selected_index_scans": [
            +                       {
            +                         "index": "Country",
            +                         "keyparts": ["Country", "ID"],
            +                         "records": 22,
            +                         "filtered_records": 0
            +                       }
            +                     ]
            +
            

            filtered_records is set like so:

                  ha_rows records= records_in_index_intersect_extension(&curr, *scan_ptr);
                  (*scan_ptr)->filtered_out= records >= scan_records ?
                                               0 : scan_records-records;
            

            The computation in records_in_index_intersect_extension() depends on how many key parts the merged key scans use.
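
            A minimal standalone sketch of that clamping logic (a hypothetical helper, not server code), with inputs chosen to match the two traces above: 2 of 22 rows surviving gives filtered_out = 20, while no reduction gives 0:

            ```cpp
            #include <cassert>

            typedef unsigned long long ha_rows;

            // filtered_out is the number of rows the intersection extension
            // filters away; it is clamped so it can never go negative.
            ha_rows filtered_out(ha_rows records, ha_rows scan_records) {
              return records >= scan_records ? 0 : scan_records - records;
            }

            int main() {
              assert(filtered_out(2, 22) == 20);   // old trace: filtered_records: 20
              assert(filtered_out(22, 22) == 0);   // new trace: filtered_records: 0
              assert(filtered_out(30, 22) == 0);   // estimate exceeds scan: clamped
              return 0;
            }
            ```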

            psergei Sergei Petrunia added a comment - Patch for the max_key_parts issue: http://lists.askmonty.org/pipermail/commits/2019-October/014019.html

            Ok, trying the example again after the patch for max_key_part.
            Stepping through calculate_cond_selectivity_for_table.

            Execution reaches this point twice:

                      table->cond_selectivity*= quick_cond_selectivity;
            

            Both times the quick select selectivity is around 0.5, so table->cond_selectivity becomes around 0.25.

            Then we reach this code with i=1

                        double f1= key_info->actual_rec_per_key(i-1);
                        double f2= key_info->actual_rec_per_key(i);
                        if (f1 > 0 && f2 > 0)
                          selectivity_mult= f1 / f2;
            

            the stats are realistic:

            (gdb) print f1
              $62 = 10000
            (gdb) print f2
              $63 = 1
            

            (gdb) print selectivity_mult
              $64 = 10000
            

            Then we execute this line:

                        table->cond_selectivity*= selectivity_mult;
            

            and get:

            (gdb) p table->cond_selectivity
            $65 = 2734.1469283799997
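
            That value is exactly the product of the two quick-select selectivities from the trace times the 10000 multiplier, i.e. a "selectivity" far greater than 1. A quick standalone reproduction of the arithmetic, using the numbers quoted above:

            ```cpp
            #include <cassert>
            #include <cmath>
            #include <cstdio>

            int main() {
              const double table_rows = 1000000.0;
              double sel_i0 = 538797 / table_rows;  // quick select on i0, ~0.54
              double sel_i1 = 507454 / table_rows;  // quick select on i1, ~0.51
              double selectivity_mult = 10000.0;    // f1 / f2 from the gdb session
              double cond_selectivity = sel_i0 * sel_i1 * selectivity_mult;
              printf("%.10f\n", cond_selectivity);
              // Matches the gdb value 2734.1469283799997: an impossible selectivity > 1.
              assert(std::fabs(cond_selectivity - 2734.14692838) < 1e-6);
              return 0;
            }
            ```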

            psergei Sergei Petrunia added a comment

            People

              Assignee: psergei Sergei Petrunia
              Reporter: psergei Sergei Petrunia
              Votes: 1
              Watchers: 5