Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20740

Odd computations in calculate_cond_selectivity_for_table

    XMLWordPrintable

    Details

      Description

      calculate_cond_selectivity_for_table has the code to compute table->cond_selectivity from selectivities of potential range accesses.

      Selectivities of range accesses are multiplied, but there is code to handle the situation where range accesses use restrictions on the same column. In that case, the code has additional multiplier to account for this fact (and avoid counting the same selectivity twice).

      This bug is about that code not working for a fairly basic example where I think it should work.

      The testcase (not necessarily minimal):

      create table ten(a int primary key);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int primary key);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t1 (
        col1 int not null,
        col2 int not null,
        useless_1 int not null, 
        useless_2 int not null,
        index i0(useless_1, useless_2, col2),
        INDEX i1(col1, col2)
      ) engine=myisam;
       
      insert into t1 
      select 
        mod(A.a+1000*B.a, 100),
        A.a+1000*B.a,
        1,
        2
      from
        one_k A,
        one_k B;
       
      # Just MyISAM's ANALYZE, we don't need EITS data.
      analyze table t1;
      

      The query:

      explain extended 
      select * from t1  
      where  
        col2 >= 500000 and  col1 >= 50 and useless_1=1 and useless_2=2;
      

      Checking the optimizer trace, I can see that this will create potential range accesses:

        "range_scan_alternatives": [
          {
            "index": "i0",
            "ranges": ["(1,2,500000) <= (useless_1,useless_2,col2)"],
            "rowid_ordered": false,
            "using_mrr": false,
            "index_only": false,
            "rows": 538797,
            "cost": 695451,
            "chosen": false,
            "cause": "cost"
          },
          {
            "index": "i1",
            "ranges": ["(50,500000) <= (col1,col2)"],
            "rowid_ordered": false,
            "using_mrr": false,
            "index_only": false,
            "rows": 507454,
            "cost": 650303,
            "chosen": false,
            "cause": "cost"
          }
        ],
      

      Both range accesses use column col2, both have selectivity around 0.5

      But when I step through calculate_cond_selectivity_for_table() function, I can see that

      • table->cond_selectivity is assigned the value of 1
      • table->cond_selectivity is multiplied by first range access selectivity
      • table->cond_selectivity is multiplied by second range access selectivity

      I don't see any adjustments to the selectivity that are due to the fact that both potential range acccesses use the same column. I think this is not what was intended.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              igor Igor Babaev
              Reporter:
              psergei Sergei Petrunia
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:

                  Git Integration