Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16188

Use in-memory PK filters built from range index scans

    Details

      Description

      For a given join query Q over tables T1,...,Tn a PK filter built for Ti is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti is a subset of all records from Ti whose primary keys are contained in F

      If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial.

      Consider this query for the dbt3_s001 database
      (see mysql-test/include/dbt3_s001.inc).

      SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey
      WHERE  l_shipdate BETWEEN '1997-01-01' AND '1997-02-01'
                     AND
                     o_totalprice > 200000;
      

      Assume that we have indexes on lineitem(l_shipdate) and
      orders(o_totalprice) : i_l_shipdate and i_o_totalprice.

      total number of orders: 1500,
      number of orders where o_totalprice > 200000: 87
      selectivity of the predicate: 87/1500=0.058

      total number of lineitems: 6005
      number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101
      selectivity of the predicate: 101/6005=0.0168

      There are two possible join orders to execute this query:

      1. orders -> lineitem
      2. lineitem ->orders

      1. orders -> lineitem
      ---------------------

      How it can be executed in InnoDB
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      1.1
      using range scan by the index i_l_shipdate collect PK for the rows
      from this range into a container (e.g. ordered array) F1

      1.2
      use range scan by i_o_totalprice.
      for each key from this range take PK for this key pk_orders_i

      1.3
      using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i
      for each such key take pk_lineitem_j and check it against F1:
      if in: consider pk_orders, pk_lineitem_i as possible match

      1.4
      for each matched pair pk_orders_i, pk_lineitem_j
      access rows of orders and lineitem

      Cost:
      1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
      1.2. 1 random i/o in index i_o_totalprice +
      1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 +
      1.4. 13 random i/o by pk in orders +
      13 random i/o by pk in lineitem +

      Cost of the current execution with this join order:
      1 random i/o in index i_o_totalprice +
      87 random i/o by pk in orders +
      87 random i/o in index i_l_orderkey +
      87 * 4 random i/o by pk in lineitem

      Loss:
      1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
      87 * 4 lookups in F1
      vs
      Gain:
      (87-13)=74 random i/o by pk in orders +
      (87*4-13)=325 random i/o by pk in lineitem

      How it can be executed in MyISAM
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      1.1
      using range scan by the index i_l_shipdate collect for the rows
      from this range into a container (e.g. ordered array) F1

      1.2
      use range scan by i_o_totalprice.
      for each key from this range access orders, get row orders_i
      and take o_orderkey_i from it

      1.3
      using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i
      for each such key take RID_lineitem_j and check it against F1:
      if in: consider o_orders_i, RID_lineitem_j as possible match

      1.4 for each matched pair o_orders_i, RID_lineitem_j
      access rows of lineitem

      Cost:
      1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
      1.2. 1 random i_o in index i_o_totalprice +
      87 random i/o in orders
      1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 +
      1.4. 13 random i/o by pk in lineitem

      2. lineitem -> orders

      How it can be executed in InnoDB
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      2.1
      using range scan by the index i_o_totalprice collect PK for the rows
      from this range into a container (e.g. ordered array) F2

      2.2
      use range scan by i_l_shipdate.
      for each key from this range take PK for this key pk_lineitem_i
      take row lineitem_i and take l_orderkey_i from this row

      2.3
      access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row
      check l_orderkey_i against F2
      if in: consider lineitem_i, l_orderkey_i as possible match

      2.4
      for each matched pair lineitem_i, l_orderkey_i
      access row of orders

      Cost:
      2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) +
      2.2. 1 random i_o in index i_l_shipdate +
      2.3. 101 random i/o by pk in lineitem + 101 lookups in F2
      2.4. 13 random i/o by pk in orders

      Cost of the current execution with this join order:
      1 random i/o in index i_o_shipdate +
      101 random i/o by pk in lineitem +
      101 random i/o by pk in orders

      Loss:
      1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
      101 lookups in F2
      vs
      Gain:
      (101-13)=88 random i/o by pk in orders

      How it can be executed in MyISAM
      --------------------------------

      2.1
      using range scan by the index i_o_totalprice collect RID for the rows
      from this range into a container (e.g. ordered array) F2

      2.2
      use range scan by i_l_shipdate.
      for each key from this range take RID for this key rid_lineitem_i

      2.3
      access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row

      2.4
      look up into the primary index i_o_orderkey by l_orderkey_i
      get rid_orders_i and check rid_orders_i against F2
      if in: consider lineitem_i, rid_orders_i as possible match

      2.5
      for each matched pair lineitem_i, rid_orderkey_i
      access row of orders

      Cost:
      2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) +
      2.2. 1 random i_o in index i_l_shipdate +
      2.3. 101 random i/o by pk in lineitem +
      2.4. 101 random lookups in the pk index i_o_orderkey +
      2.5. 13 random i/o by pk in orders

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                igor Igor Babaev
                Reporter:
                igor Igor Babaev
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: