For a given join query Q over tables T1,...,Tn a PK filter built for Ti is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti is a subset of all records from Ti whose primary keys are contained in F
If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial.
Consider this query for the dbt3_s001 database
(see mysql-test/include/dbt3_s001.inc).
SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey
WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-02-01'
AND
o_totalprice > 200000;
Assume that we have indexes on lineitem(l_shipdate) and
orders(o_totalprice) : i_l_shipdate and i_o_totalprice.
total number of orders: 1500,
number of orders where o_totalprice > 200000: 87
selectivity of the predicate: 87/1500=0.058
total number of lineitems: 6005
number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101
selectivity of the predicate: 101/6005=0.0168
There are two possible join orders to execute this query:
1. orders -> lineitem
2. lineitem ->orders
1. orders -> lineitem
---------------------
How it can be executed in InnoDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.1
using range scan by the index i_l_shipdate collect PK for the rows
from this range into a container (e.g. ordered array) F1
1.2
use range scan by i_o_totalprice.
for each key from this range take PK for this key pk_orders_i
1.3
using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i
for each such key take pk_lineitem_j and check it against F1:
if in: consider pk_orders, pk_lineitem_i as possible match
1.4
for each matched pair pk_orders_i, pk_lineitem_j
access rows of orders and lineitem
Cost:
1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
1.2. 1 random i/o in index i_o_totalprice +
1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 +
1.4. 13 random i/o by pk in orders +
13 random i/o by pk in lineitem +
Cost of the current execution with this join order:
1 random i/o in index i_o_totalprice +
87 random i/o by pk in orders +
87 random i/o in index i_l_orderkey +
87 * 4 random i/o by pk in lineitem
Loss:
1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
87 * 4 lookups in F1
vs
Gain:
(87-13)=74 random i/o by pk in orders +
(87*4-13)=325 random i/o by pk in lineitem
How it can be executed in MyISAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.1
using range scan by the index i_l_shipdate collect for the rows
from this range into a container (e.g. ordered array) F1
1.2
use range scan by i_o_totalprice.
for each key from this range access orders, get row orders_i
and take o_orderkey_i from it
1.3
using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i
for each such key take RID_lineitem_j and check it against F1:
if in: consider o_orders_i, RID_lineitem_j as possible match
1.4 for each matched pair o_orders_i, RID_lineitem_j
access rows of lineitem
Cost:
1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
1.2. 1 random i_o in index i_o_totalprice +
87 random i/o in orders
1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 +
1.4. 13 random i/o by pk in lineitem
2. lineitem -> orders
How it can be executed in InnoDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2.1
using range scan by the index i_o_totalprice collect PK for the rows
from this range into a container (e.g. ordered array) F2
2.2
use range scan by i_l_shipdate.
for each key from this range take PK for this key pk_lineitem_i
take row lineitem_i and take l_orderkey_i from this row
2.3
access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row
check l_orderkey_i against F2
if in: consider lineitem_i, l_orderkey_i as possible match
2.4
for each matched pair lineitem_i, l_orderkey_i
access row of orders
Cost:
2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) +
2.2. 1 random i_o in index i_l_shipdate +
2.3. 101 random i/o by pk in lineitem + 101 lookups in F2
2.4. 13 random i/o by pk in orders
Cost of the current execution with this join order:
1 random i/o in index i_o_shipdate +
101 random i/o by pk in lineitem +
101 random i/o by pk in orders
Loss:
1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
101 lookups in F2
vs
Gain:
(101-13)=88 random i/o by pk in orders
How it can be executed in MyISAM
--------------------------------
2.1
using range scan by the index i_o_totalprice collect RID for the rows
from this range into a container (e.g. ordered array) F2
2.2
use range scan by i_l_shipdate.
for each key from this range take RID for this key rid_lineitem_i
2.3
access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row
2.4
look up into the primary index i_o_orderkey by l_orderkey_i
get rid_orders_i and check rid_orders_i against F2
if in: consider lineitem_i, rid_orders_i as possible match
2.5
for each matched pair lineitem_i, rid_orderkey_i
access row of orders
Cost:
2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) +
2.2. 1 random i_o in index i_l_shipdate +
2.3. 101 random i/o by pk in lineitem +
2.4. 101 random lookups in the pk index i_o_orderkey +
2.5. 13 random i/o by pk in orders