Details
-
Task
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
None
Description
For a given join query Q over tables T1,...,Tn a PK filter built for Ti is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti is a subset of all records from Ti whose primary keys are contained in F
If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial.
Consider this query for the dbt3_s001 database
(see mysql-test/include/dbt3_s001.inc).
SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey
|
WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-02-01'
|
AND
|
o_totalprice > 200000;
|
Assume that we have indexes on lineitem(l_shipdate) and
orders(o_totalprice) : i_l_shipdate and i_o_totalprice.
total number of orders: 1500,
number of orders where o_totalprice > 200000: 87
selectivity of the predicate: 87/1500=0.058
total number of lineitems: 6005
number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101
selectivity of the predicate: 101/6005=0.0168
There are two possible join orders to execute this query:
1. orders -> lineitem
2. lineitem ->orders
1. orders -> lineitem
---------------------
How it can be executed in InnoDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.1
using range scan by the index i_l_shipdate collect PK for the rows
from this range into a container (e.g. ordered array) F1
1.2
use range scan by i_o_totalprice.
for each key from this range take PK for this key pk_orders_i
1.3
using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i
for each such key take pk_lineitem_j and check it against F1:
if in: consider pk_orders, pk_lineitem_i as possible match
1.4
for each matched pair pk_orders_i, pk_lineitem_j
access rows of orders and lineitem
Cost:
1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
1.2. 1 random i/o in index i_o_totalprice +
1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 +
1.4. 13 random i/o by pk in orders +
13 random i/o by pk in lineitem +
Cost of the current execution with this join order:
1 random i/o in index i_o_totalprice +
87 random i/o by pk in orders +
87 random i/o in index i_l_orderkey +
87 * 4 random i/o by pk in lineitem
Loss:
1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
87 * 4 lookups in F1
vs
Gain:
(87-13)=74 random i/o by pk in orders +
(87*4-13)=325 random i/o by pk in lineitem
How it can be executed in MyISAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.1
using range scan by the index i_l_shipdate collect for the rows
from this range into a container (e.g. ordered array) F1
1.2
use range scan by i_o_totalprice.
for each key from this range access orders, get row orders_i
and take o_orderkey_i from it
1.3
using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i
for each such key take RID_lineitem_j and check it against F1:
if in: consider o_orders_i, RID_lineitem_j as possible match
1.4 for each matched pair o_orders_i, RID_lineitem_j
access rows of lineitem
Cost:
1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
1.2. 1 random i_o in index i_o_totalprice +
87 random i/o in orders
1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 +
1.4. 13 random i/o by pk in lineitem
2. lineitem -> orders
How it can be executed in InnoDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2.1
using range scan by the index i_o_totalprice collect PK for the rows
from this range into a container (e.g. ordered array) F2
2.2
use range scan by i_l_shipdate.
for each key from this range take PK for this key pk_lineitem_i
take row lineitem_i and take l_orderkey_i from this row
2.3
access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row
check l_orderkey_i against F2
if in: consider lineitem_i, l_orderkey_i as possible match
2.4
for each matched pair lineitem_i, l_orderkey_i
access row of orders
Cost:
2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) +
2.2. 1 random i_o in index i_l_shipdate +
2.3. 101 random i/o by pk in lineitem + 101 lookups in F2
2.4. 13 random i/o by pk in orders
Cost of the current execution with this join order:
1 random i/o in index i_o_shipdate +
101 random i/o by pk in lineitem +
101 random i/o by pk in orders
Loss:
1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
101 lookups in F2
vs
Gain:
(101-13)=88 random i/o by pk in orders
How it can be executed in MyISAM
--------------------------------
2.1
using range scan by the index i_o_totalprice collect RID for the rows
from this range into a container (e.g. ordered array) F2
2.2
use range scan by i_l_shipdate.
for each key from this range take RID for this key rid_lineitem_i
2.3
access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row
2.4
look up into the primary index i_o_orderkey by l_orderkey_i
get rid_orders_i and check rid_orders_i against F2
if in: consider lineitem_i, rid_orders_i as possible match
2.5
for each matched pair lineitem_i, rid_orderkey_i
access row of orders
Cost:
2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) +
2.2. 1 random i_o in index i_l_shipdate +
2.3. 101 random i/o by pk in lineitem +
2.4. 101 random lookups in the pk index i_o_orderkey +
2.5. 13 random i/o by pk in orders
Attachments
Issue Links
- causes
-
MDEV-18477 Implement FNV-1a hash function
-
- Stalled
-
-
MDEV-20407 mysqld got signal 11; rowid filter
-
- Closed
-
-
MDEV-21040 InnoDB: fulltext search with ` in(...)` crashed on specific combination
-
- Closed
-
-
MDEV-21191 MariaDB crashes executing SELECT request after upgrading to 10.4.10
-
- Closed
-
-
MDEV-21535 Unnecessarily large ha_innobase::records_in_range() scans
-
- Closed
-
-
MDEV-23861 Execution plan
-
- Closed
-
-
MDEV-28846 Poor performance when rowid filter contains no elements
-
- Closed
-
- includes
-
MDEV-18144 ANALYZE for statement support for PK filters
-
- Closed
-
- is blocked by
-
MDEV-18332 Tests for PK filters
-
- Closed
-
- relates to
-
MDEV-19919 Assertion `!prebuilt->index->is_primary()' failed in row_search_idx_cond_check
-
- Closed
-
-
MDEV-20056 Assertion `!prebuilt->index->is_primary()' failed in row_search_idx_cond_check
-
- Closed
-
-
MDEV-21794 Optimizer flag rowid_filter leads to long query
-
- Closed
-
-
MDEV-23035 INNODB assertion failure -> crash
-
- Closed
-
-
MDEV-22761 InnoDB: Assertion failure row0sel.cc line 3966 exception 0x80000003
-
- Closed
-
-
MDEV-33522 Mariadb process crashes when running a simple join query
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Assignee | Galina Shalygina [ shagalla ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Fix Version/s | 10.4 [ 22408 ] |
Description |
For a given join query Q over tables T1,...,Tn a PK filter built for Ti is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti belongs to F. If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial. Consider this query for the dbt3_s001 database (see mysql-test/include/dbt3_s001.inc). {noformat} SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-02-01' AND o_totalprice > 200000; {noformat} Assume that we have indexes on lineitem(l_shipdate) and orders(o_totalprice) : i_l_shipdate and i_o_totalprice. total number of orders: 1500, number of orders where o_totalprice > 200000: 87 selectivity of the predicate: 87/1500=0.058 total number of lineitems: 6005 number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101 selectivity of the predicate: 101/6005=0.0168 There are two possible join orders to execute this query: 1. orders -> lineitem 2. lineitem ->orders 1. orders -> lineitem --------------------- How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect PK for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range take PK for this key pk_orders_i 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i for each such key take pk_lineitem_j and check it against F1: if in: consider pk_orders, pk_lineitem_i as possible match 1.4 for each matched pair pk_orders_i, pk_lineitem_j access rows of orders and lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i_o in index i_o_totalprice + 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in orders + 13 random i/o by pk in lineitem + Cost of the current execution with this join order: 1 random i/o in index i_o_totalprice + 87 random i/o by pk in orders + 87 random i/o in index i_l_orderkey + 87 * 4 random i/o by pk in lineitem Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 87 * 4 lookups in F1 vs Gain: (87-13)=74 random i/o by pk in orders + (87*4-13)=325 random i/o by pk in lineitem How it can be executed in MyISAM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range access orders, get row orders_i and take o_orderkey_i from it 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i for each such key take RID_lineitem_j and check it against F1: if in: consider o_orders_i, RID_lineitem_j as possible match 1.4 for each matched pair o_orders_i, RID_lineitem_j access rows of lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i_o in index i_o_totalprice + 87 random i/o in orders 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in lineitem 2. lineitem -> orders How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.1 using range scan by the index i_o_totalprice collect PK for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take PK for this key pk_lineitem_i take row lineitem_i and take l_orderkey_i from this row 2.3 access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row check l_orderkey_i against F2 if in: consider lineitem_i, l_orderkey_i as possible match 2.4 for each matched pair lineitem_i, l_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 101 lookups in F2 2.4. 13 random i/o by pk in orders Cost of the current execution with this join order: 1 random i/o in index i_o_shipdate + 101 random i/o by pk in lineitem + 101 random i/o by pk in orders Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 101 lookups in F2 vs Gain: (101-13)=88 random i/o by pk in orders How it can be executed in MyISAM -------------------------------- 2.1 using range scan by the index i_o_totalprice collect RID for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take RID for this key rid_lineitem_i 2.3 access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row 2.4 look up into the primary index i_o_orderkey by l_orderkey_i get rid_orders_i and check rid_orders_i against F2 if in: consider lineitem_i, rid_orders_i as possible match 2.5 for each matched pair lineitem_i, rid_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 2.4. 101 random lookups in the pk index i_o_orderkey + 2.5. 13 random i/o by pk in orders |
For a given join query Q over tables T1,...,Tn a PK filter built for Ti
is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti belongs to F. If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial. Consider this query for the dbt3_s001 database (see mysql-test/include/dbt3_s001.inc). {noformat} SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-02-01' AND o_totalprice > 200000; {noformat} Assume that we have indexes on lineitem(l_shipdate) and orders(o_totalprice) : i_l_shipdate and i_o_totalprice. total number of orders: 1500, number of orders where o_totalprice > 200000: 87 selectivity of the predicate: 87/1500=0.058 total number of lineitems: 6005 number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101 selectivity of the predicate: 101/6005=0.0168 There are two possible join orders to execute this query: 1. orders -> lineitem 2. lineitem ->orders 1. orders -> lineitem --------------------- How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect PK for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range take PK for this key pk_orders_i 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i for each such key take pk_lineitem_j and check it against F1: if in: consider pk_orders, pk_lineitem_i as possible match 1.4 for each matched pair pk_orders_i, pk_lineitem_j access rows of orders and lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i_o in index i_o_totalprice + 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in orders + 13 random i/o by pk in lineitem + Cost of the current execution with this join order: 1 random i/o in index i_o_totalprice + 87 random i/o by pk in orders + 87 random i/o in index i_l_orderkey + 87 * 4 random i/o by pk in lineitem Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 87 * 4 lookups in F1 vs Gain: (87-13)=74 random i/o by pk in orders + (87*4-13)=325 random i/o by pk in lineitem How it can be executed in MyISAM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range access orders, get row orders_i and take o_orderkey_i from it 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i for each such key take RID_lineitem_j and check it against F1: if in: consider o_orders_i, RID_lineitem_j as possible match 1.4 for each matched pair o_orders_i, RID_lineitem_j access rows of lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i_o in index i_o_totalprice + 87 random i/o in orders 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in lineitem 2. lineitem -> orders How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.1 using range scan by the index i_o_totalprice collect PK for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take PK for this key pk_lineitem_i take row lineitem_i and take l_orderkey_i from this row 2.3 access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row check l_orderkey_i against F2 if in: consider lineitem_i, l_orderkey_i as possible match 2.4 for each matched pair lineitem_i, l_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 101 lookups in F2 2.4. 13 random i/o by pk in orders Cost of the current execution with this join order: 1 random i/o in index i_o_shipdate + 101 random i/o by pk in lineitem + 101 random i/o by pk in orders Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 101 lookups in F2 vs Gain: (101-13)=88 random i/o by pk in orders How it can be executed in MyISAM -------------------------------- 2.1 using range scan by the index i_o_totalprice collect RID for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take RID for this key rid_lineitem_i 2.3 access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row 2.4 look up into the primary index i_o_orderkey by l_orderkey_i get rid_orders_i and check rid_orders_i against F2 if in: consider lineitem_i, rid_orders_i as possible match 2.5 for each matched pair lineitem_i, rid_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 2.4. 101 random lookups in the pk index i_o_orderkey + 2.5. 13 random i/o by pk in orders |
Epic Link | PT-76 [ 68557 ] |
Description |
For a given join query Q over tables T1,...,Tn a PK filter built for Ti
is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti belongs to F. If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial. Consider this query for the dbt3_s001 database (see mysql-test/include/dbt3_s001.inc). {noformat} SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-02-01' AND o_totalprice > 200000; {noformat} Assume that we have indexes on lineitem(l_shipdate) and orders(o_totalprice) : i_l_shipdate and i_o_totalprice. total number of orders: 1500, number of orders where o_totalprice > 200000: 87 selectivity of the predicate: 87/1500=0.058 total number of lineitems: 6005 number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101 selectivity of the predicate: 101/6005=0.0168 There are two possible join orders to execute this query: 1. orders -> lineitem 2. lineitem ->orders 1. orders -> lineitem --------------------- How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect PK for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range take PK for this key pk_orders_i 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i for each such key take pk_lineitem_j and check it against F1: if in: consider pk_orders, pk_lineitem_i as possible match 1.4 for each matched pair pk_orders_i, pk_lineitem_j access rows of orders and lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i_o in index i_o_totalprice + 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in orders + 13 random i/o by pk in lineitem + Cost of the current execution with this join order: 1 random i/o in index i_o_totalprice + 87 random i/o by pk in orders + 87 random i/o in index i_l_orderkey + 87 * 4 random i/o by pk in lineitem Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 87 * 4 lookups in F1 vs Gain: (87-13)=74 random i/o by pk in orders + (87*4-13)=325 random i/o by pk in lineitem How it can be executed in MyISAM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range access orders, get row orders_i and take o_orderkey_i from it 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i for each such key take RID_lineitem_j and check it against F1: if in: consider o_orders_i, RID_lineitem_j as possible match 1.4 for each matched pair o_orders_i, RID_lineitem_j access rows of lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i_o in index i_o_totalprice + 87 random i/o in orders 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in lineitem 2. lineitem -> orders How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.1 using range scan by the index i_o_totalprice collect PK for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take PK for this key pk_lineitem_i take row lineitem_i and take l_orderkey_i from this row 2.3 access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row check l_orderkey_i against F2 if in: consider lineitem_i, l_orderkey_i as possible match 2.4 for each matched pair lineitem_i, l_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 101 lookups in F2 2.4. 13 random i/o by pk in orders Cost of the current execution with this join order: 1 random i/o in index i_o_shipdate + 101 random i/o by pk in lineitem + 101 random i/o by pk in orders Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 101 lookups in F2 vs Gain: (101-13)=88 random i/o by pk in orders How it can be executed in MyISAM -------------------------------- 2.1 using range scan by the index i_o_totalprice collect RID for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take RID for this key rid_lineitem_i 2.3 access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row 2.4 look up into the primary index i_o_orderkey by l_orderkey_i get rid_orders_i and check rid_orders_i against F2 if in: consider lineitem_i, rid_orders_i as possible match 2.5 for each matched pair lineitem_i, rid_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 2.4. 101 random lookups in the pk index i_o_orderkey + 2.5. 13 random i/o by pk in orders |
For a given join query Q over tables T1,...,Tn a PK filter built for Ti
is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti belongs to F. If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial. Consider this query for the dbt3_s001 database (see mysql-test/include/dbt3_s001.inc). {noformat} SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-02-01' AND o_totalprice > 200000; {noformat} Assume that we have indexes on lineitem(l_shipdate) and orders(o_totalprice) : i_l_shipdate and i_o_totalprice. total number of orders: 1500, number of orders where o_totalprice > 200000: 87 selectivity of the predicate: 87/1500=0.058 total number of lineitems: 6005 number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101 selectivity of the predicate: 101/6005=0.0168 There are two possible join orders to execute this query: 1. orders -> lineitem 2. lineitem ->orders 1. orders -> lineitem --------------------- How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect PK for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range take PK for this key pk_orders_i 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i for each such key take pk_lineitem_j and check it against F1: if in: consider pk_orders, pk_lineitem_i as possible match 1.4 for each matched pair pk_orders_i, pk_lineitem_j access rows of orders and lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i/o in index i_o_totalprice + 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in orders + 13 random i/o by pk in lineitem + Cost of the current execution with this join order: 1 random i/o in index i_o_totalprice + 87 random i/o by pk in orders + 87 random i/o in index i_l_orderkey + 87 * 4 random i/o by pk in lineitem Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 87 * 4 lookups in F1 vs Gain: (87-13)=74 random i/o by pk in orders + (87*4-13)=325 random i/o by pk in lineitem How it can be executed in MyISAM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range access orders, get row orders_i and take o_orderkey_i from it 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i for each such key take RID_lineitem_j and check it against F1: if in: consider o_orders_i, RID_lineitem_j as possible match 1.4 for each matched pair o_orders_i, RID_lineitem_j access rows of lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i_o in index i_o_totalprice + 87 random i/o in orders 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in lineitem 2. lineitem -> orders How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.1 using range scan by the index i_o_totalprice collect PK for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take PK for this key pk_lineitem_i take row lineitem_i and take l_orderkey_i from this row 2.3 access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row check l_orderkey_i against F2 if in: consider lineitem_i, l_orderkey_i as possible match 2.4 for each matched pair lineitem_i, l_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 101 lookups in F2 2.4. 13 random i/o by pk in orders Cost of the current execution with this join order: 1 random i/o in index i_o_shipdate + 101 random i/o by pk in lineitem + 101 random i/o by pk in orders Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 101 lookups in F2 vs Gain: (101-13)=88 random i/o by pk in orders How it can be executed in MyISAM -------------------------------- 2.1 using range scan by the index i_o_totalprice collect RID for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take RID for this key rid_lineitem_i 2.3 access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row 2.4 look up into the primary index i_o_orderkey by l_orderkey_i get rid_orders_i and check rid_orders_i against F2 if in: consider lineitem_i, rid_orders_i as possible match 2.5 for each matched pair lineitem_i, rid_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 2.4. 101 random lookups in the pk index i_o_orderkey + 2.5. 13 random i/o by pk in orders |
Description |
For a given join query Q over tables T1,...,Tn a PK filter built for Ti
is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti belongs to F. If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial. Consider this query for the dbt3_s001 database (see mysql-test/include/dbt3_s001.inc). {noformat} SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-02-01' AND o_totalprice > 200000; {noformat} Assume that we have indexes on lineitem(l_shipdate) and orders(o_totalprice) : i_l_shipdate and i_o_totalprice. total number of orders: 1500, number of orders where o_totalprice > 200000: 87 selectivity of the predicate: 87/1500=0.058 total number of lineitems: 6005 number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101 selectivity of the predicate: 101/6005=0.0168 There are two possible join orders to execute this query: 1. orders -> lineitem 2. lineitem ->orders 1. orders -> lineitem --------------------- How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect PK for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range take PK for this key pk_orders_i 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i for each such key take pk_lineitem_j and check it against F1: if in: consider pk_orders, pk_lineitem_i as possible match 1.4 for each matched pair pk_orders_i, pk_lineitem_j access rows of orders and lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i/o in index i_o_totalprice + 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in orders + 13 random i/o by pk in lineitem + Cost of the current execution with this join order: 1 random i/o in index i_o_totalprice + 87 random i/o by pk in orders + 87 random i/o in index i_l_orderkey + 87 * 4 random i/o by pk in lineitem Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 87 * 4 lookups in F1 vs Gain: (87-13)=74 random i/o by pk in orders + (87*4-13)=325 random i/o by pk in lineitem How it can be executed in MyISAM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range access orders, get row orders_i and take o_orderkey_i from it 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i for each such key take RID_lineitem_j and check it against F1: if in: consider o_orders_i, RID_lineitem_j as possible match 1.4 for each matched pair o_orders_i, RID_lineitem_j access rows of lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i_o in index i_o_totalprice + 87 random i/o in orders 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in lineitem 2. lineitem -> orders How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.1 using range scan by the index i_o_totalprice collect PK for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take PK for this key pk_lineitem_i take row lineitem_i and take l_orderkey_i from this row 2.3 access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row check l_orderkey_i against F2 if in: consider lineitem_i, l_orderkey_i as possible match 2.4 for each matched pair lineitem_i, l_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 101 lookups in F2 2.4. 13 random i/o by pk in orders Cost of the current execution with this join order: 1 random i/o in index i_o_shipdate + 101 random i/o by pk in lineitem + 101 random i/o by pk in orders Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 101 lookups in F2 vs Gain: (101-13)=88 random i/o by pk in orders How it can be executed in MyISAM -------------------------------- 2.1 using range scan by the index i_o_totalprice collect RID for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take RID for this key rid_lineitem_i 2.3 access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row 2.4 look up into the primary index i_o_orderkey by l_orderkey_i get rid_orders_i and check rid_orders_i against F2 if in: consider lineitem_i, rid_orders_i as possible match 2.5 for each matched pair lineitem_i, rid_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 2.4. 101 random lookups in the pk index i_o_orderkey + 2.5. 13 random i/o by pk in orders |
For a given join query Q over tables T1,...,Tn a PK filter built for Ti is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti is a subset of all records from Ti whose primary keys are contained in F
If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial. Consider this query for the dbt3_s001 database (see mysql-test/include/dbt3_s001.inc). {noformat} SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-02-01' AND o_totalprice > 200000; {noformat} Assume that we have indexes on lineitem(l_shipdate) and orders(o_totalprice) : i_l_shipdate and i_o_totalprice. total number of orders: 1500, number of orders where o_totalprice > 200000: 87 selectivity of the predicate: 87/1500=0.058 total number of lineitems: 6005 number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101 selectivity of the predicate: 101/6005=0.0168 There are two possible join orders to execute this query: 1. orders -> lineitem 2. lineitem ->orders 1. orders -> lineitem --------------------- How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect PK for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range take PK for this key pk_orders_i 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i for each such key take pk_lineitem_j and check it against F1: if in: consider pk_orders, pk_lineitem_i as possible match 1.4 for each matched pair pk_orders_i, pk_lineitem_j access rows of orders and lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i/o in index i_o_totalprice + 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in orders + 13 random i/o by pk in lineitem + Cost of the current execution with this join order: 1 random i/o in index i_o_totalprice + 87 random i/o by pk in orders + 87 random i/o in index i_l_orderkey + 87 * 4 random i/o by pk in lineitem Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 87 * 4 lookups in F1 vs Gain: (87-13)=74 random i/o by pk in orders + (87*4-13)=325 random i/o by pk in lineitem How it can be executed in MyISAM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1.1 using range scan by the index i_l_shipdate collect for the rows from this range into a container (e.g. ordered array) F1 1.2 use range scan by i_o_totalprice. for each key from this range access orders, get row orders_i and take o_orderkey_i from it 1.3 using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i for each such key take RID_lineitem_j and check it against F1: if in: consider o_orders_i, RID_lineitem_j as possible match 1.4 for each matched pair o_orders_i, RID_lineitem_j access rows of lineitem Cost: 1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 1.2. 1 random i_o in index i_o_totalprice + 87 random i/o in orders 1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 + 1.4. 13 random i/o by pk in lineitem 2. lineitem -> orders How it can be executed in InnoDB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.1 using range scan by the index i_o_totalprice collect PK for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take PK for this key pk_lineitem_i take row lineitem_i and take l_orderkey_i from this row 2.3 access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row check l_orderkey_i against F2 if in: consider lineitem_i, l_orderkey_i as possible match 2.4 for each matched pair lineitem_i, l_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 101 lookups in F2 2.4. 13 random i/o by pk in orders Cost of the current execution with this join order: 1 random i/o in index i_o_shipdate + 101 random i/o by pk in lineitem + 101 random i/o by pk in orders Loss: 1 random i/o in index i_l_shipdate + creation of F1 (in memory) + 101 lookups in F2 vs Gain: (101-13)=88 random i/o by pk in orders How it can be executed in MyISAM -------------------------------- 2.1 using range scan by the index i_o_totalprice collect RID for the rows from this range into a container (e.g. ordered array) F2 2.2 use range scan by i_l_shipdate. for each key from this range take RID for this key rid_lineitem_i 2.3 access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row 2.4 look up into the primary index i_o_orderkey by l_orderkey_i get rid_orders_i and check rid_orders_i against F2 if in: consider lineitem_i, rid_orders_i as possible match 2.5 for each matched pair lineitem_i, rid_orderkey_i access row of orders Cost: 2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) + 2.2. 1 random i_o in index i_l_shipdate + 2.3. 101 random i/o by pk in lineitem + 2.4. 101 random lookups in the pk index i_o_orderkey + 2.5. 13 random i/o by pk in orders |
Assignee | Galina Shalygina [ shagalla ] | Igor Babaev [ igor ] |
Link |
This issue includes |
Link |
This issue is blocked by |
Link | This issue causes MDEV-18477 [ MDEV-18477 ] |
Fix Version/s | 10.4.3 [ 23230 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue causes |
Link |
This issue causes |
Link |
This issue causes |
Link |
This issue causes |
Link |
This issue relates to |
Link | This issue relates to MENT-635 [ MENT-635 ] |
Link |
This issue relates to |
Link |
This issue causes |
Link |
This issue relates to |
Workflow | MariaDB v3 [ 87272 ] | MariaDB v4 [ 133547 ] |
Link |
This issue causes |
Link |
This issue relates to |
The patches for this task were pushed into 10.4