[MDEV-16188] Use in-memory PK filters built from range index scans Created: 2018-05-16  Updated: 2022-08-23  Resolved: 2019-02-24

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 10.4.3

Type: Task Priority: Major
Reporter: Igor Babaev Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MDEV-18332 Tests for PK filters Closed
PartOf
includes MDEV-18144 ANALYZE for statement support for PK ... Closed
Problem/Incident
causes MDEV-18477 Implement FNV-1a hash function Stalled
causes MDEV-20407 mysqld got signal 11; rowid filter Closed
causes MDEV-21040 InnoDB: fulltext search with ` in(...... Closed
causes MDEV-21191 MariaDB crashes executing SELECT requ... Closed
causes MDEV-21535 Unnecessarily large ha_innobase::reco... Closed
causes MDEV-23861 Execution plan Closed
causes MDEV-28846 Poor performance when rowid filter co... Closed
Relates
relates to MDEV-19919 Assertion `!prebuilt->index->is_prima... Closed
relates to MDEV-20056 Assertion `!prebuilt->index->is_prima... Closed
relates to MDEV-21794 Optimizer flag rowid_filter leads to ... Closed
relates to MDEV-23035 INNODB assertion failure -> crash Closed
relates to MDEV-22761 InnoDB: Assertion failure row0sel.cc... Closed

 Description   

For a given join query Q over tables T1,...,Tn a PK filter built for Ti is a collection of primary keys of Ti F such that the projection of the result set of Q on table Ti is a subset of all records from Ti whose primary keys are contained in F

If the cardinality of filter F is small enough in comparison with the cardinality of Ti and F can be easily built then using the filter to execute Q may be quite beneficial.

Consider this query for the dbt3_s001 database
(see mysql-test/include/dbt3_s001.inc).

SELECT * FROM orders JOIN lineitem ON o_orderkey=l_orderkey
WHERE  l_shipdate BETWEEN '1997-01-01' AND '1997-02-01'
               AND
               o_totalprice > 200000;

Assume that we have indexes on lineitem(l_shipdate) and
orders(o_totalprice) : i_l_shipdate and i_o_totalprice.

total number of orders: 1500,
number of orders where o_totalprice > 200000: 87
selectivity of the predicate: 87/1500=0.058

total number of lineitems: 6005
number of lineites where l_shipdate BETWEEN '1997-01-01' AND '1997-02-01': 101
selectivity of the predicate: 101/6005=0.0168

There are two possible join orders to execute this query:

1. orders -> lineitem
2. lineitem ->orders

1. orders -> lineitem
---------------------

How it can be executed in InnoDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1.1
using range scan by the index i_l_shipdate collect PK for the rows
from this range into a container (e.g. ordered array) F1

1.2
use range scan by i_o_totalprice.
for each key from this range take PK for this key pk_orders_i

1.3
using foreign key i_l_orderkey scan keys with l_orderkey=pk_orders_i
for each such key take pk_lineitem_j and check it against F1:
if in: consider pk_orders, pk_lineitem_i as possible match

1.4
for each matched pair pk_orders_i, pk_lineitem_j
access rows of orders and lineitem

Cost:
1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
1.2. 1 random i/o in index i_o_totalprice +
1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 +
1.4. 13 random i/o by pk in orders +
13 random i/o by pk in lineitem +

Cost of the current execution with this join order:
1 random i/o in index i_o_totalprice +
87 random i/o by pk in orders +
87 random i/o in index i_l_orderkey +
87 * 4 random i/o by pk in lineitem

Loss:
1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
87 * 4 lookups in F1
vs
Gain:
(87-13)=74 random i/o by pk in orders +
(87*4-13)=325 random i/o by pk in lineitem

How it can be executed in MyISAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1.1
using range scan by the index i_l_shipdate collect for the rows
from this range into a container (e.g. ordered array) F1

1.2
use range scan by i_o_totalprice.
for each key from this range access orders, get row orders_i
and take o_orderkey_i from it

1.3
using foreign key i_l_orderkey scan keys with l_orderkey=o_orderkr_i
for each such key take RID_lineitem_j and check it against F1:
if in: consider o_orders_i, RID_lineitem_j as possible match

1.4 for each matched pair o_orders_i, RID_lineitem_j
access rows of lineitem

Cost:
1.1. 1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
1.2. 1 random i_o in index i_o_totalprice +
87 random i/o in orders
1.3. 87 random i/o in index i_l_orderkey + 87 * 4 lookups in F1 +
1.4. 13 random i/o by pk in lineitem

2. lineitem -> orders

How it can be executed in InnoDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2.1
using range scan by the index i_o_totalprice collect PK for the rows
from this range into a container (e.g. ordered array) F2

2.2
use range scan by i_l_shipdate.
for each key from this range take PK for this key pk_lineitem_i
take row lineitem_i and take l_orderkey_i from this row

2.3
access row lineitem_i by pk_lineitem_i and take l_orderkey_i from this row
check l_orderkey_i against F2
if in: consider lineitem_i, l_orderkey_i as possible match

2.4
for each matched pair lineitem_i, l_orderkey_i
access row of orders

Cost:
2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) +
2.2. 1 random i_o in index i_l_shipdate +
2.3. 101 random i/o by pk in lineitem + 101 lookups in F2
2.4. 13 random i/o by pk in orders

Cost of the current execution with this join order:
1 random i/o in index i_o_shipdate +
101 random i/o by pk in lineitem +
101 random i/o by pk in orders

Loss:
1 random i/o in index i_l_shipdate + creation of F1 (in memory) +
101 lookups in F2
vs
Gain:
(101-13)=88 random i/o by pk in orders

How it can be executed in MyISAM
--------------------------------

2.1
using range scan by the index i_o_totalprice collect RID for the rows
from this range into a container (e.g. ordered array) F2

2.2
use range scan by i_l_shipdate.
for each key from this range take RID for this key rid_lineitem_i

2.3
access row lineitem_i by rid_lineitem_i and take l_orderkey_i from this row

2.4
look up into the primary index i_o_orderkey by l_orderkey_i
get rid_orders_i and check rid_orders_i against F2
if in: consider lineitem_i, rid_orders_i as possible match

2.5
for each matched pair lineitem_i, rid_orderkey_i
access row of orders

Cost:
2.1. 1 random i/o in index i_o_totalprice + creation of F2 (in memory) +
2.2. 1 random i_o in index i_l_shipdate +
2.3. 101 random i/o by pk in lineitem +
2.4. 101 random lookups in the pk index i_o_orderkey +
2.5. 13 random i/o by pk in orders



 Comments   
Comment by Igor Babaev [ 2019-02-24 ]

The patches for this task were pushed into 10.4

Generated at Thu Feb 08 08:27:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.