[MDEV-20819] index_merge/[sort_]intersect should not scan non-overlapping rowid ranges - Jira

Details

Type: Task
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Optimizer
Labels:
- index_merge

Description

This is a followup to the discussion here:
https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html

The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans.

sort-intersect algorithm

The idea is:

Start computing the intersection with the quick select that is the most selective.
Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead)

When starting a scan on a non-first range access, attach the [min_rowid, max_rowid]
to each range.

Limitation #1: the range must use all keyparts

if an index is defined as IDX(col1, col2, col3)

Then it actually is

  IDX(col1, col2, col3, rowid)

so ranges like

(c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3)

can be changed into

(c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID)

This is only possible when the range endpoints include col3. If this is not the case, we can't do that.

Limitation #2: No open endpoints

Consider a key IDX(col1) and a range:

"col1 > 10"

The range starts with col1=11 and does not include any rows with col1=10.

If we add the rowid, we will get

(col1, rowid) > (10, rowid_ptr)

which does include rows with col1=10, which will make it worse, not better.

ROR-intersect algorithm

For each select, get the first row, get its rowid. Find the biggest rowid $R_START.
Re-start the quick select scans, adding rowid $R_START to the left bound.

ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means

"Range must use all keyparts" limitation is met.
"No open endpoints" is also met.

Other issues

under what condition can SQL layer perform index reads with index lookup tuples that have rowids attached to their ends?
When the storage engine supports "Extended Keys" and there is an explicitly-defined PK. What about tables with no explicitly-defined PK?
(Q: what about partitioned tables? Is this allowed for them?)

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

index-merge1.jpg
200 kB
2019-10-13 15:21

Activity

Ascending order - Click to sort in descending order

Sergei Petrunia created issue - 2019-10-13 14:49

Sergei Petrunia made changes - 2019-10-13 15:05

Field	Original Value	New Value
Description	This is a followup to the discussion here: https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans. h3. sort-intersect algorithm The idea is: * Start computing the intersection with the quick select that is the most selective. * Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead) * When starting a scan on a non-first range access, attach the {{\[min_rowid, max_rowid\]}} to each range. h4. Limitation #1: range must use all keyparts if an index is defined as {{IDX(col1, col2, col3)}} Then it actually is {code} IDX(col1, col2, col3, rowid) {code} so ranges like {code} (c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3) {code} can be changed into {code} (c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID) {code} This is only possible when the range endpoints include col3. If this is not the case, we can't do that. h4. Limitation #2: No open endpoints Consider a key IDX(col1) and a range: {code} "col1 > 10" {code} The range starts with col1=11 and does not include any rows with col1=10. If we add the rowid, we will get {code} (col1, rowid) > (10, rowid_ptr) {code} which does include rows with col1=10, which will make it worse, not better. h3. ROR-intersect algorithm * For each select, get the first row, get its rowid. Find the biggest rowid $R_START. * Re-start the quick select scans, adding rowid $R_START to the left bound. ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means * "Range must use all keyparts" limitation is met. *- "No open endpoints" is also met.	This is a followup to the discussion here: https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans. h3. sort-intersect algorithm The idea is: * Start computing the intersection with the quick select that is the most selective. * Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead) * When starting a scan on a non-first range access, attach the {{\[min_rowid, max_rowid\]}} to each range. h4. Limitation #1: range must use all keyparts if an index is defined as {{IDX(col1, col2, col3)}} Then it actually is {code} IDX(col1, col2, col3, rowid) {code} so ranges like {code} (c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3) {code} can be changed into {code} (c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID) {code} This is only possible when the range endpoints include col3. If this is not the case, we can't do that. h4. Limitation #2: No open endpoints Consider a key IDX(col1) and a range: {code} "col1 > 10" {code} The range starts with col1=11 and does not include any rows with col1=10. If we add the rowid, we will get {code} (col1, rowid) > (10, rowid_ptr) {code} which does include rows with col1=10, which will make it worse, not better. h3. ROR-intersect algorithm * For each select, get the first row, get its rowid. Find the biggest rowid $R_START. * Re-start the quick select scans, adding rowid $R_START to the left bound. ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means * "Range must use all keyparts" limitation is met. * "No open endpoints" is also met. h3. Other issues under what condition can SQL layer perform index reads with index lookup tuples that have rowids attached to their ends? When the storage engine supports "Extended Keys" and there is an explicitly-defined PK. What about tables with no explicitly-defined PK? (Q: what about partitioned tables? Is this allowed for them?)

Sergei Petrunia made changes - 2019-10-13 15:21

Attachment

index-merge1.jpg [ 49229 ]

Sergei Petrunia made changes - 2019-10-13 15:21

Labels

index_merge

Sergei Petrunia made changes - 2019-10-13 19:28

Description

This is a followup to the discussion here:
https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html

The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans.

h3. sort-intersect algorithm

The idea is:
* Start computing the intersection with the quick select that is the most selective.
* Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead)

* When starting a scan on a non-first range access, attach the {{\[min_rowid, max_rowid\]}}
to each range.

h4. Limitation #1: range must use all keyparts

if an index is defined as {{IDX(col1, col2, col3)}}

Then it actually is

{code}
IDX(col1, col2, col3, rowid)
{code}
so ranges like
{code}
(c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3)
{code}
can be changed into
{code}
(c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID)
{code}
This is only possible when the range endpoints include col3. If this is not the case, we can't do that.

h4. Limitation #2: No open endpoints

Consider a key IDX(col1) and a range:
{code}
"col1 > 10"
{code}
The range starts with col1=11 and does not include any rows with col1=10.

If we add the rowid, we will get
{code}
(col1, rowid) > (10, rowid_ptr)
{code}

which *does* include rows with col1=10, which will make it worse, not better.

h3. ROR-intersect algorithm

* For each select, get the first row, get its rowid. Find the biggest rowid $R_START.
* Re-start the quick select scans, adding rowid $R_START to the left bound.

ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means
* "Range must use all keyparts" limitation is met.
* "No open endpoints" is also met.

h3. Other issues
under what condition can SQL layer perform index reads with index lookup tuples that have rowids attached to their ends?
When the storage engine supports "Extended Keys" and there is an explicitly-defined PK. What about tables with no explicitly-defined PK?
(Q: what about partitioned tables? Is this allowed for them?)

This is a followup to the discussion here:
https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html

The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans.

h3. sort-intersect algorithm

The idea is:
* Start computing the intersection with the quick select that is the most selective.
* Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead)

* When starting a scan on a non-first range access, attach the {{\[min_rowid, max_rowid\]}}
to each range.

h4. Limitation #1: the range must use all keyparts

if an index is defined as {{IDX(col1, col2, col3)}}

Then it actually is

{code}
IDX(col1, col2, col3, rowid)
{code}
so ranges like
{code}
(c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3)
{code}
can be changed into
{code}
(c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID)
{code}
This is only possible when the range endpoints include col3. If this is not the case, we can't do that.

h4. Limitation #2: No open endpoints

Consider a key IDX(col1) and a range:
{code}
"col1 > 10"
{code}
The range starts with col1=11 and does not include any rows with col1=10.

If we add the rowid, we will get
{code}
(col1, rowid) > (10, rowid_ptr)
{code}

which *does* include rows with col1=10, which will make it worse, not better.

h3. ROR-intersect algorithm

* For each select, get the first row, get its rowid. Find the biggest rowid $R_START.
* Re-start the quick select scans, adding rowid $R_START to the left bound.

ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means
* "Range must use all keyparts" limitation is met.
* "No open endpoints" is also met.

h3. Other issues
under what condition can SQL layer perform index reads with index lookup tuples that have rowids attached to their ends?
When the storage engine supports "Extended Keys" and there is an explicitly-defined PK. What about tables with no explicitly-defined PK?
(Q: what about partitioned tables? Is this allowed for them?)

Julien Fritsch made changes - 2020-01-23 13:58

Fixing Priority

250

Sergei Golubchik made changes - 2020-02-18 11:45

Fix Version/s

10.5 [ 23123 ]

Sergei Golubchik made changes - 2021-12-06 21:21

Workflow

MariaDB v3 [ 100382 ]

MariaDB v4 [ 131172 ]

People

Assignee:: Unassigned

Reporter:: Sergei Petrunia

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2019-10-13 14:49

Updated:: 2020-02-18 11:45

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

sort-intersect algorithm

Limitation #1: the range must use all keyparts

Limitation #2: No open endpoints

ROR-intersect algorithm

Other issues

Attachments

Attachments

Activity

People

Dates

Git Integration