Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20819

index_merge/[sort_]intersect should not scan non-overlapping rowid ranges

Details

    Description

      This is a followup to the discussion here:
      https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html

      The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans.

      sort-intersect algorithm

      The idea is:

      • Start computing the intersection with the quick select that is the most selective.
      • Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead)
      • When starting a scan on a non-first range access, attach the [min_rowid, max_rowid]
        to each range.

      Limitation #1: the range must use all keyparts

      if an index is defined as IDX(col1, col2, col3)

      Then it actually is

        IDX(col1, col2, col3, rowid)
      

      so ranges like

      (c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3)
      

      can be changed into

      (c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID)
      

      This is only possible when the range endpoints include col3. If this is not the case, we can't do that.

      Limitation #2: No open endpoints

      Consider a key IDX(col1) and a range:

      "col1 > 10"
      

      The range starts with col1=11 and does not include any rows with col1=10.

      If we add the rowid, we will get

      (col1, rowid) > (10, rowid_ptr)
      

      which does include rows with col1=10, which will make it worse, not better.

      ROR-intersect algorithm

      • For each select, get the first row, get its rowid. Find the biggest rowid $R_START.
      • Re-start the quick select scans, adding rowid $R_START to the left bound.

      ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means

      • "Range must use all keyparts" limitation is met.
      • "No open endpoints" is also met.

      Other issues

      under what condition can SQL layer perform index reads with index lookup tuples that have rowids attached to their ends?
      When the storage engine supports "Extended Keys" and there is an explicitly-defined PK. What about tables with no explicitly-defined PK?
      (Q: what about partitioned tables? Is this allowed for them?)

      Attachments

        Activity

          psergei Sergei Petrunia created issue -
          psergei Sergei Petrunia made changes -
          Field Original Value New Value
          Description This is a followup to the discussion here:
          https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html

          The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans.

          h3. sort-intersect algorithm

          The idea is:
          * Start computing the intersection with the quick select that is the most selective.
          * Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead)

          * When starting a scan on a non-first range access, attach the {{\[min_rowid, max_rowid\]}}
            to each range.

          h4. Limitation #1: range must use all keyparts

          if an index is defined as {{IDX(col1, col2, col3)}}

          Then it actually is

          {code}
            IDX(col1, col2, col3, rowid)
          {code}
          so ranges like
          {code}
          (c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3)
          {code}
          can be changed into
          {code}
          (c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID)
          {code}
          This is only possible when the range endpoints include col3. If this is not the case, we can't do that.

          h4. Limitation #2: No open endpoints

          Consider a key IDX(col1) and a range:
          {code}
          "col1 > 10"
          {code}
          The range starts with col1=11 and does not include any rows with col1=10.

          If we add the rowid, we will get
          {code}
          (col1, rowid) > (10, rowid_ptr)
          {code}

          which *does* include rows with col1=10, which will make it worse, not better.

          h3. ROR-intersect algorithm

          * For each select, get the first row, get its rowid. Find the biggest rowid $R_START.
          * Re-start the quick select scans, adding rowid $R_START to the left bound.

          ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means
          * "Range must use all keyparts" limitation is met.
          *- "No open endpoints" is also met.
          This is a followup to the discussion here:
          https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html

          The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans.

          h3. sort-intersect algorithm

          The idea is:
          * Start computing the intersection with the quick select that is the most selective.
          * Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead)

          * When starting a scan on a non-first range access, attach the {{\[min_rowid, max_rowid\]}}
            to each range.

          h4. Limitation #1: range must use all keyparts

          if an index is defined as {{IDX(col1, col2, col3)}}

          Then it actually is

          {code}
            IDX(col1, col2, col3, rowid)
          {code}
          so ranges like
          {code}
          (c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3)
          {code}
          can be changed into
          {code}
          (c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID)
          {code}
          This is only possible when the range endpoints include col3. If this is not the case, we can't do that.

          h4. Limitation #2: No open endpoints

          Consider a key IDX(col1) and a range:
          {code}
          "col1 > 10"
          {code}
          The range starts with col1=11 and does not include any rows with col1=10.

          If we add the rowid, we will get
          {code}
          (col1, rowid) > (10, rowid_ptr)
          {code}

          which *does* include rows with col1=10, which will make it worse, not better.

          h3. ROR-intersect algorithm

          * For each select, get the first row, get its rowid. Find the biggest rowid $R_START.
          * Re-start the quick select scans, adding rowid $R_START to the left bound.

          ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means
          * "Range must use all keyparts" limitation is met.
          * "No open endpoints" is also met.

          h3. Other issues
          under what condition can SQL layer perform index reads with index lookup tuples that have rowids attached to their ends?
          When the storage engine supports "Extended Keys" and there is an explicitly-defined PK. What about tables with no explicitly-defined PK?
          (Q: what about partitioned tables? Is this allowed for them?)
          psergei Sergei Petrunia made changes -
          Attachment index-merge1.jpg [ 49229 ]
          psergei Sergei Petrunia made changes -
          Labels index_merge
          psergei Sergei Petrunia made changes -
          Description This is a followup to the discussion here:
          https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html

          The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans.

          h3. sort-intersect algorithm

          The idea is:
          * Start computing the intersection with the quick select that is the most selective.
          * Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead)

          * When starting a scan on a non-first range access, attach the {{\[min_rowid, max_rowid\]}}
            to each range.

          h4. Limitation #1: range must use all keyparts

          if an index is defined as {{IDX(col1, col2, col3)}}

          Then it actually is

          {code}
            IDX(col1, col2, col3, rowid)
          {code}
          so ranges like
          {code}
          (c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3)
          {code}
          can be changed into
          {code}
          (c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID)
          {code}
          This is only possible when the range endpoints include col3. If this is not the case, we can't do that.

          h4. Limitation #2: No open endpoints

          Consider a key IDX(col1) and a range:
          {code}
          "col1 > 10"
          {code}
          The range starts with col1=11 and does not include any rows with col1=10.

          If we add the rowid, we will get
          {code}
          (col1, rowid) > (10, rowid_ptr)
          {code}

          which *does* include rows with col1=10, which will make it worse, not better.

          h3. ROR-intersect algorithm

          * For each select, get the first row, get its rowid. Find the biggest rowid $R_START.
          * Re-start the quick select scans, adding rowid $R_START to the left bound.

          ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means
          * "Range must use all keyparts" limitation is met.
          * "No open endpoints" is also met.

          h3. Other issues
          under what condition can SQL layer perform index reads with index lookup tuples that have rowids attached to their ends?
          When the storage engine supports "Extended Keys" and there is an explicitly-defined PK. What about tables with no explicitly-defined PK?
          (Q: what about partitioned tables? Is this allowed for them?)
          This is a followup to the discussion here:
          https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11521.html

          The idea is that index_merge intersection and sort-intersection can avoid scanning the ranges of rowids that do not overlap with ranges of rowids produced by other scans.

          h3. sort-intersect algorithm

          The idea is:
          * Start computing the intersection with the quick select that is the most selective.
          * Modify the Unique object to keep min/max rowid it has seen (This can be done with very little extra overhead)

          * When starting a scan on a non-first range access, attach the {{\[min_rowid, max_rowid\]}}
            to each range.

          h4. Limitation #1: the range must use all keyparts

          if an index is defined as {{IDX(col1, col2, col3)}}

          Then it actually is

          {code}
            IDX(col1, col2, col3, rowid)
          {code}
          so ranges like
          {code}
          (c1min, c2min, c3min) <= (col1,col2,col3) <= (const1,const2,const3)
          {code}
          can be changed into
          {code}
          (c1min, c2min, c3min, MIN_ROWID) <= (col1,col2,col3, rowid) <= (const1,const2,const3, MAX_ROWID)
          {code}
          This is only possible when the range endpoints include col3. If this is not the case, we can't do that.

          h4. Limitation #2: No open endpoints

          Consider a key IDX(col1) and a range:
          {code}
          "col1 > 10"
          {code}
          The range starts with col1=11 and does not include any rows with col1=10.

          If we add the rowid, we will get
          {code}
          (col1, rowid) > (10, rowid_ptr)
          {code}

          which *does* include rows with col1=10, which will make it worse, not better.

          h3. ROR-intersect algorithm

          * For each select, get the first row, get its rowid. Find the biggest rowid $R_START.
          * Re-start the quick select scans, adding rowid $R_START to the left bound.

          ROR ranges always have the form "t.key=const" (where equalities on all key parts are present), which means
          * "Range must use all keyparts" limitation is met.
          * "No open endpoints" is also met.

          h3. Other issues
          under what condition can SQL layer perform index reads with index lookup tuples that have rowids attached to their ends?
          When the storage engine supports "Extended Keys" and there is an explicitly-defined PK. What about tables with no explicitly-defined PK?
          (Q: what about partitioned tables? Is this allowed for them?)
          julien.fritsch Julien Fritsch made changes -
          Fixing Priority 250
          serg Sergei Golubchik made changes -
          Fix Version/s 10.5 [ 23123 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 100382 ] MariaDB v4 [ 131172 ]

          People

            Unassigned Unassigned
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.