# Take into account the selectivity of single-table range predicates on non-indexed columns when searching for the best execution plan

XMLWordPrintable

#### Details

• Status: Closed
• Major
• Resolution: Fixed
• None
• None

#### Description

Description of the problem
==========================

Currently the optimizer completely ignores any conditions on non-indexed columns when searching for the best execution plan. As a result in many cases it chooses a suboptimal plan.

Let's consider the query:
SELECT * FROM t1,t2 WHERE t1.a=t2.a and t2.b BETWEEN 1 AND 3
assuming that:
-table t1 contains 100 records
-table t2 contains 1000 records
-there is a primary index on t1(a)
-there is a secondary index on t2(a)
-there is no index defined on column t2.b
-the selectivity of the condition t2.b BETWEEN (1,3) is high (~ 1%)

Under these conditions currently the optimizer will choose the plan that

• accesses t1 using a table scan
• accesses t2 using index t2(a)
• checks the condition t2.b BETWEEN 1 AND 3

This plan examines all rows of both tables and it performs 100 index look-ups.

The alternative plan that:

• accesses table t2 in a table scan
• checks the condition t2.b BETWEEN 1 AND 3
• accesses t1 using index t1(a)
also would examine all rows of t2, but it performs only ~10 look-ups
to access ~10 rows of table t1.

The second plan is expected to be more efficient and it is so.

The plan to resolve the problem
===============================

1. Build single-table range conditions over different columns of all tables of the join query.

2. Calculate the selectivity of each such condition using persistent statistics on table columns.

3. Take this statistics into account when calculated the cardinality of each partial join that are evaluated by the optimizer.

#### People

Igor Babaev
Igor Babaev