[MDEV-4145] Take into account the selectivity of single-table range predicates on non-indexed columns when searching for the best execution plan Created: 2012-08-14  Updated: 2013-10-29  Resolved: 2013-06-05

Status: Closed
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Igor Babaev Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: mwl#253

Issue Links:
Blocks
blocks MDEV-83 Cost-based choice for the pushdown of... Stalled
Relates
relates to MDEV-4348 MWL#253: Server crashes in Item_equal... Closed
relates to MDEV-4349 MWL#253: Server crashes in calculate_... Closed
relates to MDEV-4350 Explain shows negative selectivity fo... Closed
relates to MDEV-4357 Histogram_type is not saved in the hi... Closed
relates to MDEV-4359 Unable to create a DOUBLE_PREC_HB his... Closed
relates to MDEV-4360 ANALYZE shows "Table is already up to... Closed
relates to MDEV-4362 Selectivity estimates for IN (...) do... Closed
relates to MDEV-4363 Bad selectivity for col IS NULL OR co... Closed
relates to MDEV-4364 Histograms show the same selectivity ... Closed
relates to MDEV-4366 MWL#253: Server crashes in calculate_... Closed
relates to MDEV-4367 MWL#253: Assertion `join->best_read <... Closed
relates to MDEV-4369 column_stats.histogram contents doesn... Closed
relates to MDEV-4370 MWL#253: Server crashes in Histogram:... Closed
relates to MDEV-4371 MWL#253: Assertion `join->best_read <... Closed
relates to MDEV-4372 MWL#253: Server crashes in calculate_... Closed
relates to MDEV-4373 MWL#253: Valgrind warnings 'Use of un... Closed
relates to MDEV-4378 MWL#253: Assertion `join->best_read <... Closed
relates to MDEV-4380 MWL#253: Server hangs in table_cond_s... Closed
relates to MDEV-4389 MWL#253: Assertion `join->best_read <... Closed
relates to MDEV-5200 Server crashes in table_multi_eq_cond... Closed

 Description   

Description of the problem
==========================

Currently the optimizer completely ignores any conditions on non-indexed columns when searching for the best execution plan. As a result in many cases it chooses a suboptimal plan.

Let's consider the query:
SELECT * FROM t1,t2 WHERE t1.a=t2.a and t2.b BETWEEN 1 AND 3
assuming that:
-table t1 contains 100 records
-table t2 contains 1000 records
-there is a primary index on t1(a)
-there is a secondary index on t2(a)
-there is no index defined on column t2.b
-the selectivity of the condition t2.b BETWEEN (1,3) is high (~ 1%)

Under these conditions currently the optimizer will choose the plan that

  • accesses t1 using a table scan
  • accesses t2 using index t2(a)
  • checks the condition t2.b BETWEEN 1 AND 3

This plan examines all rows of both tables and it performs 100 index look-ups.

The alternative plan that:

  • accesses table t2 in a table scan
  • checks the condition t2.b BETWEEN 1 AND 3
  • accesses t1 using index t1(a)
    also would examine all rows of t2, but it performs only ~10 look-ups
    to access ~10 rows of table t1.

The second plan is expected to be more efficient and it is so.

The plan to resolve the problem
===============================

1. Build single-table range conditions over different columns of all tables of the join query.

2. Calculate the selectivity of each such condition using persistent statistics on table columns.

3. Take this statistics into account when calculated the cardinality of each partial join that are evaluated by the optimizer.



 Comments   
Comment by Igor Babaev [ 2013-06-05 ]

The task was pushed into 10.0-base and appeared in 10.0.2

Comment by roberto spadim [ 2013-06-10 ]

please consider doing this: MDEV-4419 too, since ENUM() values is non-indexed but a very good information about values that should be in fields, example ENUM('a','b','c'), will never have a 'x' value, but can have a '' value or NULL value

Generated at Thu Feb 08 06:54:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.