Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
Description
One approach that one can use is to test the query optimizer is:
Add a capability for optimizer to generate "random" query plans (to be defined below).
Given a query, generate a few random query plans and see how they compare to the non-random query plan the optimizer generates.
Test dataset and query
It is clear that random query plans will beat the non-random one when the query+dataset contain some properties that the optimizer doesn't take into account:
- data correlation
- violation of "inclusion assumption"
- etc.
This means test database and query should avoid having this.
What is a random query plan
"Random" here means a query plan that is
- correct for this query
- Has one or more differences from what the optimizer would pick by default: some table uses a different access method, a join uses a different join order and/or different subquery strategy, etc.
- Other than the above, the query plan is "reasonable": plan refinement stages are done with an aim to produce the best query plan. Parts of the query where we don't have differences are optimized by default.
Test setup
- We should measure execution speed (or may be #cpu-instructions or #rows-read).
- We should limit total query execution time
- To weed out the noise: do N query runs, pick the best for comparison. When comparing random to non-random, speed difference of 2x is required to declare the queries unequal.
How to pick a random access method from the available.
Suppose the optimizer considers N access methods and we want to pick the random one.
Currently the process is
best= null;
|
while (X= get_next_access_method()) {
|
if (!best || x.cost < best.cost)
|
best= X;
|
}
|
For picking a random method, giving equal chance to each we can use this:
best= null;
|
n=0;
|
while (X = get_next_access_method()) {
|
n++;
|
if ( toss_n_way_dice(n) == 0)
|
best= X;
|
}
|
 |
int toss_n_way_dice(n) {
|
if (n==1) return 0; // 1-sided "dice" has one side
|
// return a number from [0...n-1] range, each number having the same probability 1/n
|
}
|