Details
-
New Feature
-
Status: In Review (View Workflow)
-
Critical
-
Resolution: Unresolved
Description
An index on expression means something like
CREATE TABLE t1 (a int, b int, INDEX (a/2+b)); |
...
|
SELECT * FROM t1 WHERE a/2+b=100; |
in this case the optimizer should be able to use an index.
This task naturally splits in two steps:
- add expression matching into the optimizer, use it for generated columns. Like in
CREATE TABLE t1 (a int, b int, c INT GENERATED ALWAYS AS (a/2+b), INDEX (c)); - support the syntax to create an index on expression directly, this will automatically create a hidden generated column under the hood
original task description is visible in the history
Details about the patch in the Pull Request
contents
SQL Syntax changes
|
Optimizer support
|
ORDER BY is not handled
|
Optimizer support - details
|
Problems in the patch
|
Minor issues
|
Details about the patch in the https://github.com/MariaDB/server/pull/1381 :
SQL Syntax changes
It adds syntax to support creating indexes directly on expressions: One can do this:
CREATE TABLE t ( |
...
|
INDEX(col1+col2) |
);
|
or this:
alter table t1 add index ((a*2-b)); |
In current MariaDB, one has to define the column first:
CREATE TABLE t ( |
col1 int, col2 int, |
...
|
col3 int as (col1+col2) virtual, |
INDEX(col3) |
);
|
(Interesting, the datatype for the indexed column is auto-inferred?)
(Note that the patch is incomplete - e.g. SHOW CREATE TABLE doesn't work correctly)
Optimizer support
The basic approach is as follows:
- Create a clone of a WHERE/ON clause.
- In the copy, search for expressions identical to virtual column definition and replace them with Item_field referring to the virtual column.
- If we've made any such substitution(s), replace the origin WHERE/ON expression with orig_expression AND clone_with_modifications.
This rewrite is performed for all WHERE/ON clauses and for all indexed virtual columns
that are in the scope. No speedups seem to be employed, we just use nested loops:
for each indexed virtual column $VCOL in any table {
|
for each WHERE/ON clause $COND {
|
// note that we didn't even check used_tables() to see if $COND can have references to $VCOL
|
}
|
}
|
The result will typically contain clauses with duplication.
After this, the optimizer can process any sargable uses of virtual_column_name in a regular way.
ORDER BY is not handled
Note that this doesn't handle ORDER BY LIMIT.
The following will not be able to skip sorting:
ALTER TABLE t1 ADD INDEX(FUNC(t1.col)); |
SELECT * FROM t1 ORDER BY FUNC(t1.col) LIMIT 1; |
Optimizer support - details
- there is something fancy done about cloning items. Something like "We can't clone Item_subselect so let the clone have the same Item_subselect* object".
- TODO: would this work with VIEWs?
Problems in the patch
Duplication of WHERE/ON clauses looks like an overkill.
Modifying the clone() call to clone some items but not others seems like a recipe for bugs.
(The following "law" was suggested by Igor: an Item tree may not have multiple references to the same Item* object. There are some exceptions to this: one can refer to the same item through Item_direct_view_ref objects, one can refer to Items representing constant literals, etc.
These exceptions do not cover directly referring to the same Item_subselect object )
Minor issues
The tree with the patch doesn't pass test suite, it crashes in the bootstrap.
SHOW CREATE TABLE doesn't work correctly.
Alternative suggestion 1 - make optimizer recognize function calls
How else could this be implemented?
Determine if the index can be used at all. Traverse the WHERE/ON conditions and find occurrences of Virtual Column Expression. If yes, add the index into table->keys_usable_for_query.
For ref access, hook into
- Item_bool_func2::add_key_fields_optimize_op() // for equalities
- Item_func_null_predicate::add_key_fields() // for IS NULL
- Item_func_in::add_key_fields() // for degenerate "x IN (CONST)"
- and serveral other items.
Look for is_local_field() call everywhere.
for range access, hook into
- Item_bool_func::get_full_func_mm_tree_for_args()
- Item_func_in::get_mm_tree
for skipping sorting, hook somewhere in test_if_skip_sort_order().
*Serg's objection*: what if the optimizer rewrites the expression into something that's not identical to the vcol definition? Why don't we do a "local" rewrite of "(expr ... vcol_expr)" with "(expr ... vcol_expr) AND (expr ... vcol_column)" instead? We can mark the injected expression as redundant so it is removed at final plan fix-up stage.
Alternative suggestion 2 - "local" rewrite
.
Attachments
Issue Links
- blocks
-
MDEV-4414 INDEX - SUBSTRING, LEFT and others string functions that could be optimized with index
- Open
-
MDEV-4430 INDEX Optimization - Virtual Column, Dynamic Column and Index optimization
- Closed
-
MXS-3624 Use indexed virtual columns if such exist
- Closed
-
MXS-5346 Take indexes on expressions into use
- Open
- duplicates
-
MDEV-17993 WHERE F(X) = constY should used F(X) generated index and query WHERE G=constY
- Closed
- includes
-
MDEV-5631 Implement Reverse Key Index
- Closed
- is blocked by
-
MDEV-5800 indexes on virtual (not materialized) columns
- Closed