Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6017

Add support for Indexes on Expressions

    XMLWordPrintable

Details

    Description

      An index on expression means something like

      CREATE TABLE t1 (a int, b int, INDEX (a/2+b));
      ...
      SELECT * FROM t1 WHERE a/2+b=100;
      

      in this case the optimizer should be able to use an index.

      This task naturally splits in two steps:

      1. add expression matching into the optimizer, use it for generated columns. Like in
        CREATE TABLE t1 (a int, b int, c INT GENERATED ALWAYS AS (a/2+b), INDEX (c));
      2. support the syntax to create an index on expression directly, this will automatically create a hidden generated column under the hood

      original task description is visible in the history

      Details about the patch in the Pull Request

      contents

      SQL Syntax changes
        Optimizer support
          ORDER BY is not handled
          Optimizer support - details
      Problems in the patch
        Minor issues
      

      Details about the patch in the https://github.com/MariaDB/server/pull/1381 :

      SQL Syntax changes

      It adds syntax to support creating indexes directly on expressions: One can do this:

      CREATE TABLE t ( 
        ...
        INDEX(col1+col2)
      );
      

      or this:

      alter table t1 add index ((a*2-b));
      

      In current MariaDB, one has to define the column first:

      CREATE TABLE t (
        col1 int, col2 int, 
        ...
        col3 int as (col1+col2) virtual, 
        INDEX(col3)
      );
      

      (Interesting, the datatype for the indexed column is auto-inferred?)
      (Note that the patch is incomplete - e.g. SHOW CREATE TABLE doesn't work correctly)

      Optimizer support

      The basic approach is as follows:

      • Create a clone of a WHERE/ON clause.
      • In the copy, search for expressions identical to virtual column definition and replace them with Item_field referring to the virtual column.
      • If we've made any such substitution(s), replace the origin WHERE/ON expression with orig_expression AND clone_with_modifications.

      This rewrite is performed for all WHERE/ON clauses and for all indexed virtual columns
      that are in the scope. No speedups seem to be employed, we just use nested loops:

      for each indexed virtual column  $VCOL in any table {
         for each WHERE/ON clause $COND {
            // note that we didn't even check used_tables() to see if $COND can have references to $VCOL
        }
      }
      

      The result will typically contain clauses with duplication.

      After this, the optimizer can process any sargable uses of virtual_column_name in a regular way.

      ORDER BY is not handled

      Note that this doesn't handle ORDER BY LIMIT.
      The following will not be able to skip sorting:

      ALTER TABLE t1 ADD INDEX(FUNC(t1.col));
      SELECT * FROM t1 ORDER BY FUNC(t1.col) LIMIT 1;
      

      Optimizer support - details

      • there is something fancy done about cloning items. Something like "We can't clone Item_subselect so let the clone have the same Item_subselect* object".
      • TODO: would this work with VIEWs?

      Problems in the patch

      Duplication of WHERE/ON clauses looks like an overkill.

      Modifying the clone() call to clone some items but not others seems like a recipe for bugs.

      (The following "law" was suggested by Igor: an Item tree may not have multiple references to the same Item* object. There are some exceptions to this: one can refer to the same item through Item_direct_view_ref objects, one can refer to Items representing constant literals, etc.
      These exceptions do not cover directly referring to the same Item_subselect object )

      Minor issues

      The tree with the patch doesn't pass test suite, it crashes in the bootstrap.
      SHOW CREATE TABLE doesn't work correctly.

      Alternative suggestion 1 - make optimizer recognize function calls

      How else could this be implemented?

      Determine if the index can be used at all. Traverse the WHERE/ON conditions and find occurrences of Virtual Column Expression. If yes, add the index into table->keys_usable_for_query.

      For ref access, hook into

      • Item_bool_func2::add_key_fields_optimize_op() // for equalities
      • Item_func_null_predicate::add_key_fields() // for IS NULL
      • Item_func_in::add_key_fields() // for degenerate "x IN (CONST)"
      • and serveral other items.
        Look for is_local_field() call everywhere.

      for range access, hook into

      • Item_bool_func::get_full_func_mm_tree_for_args()
      • Item_func_in::get_mm_tree

      for skipping sorting, hook somewhere in test_if_skip_sort_order().

      *Serg's objection*: what if the optimizer rewrites the expression into something that's not identical to the vcol definition? Why don't we do a "local" rewrite of "(expr ... vcol_expr)" with "(expr ... vcol_expr) AND (expr ... vcol_column)" instead? We can mark the injected expression as redundant so it is removed at final plan fix-up stage.

      Alternative suggestion 2 - "local" rewrite

      .

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              fale Fabio Alessandro Locati
              Votes:
              21 Vote for this issue
              Watchers:
              29 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.