Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36132

Optimizer support for functional indexes: handle GROUP/ORDER BY

    XMLWordPrintable

Details

    • Q2/2025 Development, Q3/2025 Maintenance

    Description

      See MDEV-35615, section Support ORDER BY/GROUP BY:

      We need to rewrite ORDER BY vcol_expr into ORDER BY vcol_field so that the optimizer can use an index to skip sorting.
      (MySQL does this, this should be easy to do)

      Contents

      1. Rewrite makes covering indexes non-covering
      1.1 Solving the Rewrite-makes-indexes-non-covering problem
      1.1.1 Execution code doesnt re-compute stored VCOLs
      1.1.2 Working around execution code shortcomings.
      2. ShouldRewriteModifyAllFields problem

      1. Rewrite makes covering indexes non-covering

      Consider a testcase:

      create table t (c int, d varchar(100), index IDX_C(c));
      insert into t select seq,seq from seq_1_to_10000;
      alter table t
        add column vc int as (c + 1),
        add index IDX_VC(vc);
      

      A query using vcol_expr will use index-only scan on IDX_C :

      explain select c+1 from t order by c+1 limit 2;
      +------+-------------+-------+-------+---------------+-------+---------+------+-------+-----------------------------+
      | id   | select_type | table | type  | possible_keys | key   | key_len | ref  | rows  | Extra                       |
      +------+-------------+-------+-------+---------------+-------+---------+------+-------+-----------------------------+
      |    1 | SIMPLE      | t     | index | NULL          | IDX_C | 5       | NULL | 10330 | Using index; Using filesort |
      +------+-------------+-------+-------+---------------+-------+---------+------+-------+-----------------------------+
      

      (and will not consider IDX_VC).
      If we do rewrite in ORDER BY, it will use IDX_VC, but in a non-index-only way:

      MariaDB [j4]> explain select c+1 from t order by vc limit 2;
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------+
      | id   | select_type | table | type  | possible_keys | key    | key_len | ref  | rows | Extra |
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------+
      |    1 | SIMPLE      | t     | index | NULL          | IDX_VC | 5       | NULL | 2    |       |
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------+
      

      neither IDX_C, nor IDX_VC are covering anymore.

      That is, rewriting ORDER BY vcol_expr into ORDER BY vcol_field makes covering indexes non-covering.

      (TODO: what about the rewrite done in MDEV-35616? Won't it have the same problem? )

      1.1 Solving the Rewrite-makes-indexes-non-covering problem

      It seems, the problem is that the optimizer is not aware that it has an option to compute vcol_field from vcol_expr.

      What if we made it aware about that? We could take the code that computes the set of covering indexes and make sure
      that vcol_field returns these indexes as covering:

      • indexes that include vcol_field – this is already done now
      • (vcol_expr)->covering_keys – this is new

      Then, changing vcol_expr to vcol_field would not reduce the set of query's covering_keys.

      1.1.1 Execution code doesnt re-compute stored VCOLs

      Can we do it? Not yet: the execution side is missing some code:
      https://mariadb.slack.com/archives/C021E77G7K2/p1746525196637819

      It seems, for stored virtual columns it is missing the logic to "if the employed access method didn't read the value for column VCOL, compute the value of VCOL from VCOL_EXPR" .

      1.1.2 Working around execution code shortcomings (NOT NEEDED?)

      What if we only handled non-stored virtual columns? That is:
      1. Solve the Rewrite-makes-indexes-non-covering problem for non-stored vcols (vcol_field->covering_keys would include vcol->expr->covering_keys).
      2. Do the ORDER BY rewrite only for non-stored virtual columns.

      2. ShouldRewriteModifyAllFields problem

      ORDER BY clause is processed as follows:

      The item is added in JOIN::all_fields.
      The item is added into JOIN::ref_pointer_array.
      The ORDER BY expression in the ORDER structure points to an element in the JOIN::ref_pointer_array.

      We do rewrite by changing the element in JOIN::ref_pointer_array. However, the element in JOIN::all_fields remains unchanged. Should it be changed?

      I have a patch that changes it. It can cause a problem.

      create table t1 (a int, b int);
      insert into t1 select seq, seq from seq_1_to_1000;
      alter table t1 add vcol int as (a+1), add index(vcol);
       
      explain 
      select (a+1),b from t1 order by 1 limit 2;
      select (a+1),b from t1 order by 1 limit 2;
      

      Current ORDER BY code (b91a6da292550b26d8e1901eaf72086465476459):

      MariaDB [test]> select (a+1),b from t1 order by 1 limit 2;
      +-------+------+
      | (a+1) | b    |
      +-------+------+
      |     2 |    1 |
      |     3 |    2 |
      +-------+------+
      

      code that changes all_fields : https://gist.github.com/spetrunia/c6eb87344ef992f99455ff73e8c8e327. causes this: column name becomes vcol:

      MariaDB [j1]> select (a+1),b from t1 order by 1 limit 2;
      +------+------+
      | vcol | b    |
      +------+------+
      |    2 |    1 |
      |    3 |    2 |
      +------+------+
      

      I'm not sure if that's the only issue.

      Attachments

        Issue Links

          Activity

            People

              ycp Yuchen Pei
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.