Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36132

Optimizer support for functional indexes: handle GROUP/ORDER BY

Details

    • Q2/2025 Development

    Description

      See MDEV-35615, section Support ORDER BY/GROUP BY:

      We need to rewrite ORDER BY vcol_expr into ORDER BY vcol_field so that the optimizer can use an index to skip sorting.
      (MySQL does this, this should be easy to do)

      Problem: rewrite makes covering indexes non-covering

      Consider a testcase:

      create table t (c int, d varchar(100), index IDX_C(c));
      insert into t select seq,seq from seq_1_to_10000;
      alter table t
        add column vc int as (c + 1),
        add index IDX_VC(vc);
      

      A query using vcol_expr will use index-only scan on IDX_C :

      explain select c+1 from t order by c+1 limit 2;
      +------+-------------+-------+-------+---------------+-------+---------+------+-------+-----------------------------+
      | id   | select_type | table | type  | possible_keys | key   | key_len | ref  | rows  | Extra                       |
      +------+-------------+-------+-------+---------------+-------+---------+------+-------+-----------------------------+
      |    1 | SIMPLE      | t     | index | NULL          | IDX_C | 5       | NULL | 10330 | Using index; Using filesort |
      +------+-------------+-------+-------+---------------+-------+---------+------+-------+-----------------------------+
      

      (and will not consider IDX_VC).
      If we do rewrite in ORDER BY, it will use IDX_VC, but in a non-index-only way:

      MariaDB [j4]> explain select c+1 from t order by vc limit 2;
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------+
      | id   | select_type | table | type  | possible_keys | key    | key_len | ref  | rows | Extra |
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------+
      |    1 | SIMPLE      | t     | index | NULL          | IDX_VC | 5       | NULL | 2    |       |
      +------+-------------+-------+-------+---------------+--------+---------+------+------+-------+
      

      neither IDX_C, nor IDX_VC are covering anymore.

      That is, rewriting ORDER BY vcol_expr into ORDER BY vcol_field makes covering indexes non-covering.

      (TODO: what about the rewrite done in MDEV-35616? Won't it have the same problem? )

      Solving the Rewrite-makes-indexes-non-covering problem

      It seems, the problem is that the optimizer is not aware that it has an option to compute vcol_field from vcol_expr.

      What if we made it aware about that? We could take the code that computes the set of covering indexes and make sure
      that vcol_field returns these indexes as covering:

      • indexes that include vcol_field – this is already done now
      • (vcol_expr)->covering_keys – this is new

      Then, changing vcol_expr to vcol_field would not reduce the set of query's covering_keys.

      Execution code doesnt re-compute stored VCOLs

      Can we do it? Not yet: the execution side is missing some code:
      https://mariadb.slack.com/archives/C021E77G7K2/p1746525196637819

      It seems, for stored virtual columns it is missing the logic to "if the employed access method didn't read the value for column VCOL, compute the value of VCOL from VCOL_EXPR" .

      Working around execution code shortcomings.

      What if we only handled non-stored virtual columns? That is:
      1. Solve the Rewrite-makes-indexes-non-covering problem for non-stored vcols (vcol_field->covering_keys would include vcol->expr->covering_keys).
      2. Do theORDER BY rewrite only for non-stored virtual columns.

      ycp, it seems that doing this would introduce no regressions?

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment -

            I hacked the idea of earlier read_set update into a little commit (bb-12.1-mdev-36132 ddd0d65aa652073a22d7d80cb41a4f357723f095), and it does fix the testcase and there seem to be fewer regression test failures. Will continue tomorrow.

            ycp Yuchen Pei added a comment - I hacked the idea of earlier read_set update into a little commit (bb-12.1-mdev-36132 ddd0d65aa652073a22d7d80cb41a4f357723f095), and it does fix the testcase and there seem to be fewer regression test failures. Will continue tomorrow.
            ycp Yuchen Pei added a comment -

            With a further fix that extends update_vcol_key_covering to all leaf fields, the patch has no regression in existing tests, see bb-12.1-mdev-36132 b29045eedd3b9a4eebd9346a873deb704b462ceb.

            GROUP BY is not trivial, however, see bb-12.1-mdev-36132-group-by.

            I am first cleaning up the ORDER BY patches.

            ycp Yuchen Pei added a comment - With a further fix that extends update_vcol_key_covering to all leaf fields, the patch has no regression in existing tests, see bb-12.1-mdev-36132 b29045eedd3b9a4eebd9346a873deb704b462ceb. GROUP BY is not trivial, however, see bb-12.1-mdev-36132-group-by. I am first cleaning up the ORDER BY patches.
            ycp Yuchen Pei added a comment -

            OK, I've squashed all the current ORDER BY commits into one, see bb-12.1-mdev-36132 1fef970e9c609f34eae974ce771dc11a57a50a71.

            I tried, without success to de-duplicate checks of "nonconventional vcol keyread" by adding a field TABLE::keyread_with_vcol, see bb-12.1-mdev-36132-keyread-with-vcol 0aac5c8118545d895ae07a1ca08abe04eae84ef4. May need to investigate why it fails certain tests.

            ycp Yuchen Pei added a comment - OK, I've squashed all the current ORDER BY commits into one, see bb-12.1-mdev-36132 1fef970e9c609f34eae974ce771dc11a57a50a71. I tried, without success to de-duplicate checks of "nonconventional vcol keyread" by adding a field TABLE::keyread_with_vcol, see bb-12.1-mdev-36132-keyread-with-vcol 0aac5c8118545d895ae07a1ca08abe04eae84ef4. May need to investigate why it fails certain tests.
            ycp Yuchen Pei added a comment - - edited

            Hi psergei, as discussed, ptal thanks:

            04254ffed11 upstream/bb-12.1-mdev-36132 MDEV-36132 Substitute vcol expressions with indexed vcol fields in ORDER BY

            I will address some of the remaining issues meanwhile:

            • [X] if/how mysql handles this
            • [ ] TODO items in the commit
            • [ ] flag a bool when marking read_set, which can be used later in update_virtual_fields to decide whether to compute vcol values
            • [ ] failure in msan builder: gcol.innodb_partition https://buildbot.mariadb.org/#/builders/640/builds/11836
            • [ ] group by
            ycp Yuchen Pei added a comment - - edited Hi psergei , as discussed, ptal thanks: 04254ffed11 upstream/bb-12.1-mdev-36132 MDEV-36132 Substitute vcol expressions with indexed vcol fields in ORDER BY I will address some of the remaining issues meanwhile: [X] if/how mysql handles this [ ] TODO items in the commit [ ] flag a bool when marking read_set, which can be used later in update_virtual_fields to decide whether to compute vcol values [ ] failure in msan builder: gcol.innodb_partition https://buildbot.mariadb.org/#/builders/640/builds/11836 [ ] group by
            ycp Yuchen Pei added a comment -

            Running the same tests in mysql suggests that the ORDER BY substitution is implemented, but without index covering updates.

            Compared against our base 12.1 commit (see attached base-mysql.diff) we see that with "order by vcol_expr" with a "limit 2" the plan improved from full table scan to using index:

            explain select vc from t order by c + 1 limit 2;
            -id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            -1	SIMPLE	t	ALL	NULL	NULL	NULL	NULL	10000	Using filesort
            +id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
            +1	SIMPLE	t	NULL	index	NULL	vc	5	NULL	2	100.00	NULL
            +Warnings:
            +Note	1003	/* select#1 */ select `test`.`t`.`vc` AS `vc` from `test`.`t` order by (`test`.`t`.`c` + 1) limit 2
             explain select c + 1 from t order by c + 1 limit 2;
            -id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            -1	SIMPLE	t	index	NULL	c	5	NULL	10000	Using index; Using filesort
            +id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
            +1	SIMPLE	t	NULL	index	NULL	vc	5	NULL	2	100.00	NULL
            +Warnings:
            +Note	1003	/* select#1 */ select (`test`.`t`.`c` + 1) AS `c + 1` from `test`.`t` order by (`test`.`t`.`c` + 1) limit 2
             explain select c + 1 from t order by vc limit 2;

            However, without "limit 2" the same regression I reported before implementing index covering updates is present too:

             explain select c + 1 from t order by c + 1;
            -id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            -1	SIMPLE	t	index	NULL	c	5	NULL	10000	Using index; Using filesort
            +id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
            +1	SIMPLE	t	NULL	ALL	NULL	NULL	NULL	NULL	9980	100.00	Using filesort
            +Warnings:
            +Note	1003	/* select#1 */ select (`test`.`t`.`c` + 1) AS `c + 1` from `test`.`t` order by (`test`.`t`.`c` + 1)
            ...
             explain select vc from t order by c + 1;
            -id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            -1	SIMPLE	t	ALL	NULL	NULL	NULL	NULL	10000	Using filesort
            +id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
            +1	SIMPLE	t	NULL	ALL	NULL	NULL	NULL	NULL	9980	100.00	Using filesort
            +Warnings:
            +Note	1003	/* select#1 */ select `test`.`t`.`vc` AS `vc` from `test`.`t` order by (`test`.`t`.`c` + 1)

            When compared against the current patch under review for this task, none of the tests show a better plan, and many show a worse plan, see attached mdev-36132-mysql.diff.

            ycp Yuchen Pei added a comment - Running the same tests in mysql suggests that the ORDER BY substitution is implemented, but without index covering updates. Compared against our base 12.1 commit (see attached base-mysql.diff) we see that with "order by vcol_expr" with a "limit 2" the plan improved from full table scan to using index: explain select vc from t order by c + 1 limit 2; -id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t ALL NULL NULL NULL NULL 10000 Using filesort +id select_type table partitions type possible_keys key key_len ref rows filtered Extra +1 SIMPLE t NULL index NULL vc 5 NULL 2 100.00 NULL +Warnings: +Note 1003 /* select#1 */ select `test`.`t`.`vc` AS `vc` from `test`.`t` order by (`test`.`t`.`c` + 1) limit 2 explain select c + 1 from t order by c + 1 limit 2; -id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t index NULL c 5 NULL 10000 Using index; Using filesort +id select_type table partitions type possible_keys key key_len ref rows filtered Extra +1 SIMPLE t NULL index NULL vc 5 NULL 2 100.00 NULL +Warnings: +Note 1003 /* select#1 */ select (`test`.`t`.`c` + 1) AS `c + 1` from `test`.`t` order by (`test`.`t`.`c` + 1) limit 2 explain select c + 1 from t order by vc limit 2; However, without "limit 2" the same regression I reported before implementing index covering updates is present too: explain select c + 1 from t order by c + 1; -id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t index NULL c 5 NULL 10000 Using index; Using filesort +id select_type table partitions type possible_keys key key_len ref rows filtered Extra +1 SIMPLE t NULL ALL NULL NULL NULL NULL 9980 100.00 Using filesort +Warnings: +Note 1003 /* select#1 */ select (`test`.`t`.`c` + 1) AS `c + 1` from `test`.`t` order by (`test`.`t`.`c` + 1) ... explain select vc from t order by c + 1; -id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t ALL NULL NULL NULL NULL 10000 Using filesort +id select_type table partitions type possible_keys key key_len ref rows filtered Extra +1 SIMPLE t NULL ALL NULL NULL NULL NULL 9980 100.00 Using filesort +Warnings: +Note 1003 /* select#1 */ select `test`.`t`.`vc` AS `vc` from `test`.`t` order by (`test`.`t`.`c` + 1) When compared against the current patch under review for this task, none of the tests show a better plan, and many show a worse plan, see attached mdev-36132-mysql.diff.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.