Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27576

Use DESC indexes for MIN/MAX optimization

Details

    Description

      As of c10e10c6, opt_sum.cc is still stubbed for DESC indexes

            if (part->key_part_flag & HA_REVERSE_SORT)
              break; // TODO MDEV-13756
      

      Given that we are about to close MDEV-13756, it looks strange: either it was forgotten and needs to be enabled, or maybe it requires a better reference.

      Given the above, the optimization expectedly doesn't work.

      --source include/have_sequence.inc
       
      create or replace table t1 (id int, key(id));
      insert into t1 select seq from seq_1_to_100 order by rand(1);
      explain extended select max(id) from t1 where id > 50;
       
      create or replace table t1 (id int, key(id desc));
      insert into t1 select seq from seq_1_to_100 order by rand(1);
      explain extended select max(id) from t1 where id > 50;
       
      # Cleanup
      drop table t1;
      

      With the ASC index, it is

      preview-10.8-MDEV-13756-desc-indexes c10e10c6

      +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------+
      |    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------+
      

      With the DESC index, it is

      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
      |    1 | SIMPLE      | t1    | range | id            | id   | 5       | NULL | 50   |   100.00 | Using where; Using index |
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
      

      Attachments

        Issue Links

          Activity

            I don't know what it will take to enable it, but it doesn't appear to be as simple as removing that if – having done so locally, I ran a quick set of tests and it returned quite a few failures. I can't say whether they genuinely relate to DESC indexes or just reveal some old problems, known or not, but they would certainly need to be investigated at least. So, given the release schedule, it may be safer to keep it as is now and, if at all needed, create a separate task for 10.9 about using DESC indexes in the optimizations. Or converting this item into such a task.

            However, there are at least some effects of the disabled optimization which can come as a surprise. For example, this

            CREATE TABLE t1 (a FLOAT, KEY (a DESC));
            INSERT INTO t1 VALUES (0.1234),(0.6789);
            SELECT MAX(a) FROM t1 WHERE a <= 0.6789;
            

            returns

            preview-10.8-MDEV-13756-desc-indexes c10e10c6

            MAX(a)
            0.1234
            

            I can't raise it to a bug level since it relies on float comparison which is not to be relied upon, but looking at existing MTR tests, for "normal" ascending indexes it is assumed that it will return the bigger value. Thus a surprise.

            elenst Elena Stepanova added a comment - I don't know what it will take to enable it, but it doesn't appear to be as simple as removing that if – having done so locally, I ran a quick set of tests and it returned quite a few failures. I can't say whether they genuinely relate to DESC indexes or just reveal some old problems, known or not, but they would certainly need to be investigated at least. So, given the release schedule, it may be safer to keep it as is now and, if at all needed, create a separate task for 10.9 about using DESC indexes in the optimizations. Or converting this item into such a task. However, there are at least some effects of the disabled optimization which can come as a surprise. For example, this CREATE TABLE t1 (a FLOAT , KEY (a DESC )); INSERT INTO t1 VALUES (0.1234),(0.6789); SELECT MAX (a) FROM t1 WHERE a <= 0.6789; returns preview-10.8-MDEV-13756-desc-indexes c10e10c6 MAX (a) 0.1234 I can't raise it to a bug level since it relies on float comparison which is not to be relied upon, but looking at existing MTR tests, for "normal" ascending indexes it is assumed that it will return the bigger value. Thus a surprise.
            ycp Yuchen Pei added a comment - - edited

            Trying to understand where the issue comes from. If I remove the if, I get NULL with descending index. The code path looks similar between the two, but after the call to find_key_for_maxmin(), get_index_max_value() assigns the max (*((Item *) conds)->cmp->a)->val_int(), which is 100 in the usual index and 1 in the descending index. So I assume the fix should come before or during the call to get_index_max_value().

            The main actions of finding the max happens in the storage engine, though it is probably independent of storage engines as defined by the contract of index_last(), and the change for this ticket presumably should only touch the optimizer / sql layer. The storage engine code is rather low level, but to find out why things don't work as expected I have to look into it. With InnoDB, I notice that one difference between the two scenarios is page_rec_get_prev_const() counts up to 100 in the usual index and counts down to 1 in the descending index. To be further investigated...

            ycp Yuchen Pei added a comment - - edited Trying to understand where the issue comes from. If I remove the if , I get NULL with descending index. The code path looks similar between the two, but after the call to find_key_for_maxmin() , get_index_max_value() assigns the max (*((Item *) conds)->cmp->a)->val_int() , which is 100 in the usual index and 1 in the descending index. So I assume the fix should come before or during the call to get_index_max_value() . The main actions of finding the max happens in the storage engine, though it is probably independent of storage engines as defined by the contract of index_last() , and the change for this ticket presumably should only touch the optimizer / sql layer. The storage engine code is rather low level, but to find out why things don't work as expected I have to look into it. With InnoDB, I notice that one difference between the two scenarios is page_rec_get_prev_const() counts up to 100 in the usual index and counts down to 1 in the descending index. To be further investigated...
            ycp Yuchen Pei added a comment -

            The difference between counting up and down is simply because
            index_last() looks for the last value according to the index: asc
            -> last is max; desc -> last is min. So what needs to be done is to
            invert these logic when doing desc, i.e. use index_first() etc.
            Here's a PoC patch

            upstream/bb-11.3-mdev-27576 5bd89cc733f4553612e9eb4ae74161059154a683
            MDEV-27576 [PoC] Use reverse index for max/min optimization

            ycp Yuchen Pei added a comment - The difference between counting up and down is simply because index_last() looks for the last value according to the index: asc -> last is max; desc -> last is min. So what needs to be done is to invert these logic when doing desc, i.e. use index_first() etc. Here's a PoC patch upstream/bb-11.3-mdev-27576 5bd89cc733f4553612e9eb4ae74161059154a683 MDEV-27576 [PoC] Use reverse index for max/min optimization
            ycp Yuchen Pei added a comment - - edited

            Hi psergei, ptal thanks

            upstream/bb-11.3-mdev-27576 c4c4bd8e47e5bfd4d0a650e286ea9daa3e6276f0
            MDEV-27576 Use reverse index for max/min optimization
             
            We add a field in st_table_ref to indicate that the key used is
            a descending index, which will flip the functions and flags used in
            get_index_max_value() and get_index_min_value(), that allows correct
            optimization for max/min for descending index.
            

            ycp Yuchen Pei added a comment - - edited Hi psergei , ptal thanks upstream/bb-11.3-mdev-27576 c4c4bd8e47e5bfd4d0a650e286ea9daa3e6276f0 MDEV-27576 Use reverse index for max/min optimization   We add a field in st_table_ref to indicate that the key used is a descending index, which will flip the functions and flags used in get_index_max_value() and get_index_min_value(), that allows correct optimization for max/min for descending index.
            psergei Sergei Petrunia added a comment - ycp , replied: https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/IWKG5FXYNP44SUCNRVJU2RLMZGMARIH5/
            ycp Yuchen Pei added a comment -

            Thanks for the comments psergei. I addressed them and updated my commit. ptal thanks (I understand the fixversion is 11.5 so no rush):

            upstream/bb-11.3-mdev-27576 8398e5c153320efbbeeeeac9d9029e0f932aaaa2
            MDEV-27576 Use reverse index for max/min optimization
             
            We use a bool to indicate that the key part used is a descending
            index, which will flip the functions and flags used in
            get_index_max_value() and get_index_min_value(), that allows correct
            optimization for max/min for descending index.
            

            ycp Yuchen Pei added a comment - Thanks for the comments psergei . I addressed them and updated my commit. ptal thanks (I understand the fixversion is 11.5 so no rush): upstream/bb-11.3-mdev-27576 8398e5c153320efbbeeeeac9d9029e0f932aaaa2 MDEV-27576 Use reverse index for max/min optimization   We use a bool to indicate that the key part used is a descending index, which will flip the functions and flags used in get_index_max_value() and get_index_min_value(), that allows correct optimization for max/min for descending index.

            The patch is ok, I've only requested some formatting changes via email.

            Then, we need to submit it for testing.

            psergei Sergei Petrunia added a comment - The patch is ok, I've only requested some formatting changes via email. Then, we need to submit it for testing.
            ycp Yuchen Pei added a comment - - edited

            Thanks for the review psergei. Applied the formatting patch and
            pushed to branch bb-11.4-mdev-27576-preview for testing:

            upstream/bb-11.4-mdev-27576-preview 41566ac3896c6a6a01df10062b2fc29df3f84f86
            MDEV-27576 Use reverse index for max/min optimization
             
            We use a bool to indicate that the key part used is a descending
            index, which will flip the functions and flags used in
            get_index_max_value() and get_index_min_value(), that allows correct
            optimization for max/min for descending index.

            BTW, is this still targeted at 11.5?

            ycp Yuchen Pei added a comment - - edited Thanks for the review psergei . Applied the formatting patch and pushed to branch bb-11.4-mdev-27576-preview for testing: upstream/bb-11.4-mdev-27576-preview 41566ac3896c6a6a01df10062b2fc29df3f84f86 MDEV-27576 Use reverse index for max/min optimization   We use a bool to indicate that the key part used is a descending index, which will flip the functions and flags used in get_index_max_value() and get_index_min_value(), that allows correct optimization for max/min for descending index. BTW, is this still targeted at 11.5?

            Testing done. Ok to push.

            But there are some notes about optimization strategies used by the query optimizer when there is ascending/descending index in the table. These moments were discussed with psergei and at this time they do not require improvement (because there is a task for improvement, or there is no need for improvement), but I think it is necessary keep this information.

            For the table:

            CREATE TABLE t1
            (a VARCHAR(10),
            b VARCHAR(10),
            PRIMARY KEY (a DESC, b DESC),
            KEY ab_asc (a ASC, b ASC),
            KEY a_asc_b_desc (a ASC, b DESC),
            key a_desc_b_asc (a DESC, b ASC))
            ENGINE = InnoDB;
            

            The following query plans are:

            EXPLAIN  SELECT max(a)  FROM t1;
            id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            1	SIMPLE	NULL	NULL	NULL	NULL	NULL	NULL	NULL	Select tables optimized away
            EXPLAIN  SELECT max(a)  FROM t1 ORDER BY a DESC;
            id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            1	SIMPLE	t1	index	NULL	ab_asc	24	NULL	13	Using index
            

            Thus second query:

            1. is using index, but in this case the query will return only one result and the "group by" can be ignored
            2. optimizer uses index "ab_asc" instead of Primary key with descending order
            lstartseva Lena Startseva added a comment - Testing done. Ok to push. But there are some notes about optimization strategies used by the query optimizer when there is ascending/descending index in the table. These moments were discussed with psergei and at this time they do not require improvement (because there is a task for improvement, or there is no need for improvement), but I think it is necessary keep this information. For the table: CREATE TABLE t1 (a VARCHAR (10), b VARCHAR (10), PRIMARY KEY (a DESC , b DESC ), KEY ab_asc (a ASC , b ASC ), KEY a_asc_b_desc (a ASC , b DESC ), key a_desc_b_asc (a DESC , b ASC )) ENGINE = InnoDB; The following query plans are: EXPLAIN SELECT max (a) FROM t1; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Select tables optimized away EXPLAIN SELECT max (a) FROM t1 ORDER BY a DESC ; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 index NULL ab_asc 24 NULL 13 Using index Thus second query: is using index, but in this case the query will return only one result and the "group by" can be ignored optimizer uses index "ab_asc" instead of Primary key with descending order
            ycp Yuchen Pei added a comment - - edited

            pushed the following to 11.4

            875377ad824473774c833b1aff4346ba3133f092 MDEV-27576 Use reverse index for max/min optimization

            ycp Yuchen Pei added a comment - - edited pushed the following to 11.4 875377ad824473774c833b1aff4346ba3133f092 MDEV-27576 Use reverse index for max/min optimization

            People

              ycp Yuchen Pei
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.