Details

    Description

      With the following secondary index (link to full schema):

                       KEY (`partition`, oid, tid),
      

      I have for example:

      MariaDB [neo0]> analyze SELECT SQL_NO_CACHE MAX(oid) AS oid FROM obj GROUP BY `partition`;
      +------+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+----------+------------+-------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | r_rows      | filtered | r_filtered | Extra       |
      +------+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+----------+------------+-------------+
      |    1 | SIMPLE      | obj   | index | NULL          | PRIMARY | 18      | NULL | 69066464 | 69066464.00 |   100.00 |     100.00 | Using index |
      +------+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+----------+------------+-------------+
      1 row in set (22.01 sec)
       
      MariaDB [neo0]> analyze SELECT SQL_NO_CACHE MAX(oid) AS oid FROM obj FORCE INDEX FOR GROUP BY (`partition`) GROUP BY `partition`;
      +------+-------------+-------+-------+---------------+-----------+---------+------+----------+-------------+----------+------------+-------------+
      | id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows     | r_rows      | filtered | r_filtered | Extra       |
      +------+-------------+-------+-------+---------------+-----------+---------+------+----------+-------------+----------+------------+-------------+
      |    1 | SIMPLE      | obj   | index | NULL          | partition | 18      | NULL | 69066464 | 69066464.00 |   100.00 |     100.00 | Using index |
      +------+-------------+-------+-------+---------------+-----------+---------+------+----------+-------------+----------+------------+-------------+
      1 row in set (27.40 sec)
      

      After ANALYZE TABLE obj:

      MariaDB [neo0]> analyze SELECT SQL_NO_CACHE MAX(oid) AS oid FROM obj GROUP BY `partition`;
      +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
      | id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                    |
      +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
      |    1 | SIMPLE      | obj   | range | NULL          | partition | 2       | NULL |  285 | 128.00 |   100.00 |     100.00 | Using index for group-by |
      +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
      1 row in set (0.19 sec)
      

      (this last result is what we expect)

      Here, I am not talking about an engine having so bad statistics that an inefficient query plan is chosen. Of course:

      • some engines may be better than other at providing good statistics, and there may be bugs to fix about this
      • and our code would be easier to write if we didn't have to give index hints

      But good statistics in our project is not really important. We have a fixed list of SQL statements and for each of them, the query plan must always be the same. Indexes are carefully chosen for this.

      Because engines can not guarantee that statistics are always good enough, we'll end up adding FORCE hints everywhere and we expect MariaDB to follow them blindly.

      We had this issue with both InnoDB and TokuDB.

      For us, this is the most critical bug we have because it can kill our application in unresolvable ways. In the past, we already had optimizer issues on performance-critical queries and fortunately index hints did the job at the time:

      Attachments

        Issue Links

          Activity

            Reproducible on all of 5.5-10.2 and MySQL 5.7. I'm not sure whether it's expected to work, but I couldn't find in documentation why it shouldn't.

            You need a big enough table to reproduce it. I've uploaded a dump (totally artificial) to the public ftp: ftp://ftp.askmonty.org/public/mdev12980.dmp.gz .

            elenst Elena Stepanova added a comment - Reproducible on all of 5.5-10.2 and MySQL 5.7. I'm not sure whether it's expected to work, but I couldn't find in documentation why it shouldn't. You need a big enough table to reproduce it. I've uploaded a dump (totally artificial) to the public ftp: ftp://ftp.askmonty.org/public/mdev12980.dmp.gz .

            Looking at the analyze input, it looks like FORCE INDEX works as expected. Using the FORCE clause causes
            the query to use the wanted index.

            The problem is that we in this case don't detect that we are not using the QUICK_SELECT_I::QS_TYPE_GROUP_MIN_MAX optimization unless we have proper statistics and there is
            currently no way to force that.

            monty Michael Widenius added a comment - Looking at the analyze input, it looks like FORCE INDEX works as expected. Using the FORCE clause causes the query to use the wanted index. The problem is that we in this case don't detect that we are not using the QUICK_SELECT_I::QS_TYPE_GROUP_MIN_MAX optimization unless we have proper statistics and there is currently no way to force that.

            I thought it was the purpose of FORCE INDEX FOR GROUP BY. Strangely, for this other case where statistics were bad but not too much, USE INDEX was enough to help:

            MariaDB [neo0]> analyze SELECT SQL_NO_CACHE `partition`, MAX(tid) FROM obj GROUP BY `partition`;
            +------+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+----------+------------+-------------+
            | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | r_rows      | filtered | r_filtered | Extra       |
            +------+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+----------+------------+-------------+
            |    1 | SIMPLE      | obj   | index | NULL          | PRIMARY | 18      | NULL | 69066464 | 69066464.00 |   100.00 |     100.00 | Using index |
            +------+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+----------+------------+-------------+
            1 row in set (22.83 sec)
             
            MariaDB [neo0]> analyze SELECT SQL_NO_CACHE `partition`, MAX(tid) FROM obj USE INDEX(PRIMARY) GROUP BY `partition`;
            +------+-------------+-------+-------+---------------+---------+---------+------+----------+--------+----------+------------+--------------------------+
            | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | r_rows | filtered | r_filtered | Extra                    |
            +------+-------------+-------+-------+---------------+---------+---------+------+----------+--------+----------+------------+--------------------------+
            |    1 | SIMPLE      | obj   | range | NULL          | PRIMARY | 2       | NULL | 69066465 | 128.00 |   100.00 |     100.00 | Using index for group-by |
            +------+-------------+-------+-------+---------------+---------+---------+------+----------+--------+----------+------------+--------------------------+
            1 row in set (0.03 sec)
            

            Key is PRIMARY for both queries and with the hint, it switched to range type.

            jmuchemb Julien Muchembled added a comment - I thought it was the purpose of FORCE INDEX FOR GROUP BY . Strangely, for this other case where statistics were bad but not too much, USE INDEX was enough to help: MariaDB [neo0]> analyze SELECT SQL_NO_CACHE `partition`, MAX(tid) FROM obj GROUP BY `partition`; +------+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+----------+------------+-------------+ | 1 | SIMPLE | obj | index | NULL | PRIMARY | 18 | NULL | 69066464 | 69066464.00 | 100.00 | 100.00 | Using index | +------+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+----------+------------+-------------+ 1 row in set (22.83 sec)   MariaDB [neo0]> analyze SELECT SQL_NO_CACHE `partition`, MAX(tid) FROM obj USE INDEX(PRIMARY) GROUP BY `partition`; +------+-------------+-------+-------+---------------+---------+---------+------+----------+--------+----------+------------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+-------+-------+---------------+---------+---------+------+----------+--------+----------+------------+--------------------------+ | 1 | SIMPLE | obj | range | NULL | PRIMARY | 2 | NULL | 69066465 | 128.00 | 100.00 | 100.00 | Using index for group-by | +------+-------------+-------+-------+---------------+---------+---------+------+----------+--------+----------+------------+--------------------------+ 1 row in set (0.03 sec) Key is PRIMARY for both queries and with the hint, it switched to range type.

            Looked at all this again, and I agree with Monty's judgment: FORCE INDEX FOR GROUP BY means "resolve GROUP BY using the index". It does not imply "Resolve GROUP BY using Loose Scan (this is internal name for 'Using index for group-by' optimization) ".

            The choice whether to use Loose Index Scan is based on data statistics. (If the GROUP BY groups are sufficiently small, LooseScan can be worse than just using the index).

            We could get more detailed query hints like MySQL 8.0 has, but that is a big project and cannot be done as a bugfix.

            psergei Sergei Petrunia added a comment - Looked at all this again, and I agree with Monty's judgment: FORCE INDEX FOR GROUP BY means "resolve GROUP BY using the index". It does not imply "Resolve GROUP BY using Loose Scan (this is internal name for 'Using index for group-by' optimization) ". The choice whether to use Loose Index Scan is based on data statistics. (If the GROUP BY groups are sufficiently small, LooseScan can be worse than just using the index). We could get more detailed query hints like MySQL 8.0 has, but that is a big project and cannot be done as a bugfix.

            OK, thanks for the details.

            For the moment, this bug report is only about a particular kind of request and we don't have anymore such request in our project. In a recent commit message, I wrote:

            Note about MySQL changes in commit ca58ccd7fe93522aa440cbabad4d1d1ba2a47b7c: what we did as a workaround is not one any more. Now, we do so much on Python side that it's unlikely we could reduce the number of queries using GROUP BY. We even stopped doing that for SQLite.

            Maybe this bug report should be renamed.

            jmuchemb Julien Muchembled added a comment - OK, thanks for the details. For the moment, this bug report is only about a particular kind of request and we don't have anymore such request in our project. In a recent commit message, I wrote: Note about MySQL changes in commit ca58ccd7fe93522aa440cbabad4d1d1ba2a47b7c: what we did as a workaround is not one any more. Now, we do so much on Python side that it's unlikely we could reduce the number of queries using GROUP BY. We even stopped doing that for SQLite. Maybe this bug report should be renamed.

            People

              psergei Sergei Petrunia
              jmuchemb Julien Muchembled
              Votes:
              2 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.