Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7982

EXPLAIN shows meaningless query plan with Distinct

Details

    Description

      Take the attached dataset and run this query:

      explain  select distinct count(*) from t2 where a between 10 and 20 group by a;
      +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                               |
      +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
      |    1 | SIMPLE      | t2    | range | a             | a    | 5       | NULL |   10 | Using where; Using index; Distinct; Using temporary; Using filesort |
      +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+

      and see an apparently non-sensical output (how can a table use "Distinct" if it
      is the only table in the join?)

      This is repeatable on 10.0.17, didn't check 5.5

      Attachments

        Activity

          Debugging on 10.0, looking at this line of code in JOIN::save_explain_data_intern:

          	if (distinct & test_all_bits(used_tables,
                                               join->select_list_used_tables))

          select_list_used_tables==0 when we come here... which is weird, JOIN::optimize_inner does call JOIN::eval_select_list_used_tables which should have put a correct value there.

          in eval_select_list_used_tables we have:

          (gdb) p item
            $11 = (Item *) 0x7fffa4006618
          (gdb) p dbug_print_item( item)
            $12 = 0x1842700 "count(0)"
          (gdb) p item->used_tables()
            $13 = 0

          One can debate what (count)->used_tables() should return, but this is not the point.

          The point is, if we run

            SELECT DISTINCT COUNT(*) 

          then it is obvious that Distinct optimization is not applicable.
          Short-circuiting logic applies when running

            SELECT DISTINCT func(table_column)
            

          but not when one runs

          SELECT DISTINCT aggregate_func.

          (unless maybe some cases with Loose Scan).

          psergei Sergei Petrunia added a comment - Debugging on 10.0, looking at this line of code in JOIN::save_explain_data_intern: if (distinct & test_all_bits(used_tables, join->select_list_used_tables)) select_list_used_tables==0 when we come here... which is weird, JOIN::optimize_inner does call JOIN::eval_select_list_used_tables which should have put a correct value there. in eval_select_list_used_tables we have: (gdb) p item $11 = (Item *) 0x7fffa4006618 (gdb) p dbug_print_item( item) $12 = 0x1842700 "count(0)" (gdb) p item->used_tables() $13 = 0 One can debate what (count )->used_tables() should return, but this is not the point. The point is, if we run SELECT DISTINCT COUNT(*) then it is obvious that Distinct optimization is not applicable. Short-circuiting logic applies when running SELECT DISTINCT func(table_column) but not when one runs SELECT DISTINCT aggregate_func. (unless maybe some cases with Loose Scan).

          On 5.3 and 5.5, it looks like this:

          elenst@wheezy-64:~/git/5.5$ ml test -e "explain  select distinct count(*) from t2 where a between 10 and 20 group by a;"
          +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
          | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                               |
          +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
          |    1 | SIMPLE      | t2    | range | a             | a    | 5       | NULL |   10 | Using where; Using index; Using temporary; Using filesort; Distinct |
          +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+

          elenst Elena Stepanova added a comment - On 5.3 and 5.5, it looks like this: elenst@wheezy-64:~/git/5.5$ ml test -e "explain select distinct count(*) from t2 where a between 10 and 20 group by a;" + ------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+ | 1 | SIMPLE | t2 | range | a | a | 5 | NULL | 10 | Using where ; Using index ; Using temporary ; Using filesort; Distinct | + ------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+

          This is fixed in 10.2 tree by post-MDEV-8646 fixes.

          psergei Sergei Petrunia added a comment - This is fixed in 10.2 tree by post- MDEV-8646 fixes.

          People

            psergei Sergei Petrunia
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.