Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7982

EXPLAIN shows meaningless query plan with Distinct

Details

    Description

      Take the attached dataset and run this query:

      explain  select distinct count(*) from t2 where a between 10 and 20 group by a;
      +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                               |
      +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
      |    1 | SIMPLE      | t2    | range | a             | a    | 5       | NULL |   10 | Using where; Using index; Distinct; Using temporary; Using filesort |
      +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+

      and see an apparently non-sensical output (how can a table use "Distinct" if it
      is the only table in the join?)

      This is repeatable on 10.0.17, didn't check 5.5

      Attachments

        Activity

          This is fixed in 10.2 tree by post-MDEV-8646 fixes.

          psergei Sergei Petrunia added a comment - This is fixed in 10.2 tree by post- MDEV-8646 fixes.

          On 5.3 and 5.5, it looks like this:

          elenst@wheezy-64:~/git/5.5$ ml test -e "explain  select distinct count(*) from t2 where a between 10 and 20 group by a;"
          +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
          | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                               |
          +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
          |    1 | SIMPLE      | t2    | range | a             | a    | 5       | NULL |   10 | Using where; Using index; Using temporary; Using filesort; Distinct |
          +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+

          elenst Elena Stepanova added a comment - On 5.3 and 5.5, it looks like this: elenst@wheezy-64:~/git/5.5$ ml test -e "explain select distinct count(*) from t2 where a between 10 and 20 group by a;" + ------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+ | 1 | SIMPLE | t2 | range | a | a | 5 | NULL | 10 | Using where ; Using index ; Using temporary ; Using filesort; Distinct | + ------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+

          Debugging on 10.0, looking at this line of code in JOIN::save_explain_data_intern:

          	if (distinct & test_all_bits(used_tables,
                                               join->select_list_used_tables))

          select_list_used_tables==0 when we come here... which is weird, JOIN::optimize_inner does call JOIN::eval_select_list_used_tables which should have put a correct value there.

          in eval_select_list_used_tables we have:

          (gdb) p item
            $11 = (Item *) 0x7fffa4006618
          (gdb) p dbug_print_item( item)
            $12 = 0x1842700 "count(0)"
          (gdb) p item->used_tables()
            $13 = 0

          One can debate what (count)->used_tables() should return, but this is not the point.

          The point is, if we run

            SELECT DISTINCT COUNT(*) 

          then it is obvious that Distinct optimization is not applicable.
          Short-circuiting logic applies when running

            SELECT DISTINCT func(table_column)
            

          but not when one runs

          SELECT DISTINCT aggregate_func.

          (unless maybe some cases with Loose Scan).

          psergei Sergei Petrunia added a comment - Debugging on 10.0, looking at this line of code in JOIN::save_explain_data_intern: if (distinct & test_all_bits(used_tables, join->select_list_used_tables)) select_list_used_tables==0 when we come here... which is weird, JOIN::optimize_inner does call JOIN::eval_select_list_used_tables which should have put a correct value there. in eval_select_list_used_tables we have: (gdb) p item $11 = (Item *) 0x7fffa4006618 (gdb) p dbug_print_item( item) $12 = 0x1842700 "count(0)" (gdb) p item->used_tables() $13 = 0 One can debate what (count )->used_tables() should return, but this is not the point. The point is, if we run SELECT DISTINCT COUNT(*) then it is obvious that Distinct optimization is not applicable. Short-circuiting logic applies when running SELECT DISTINCT func(table_column) but not when one runs SELECT DISTINCT aggregate_func. (unless maybe some cases with Loose Scan).

          People

            psergei Sergei Petrunia
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.