Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8857

[Upstream too] EXPLAIN incorrectly shows Distinct for tables using join buffer

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0(EOL), 10.1(EOL)
    • 10.2.0
    • Optimizer
    • None

    Description

      EXPLAIN may show "Distinct" for tables that use join buffer. This is not a possible execution strategy, one can debug and confirm that Distinct optimization is not used in this case.

      Copying from my report to upstream:

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table twenty (a int, filler char(200), key(a));
      insert into twenty select A.a + B.a* 10, 'AAAAAAAAAAAAAAAAAAAA' from ten A, ten B where B.a in (0,1);

      MySQL [test]>  explain select A.a from ten A, twenty B where A.a+B.a> 0;
      +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
      | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                           |
      +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
      |  1 | SIMPLE      | A     | ALL   | NULL          | NULL | NULL    | NULL |   10 | NULL                                                            |
      |  1 | SIMPLE      | B     | index | NULL          | a    | 5       | NULL |   20 | Using where; Using index; Using join buffer (Block Nested Loop) |
      +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+

      MySQL [test]> explain select distinct A.a from ten A, twenty B where A.a+B.a> 0; 
      +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
      | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                                     |
      +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
      |  1 | SIMPLE      | A     | ALL   | NULL          | NULL | NULL    | NULL |   10 | Using temporary                                                           |
      |  1 | SIMPLE      | B     | index | NULL          | a    | 5       | NULL |   20 | Using where; Using index; Distinct; Using join buffer (Block Nested Loop) |
      +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+

      Look at the second query. It has "Distinct". I think this wrong. "Distinct"
      doesn't work with "Using join buffer".

      To make sure we are on the same page: "Distinct" in table B means that as soon
      as we've got a match for current record in table A, we don't have to look for
      any other matches. The query's SELECT list is "SELECT DISTINCT A.a", there is
      no point to look for additional A.row-B.row pairs with the same A.row.

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            Description EXPLAIN may show "Distinct" for tables that use join buffer. This is not a possible execution strategy, one can debug and confirm that {{Distinct}} optimization is not used in this case.

            Copying from my report to upstream:
            {noformat}
            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

            create table twenty (a int, filler char(200), key(a));
            insert into twenty select A.a + B.a* 10, 'AAAAAAAAAAAAAAAAAAAA' from ten A, ten B where B.a in (0,1);
            {noformat}

            {noformat}
            MySQL [test]> explain select A.a from ten A, twenty B where A.a+B.a> 0;
            +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
            | 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 10 | NULL |
            | 1 | SIMPLE | B | index | NULL | a | 5 | NULL | 20 | Using where; Using index; Using join buffer (Block Nested Loop) |
            +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
            {noformat}

            {noformat}
            MySQL [test]> explain select distinct A.a from ten A, twenty B where A.a+B.a> 0;
            +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
            | 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 10 | Using temporary |
            | 1 | SIMPLE | B | index | NULL | a | 5 | NULL | 20 | Using where; Using index; Distinct; Using join buffer (Block Nested Loop) |
            +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
            {noformat}

            {noformat}
            Look at the second query. It has "Distinct". I think this wrong. "Distinct"
            doesn't work with "Using join buffer".
            {noformat}

            To make sure we are on the same page: "Distinct" in table B means that as soon
            as we've got a match for current record in table A, we don't have to look for
            any other matches. The query's SELECT list is "SELECT DISTINCT A.a", there is
            no point to look for additional A.row-B.row pairs with the same A.row.

            EXPLAIN may show "Distinct" for tables that use join buffer. This is not a possible execution strategy, one can debug and confirm that {{Distinct}} optimization is not used in this case.

            Copying from my report to upstream:
            {noformat}
            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

            create table twenty (a int, filler char(200), key(a));
            insert into twenty select A.a + B.a* 10, 'AAAAAAAAAAAAAAAAAAAA' from ten A, ten B where B.a in (0,1);
            {noformat}

            {noformat}
            MySQL [test]> explain select A.a from ten A, twenty B where A.a+B.a> 0;
            +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
            | 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 10 | NULL |
            | 1 | SIMPLE | B | index | NULL | a | 5 | NULL | 20 | Using where; Using index; Using join buffer (Block Nested Loop) |
            +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
            {noformat}

            {noformat}
            MySQL [test]> explain select distinct A.a from ten A, twenty B where A.a+B.a> 0;
            +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
            | 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 10 | Using temporary |
            | 1 | SIMPLE | B | index | NULL | a | 5 | NULL | 20 | Using where; Using index; Distinct; Using join buffer (Block Nested Loop) |
            +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
            {noformat}

            Look at the second query. It has "Distinct". I think this wrong. "Distinct"
            doesn't work with "Using join buffer".

            To make sure we are on the same page: "Distinct" in table B means that as soon
            as we've got a match for current record in table A, we don't have to look for
            any other matches. The query's SELECT list is "SELECT DISTINCT A.a", there is
            no point to look for additional A.row-B.row pairs with the same A.row.

            Quoting my comment for upstream:

            You can see Distinct in action: put a breakpoint in end_write and end_send.
            Then:

            set optimizer_switch='block_nested_loop=off';
            # Put a breakpoint in end_send
            select A.a from ten A, twenty B where A.a+B.a> 0;
            # observe that breakpoint was hit 200 times.

            # Run EXPLAIN to be sure about what's going on:
            MySQL [test]> explain select A.a from ten A, twenty B where A.a+B.a> 0;
            +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
            | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
            +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
            |  1 | SIMPLE      | A     | ALL   | NULL          | NULL | NULL    | NULL |   10 | NULL                     |
            |  1 | SIMPLE      | B     | index | NULL          | a    | 5       | NULL |   20 | Using where; Using index |
            +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+

            Then,

            set optimizer_switch='block_nested_loop=off';
            # Put a breakpoint in end_write 
            select distinct A.a from ten A, twenty B where A.a+B.a> 0;
            # Observe that breakpoint was hit 11 times.
            # This is how Distinct optimization works.

            # (just in case, let's check EXPLAIN: )
            MySQL [test]> explain select distinct A.a from ten A, twenty B where A.a+B.a> 0;
            +----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+
            | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                              |
            +----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+
            |  1 | SIMPLE      | A     | ALL   | NULL          | NULL | NULL    | NULL |   10 | Using temporary                    |
            |  1 | SIMPLE      | B     | index | NULL          | a    | 5       | NULL |   20 | Using where; Using index; Distinct |
            +----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+

            Ok, now with "Using join buffer":

            set optimizer_switch='block_nested_loop=on';

            1. Put a breakpoint in end_send
              select A.a from ten A, twenty B where A.a+B.a> 0;
            2. observe that breakpoint was hit 200 times.
            1. Put a breakpoint in end_write
              select distinct A.a from ten A, twenty B where A.a+B.a> 0;
            2. Observe that breakpoint was hit 200 times, again.

            When one takes time to think about, it becomes obvious that "Distinct"
            optimization cannot be used with Join buffering. And it is not used. But EXPLAIN shows it's used.

            psergei Sergei Petrunia added a comment - Quoting my comment for upstream: You can see Distinct in action: put a breakpoint in end_write and end_send. Then: set optimizer_switch='block_nested_loop=off'; # Put a breakpoint in end_send select A.a from ten A, twenty B where A.a+B.a> 0; # observe that breakpoint was hit 200 times. # Run EXPLAIN to be sure about what's going on: MySQL [test]> explain select A.a from ten A, twenty B where A.a+B.a> 0; +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ | 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 10 | NULL | | 1 | SIMPLE | B | index | NULL | a | 5 | NULL | 20 | Using where; Using index | +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ Then, set optimizer_switch='block_nested_loop=off'; # Put a breakpoint in end_write select distinct A.a from ten A, twenty B where A.a+B.a> 0; # Observe that breakpoint was hit 11 times. # This is how Distinct optimization works. # (just in case, let's check EXPLAIN: ) MySQL [test]> explain select distinct A.a from ten A, twenty B where A.a+B.a> 0; +----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+ | 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 10 | Using temporary | | 1 | SIMPLE | B | index | NULL | a | 5 | NULL | 20 | Using where; Using index; Distinct | +----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+ Ok, now with "Using join buffer": set optimizer_switch='block_nested_loop=on'; Put a breakpoint in end_send select A.a from ten A, twenty B where A.a+B.a> 0; observe that breakpoint was hit 200 times. Put a breakpoint in end_write select distinct A.a from ten A, twenty B where A.a+B.a> 0; Observe that breakpoint was hit 200 times, again. When one takes time to think about, it becomes obvious that "Distinct" optimization cannot be used with Join buffering. And it is not used. But EXPLAIN shows it's used.
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -

            For the record: this was discovered when analyzing MDEV-7983.

            psergei Sergei Petrunia added a comment - For the record: this was discovered when analyzing MDEV-7983 .
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.2.0 [ 20700 ]
            Fix Version/s 10.1 [ 16100 ]

            This was fixed as part of MDEV-8646

            psergei Sergei Petrunia added a comment - This was fixed as part of MDEV-8646
            psergei Sergei Petrunia made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 71764 ] MariaDB v4 [ 149653 ]

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.