[MDEV-19620] Changing join_buffer_size causes different results - Jira

Details

Type: Bug
Status: Stalled (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11
Fix Version/s: 10.5, 10.6
Component/s: Optimizer
Labels:
- 11.0-sel

Description

This test case is originally subselect_sj.test

--echo #

--echo # MDEV-5059: Wrong result (missing row) wih semijoin, join_cache_level > 2, LEFT JOIN, ORDER BY

--echo #

set join_buffer_size= 1024*1024*2;

SET join_cache_level = 3;

set optimizer_search_depth=62;

set join_buffer_space_limit=2097152;

set optimizer_switch="index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on";

CREATE TABLE t1 (pk1 INT PRIMARY KEY, i1 INT) ENGINE=MyISAM;

INSERT INTO t1 VALUES (1,4),(2,5);

CREATE TABLE t2 (i2 INT, c2 CHAR(1)) ENGINE=MyISAM;

INSERT INTO t2 VALUES (1,'v'),(7,'s');

CREATE TABLE t3 (pk3 INT PRIMARY KEY, i3 INT, c3 CHAR(1), INDEX(i3), INDEX(c3,i3)) ENGINE=MyISAM;

INSERT INTO t3 VALUES (1,7,'g'),(2,4,'p'),(3,1,'q');

CREATE TABLE t4 (i4 INT) ENGINE=MyISAM;

INSERT INTO t4 VALUES (1);

SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2)

WHERE ( i2, pk1 )  IN ( SELECT i3, 1 FROM t3, t4 ) ORDER BY i1, c2;

set join_buffer_size=262144;

SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2)

WHERE ( i2, pk1 )  IN ( SELECT i3, 1 FROM t3, t4 ) ORDER BY i1, c2;

DROP TABLE t1,t2,t3,t4;

-----------
The last SELECT produces two different results:

SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2)

WHERE ( i2, pk1 )  IN ( SELECT i3, 1 FROM t3, t4 ) ORDER BY i1, c2;

pk1     i1      i2      c2      pk3     i3      c3

1       4       7       s       1       7       g

1       4       7       s       2       4       p

1       4       7       s       3       1       q

1       4       1       v       1       7       g

1       4       1       v       2       4       p

1       4       1       v       3       1       q

set join_buffer_size=262144;

SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2)

WHERE ( i2, pk1 )  IN ( SELECT i3, 1 FROM t3, t4 ) ORDER BY i1, c2;

pk1     i1      i2      c2      pk3     i3      c3

1       4       7       s       NULL    NULL    NULL

1       4       1       v       NULL    NULL    NULL

Attachments

Activity

Descending order - Click to sort in ascending order

Sergei Petrunia added a comment - 2021-01-12 10:17

If Igor says it's an area he should be reviewing, perhaps this is also an area he should be making the fix for?

Sergei Petrunia added a comment - 2021-01-12 10:17 If Igor says it's an area he should be reviewing, perhaps this is also an area he should be making the fix for?

Varun Gupta (Inactive) added a comment - 2020-12-30 17:26 - edited

Made the changes
http://lists.askmonty.org/pipermail/commits/2020-December/014429.html

Varun Gupta (Inactive) added a comment - 2020-12-30 17:26 - edited Made the changes http://lists.askmonty.org/pipermail/commits/2020-December/014429.html

Sergei Petrunia added a comment - 2020-12-29 15:58

Review: https://lists.launchpad.net/maria-developers/msg12491.html

(also, do not forget to replace an assert with assignment, as discussed on Slack)

Sergei Petrunia added a comment - 2020-12-29 15:58 Review: https://lists.launchpad.net/maria-developers/msg12491.html (also, do not forget to replace an assert with assignment, as discussed on Slack)

Varun Gupta (Inactive) added a comment - 2020-10-21 08:47 - edited

OK the plans look similar but they are not exactly the same, lets analyze the output of ANALYZE FORMAT=JSON for the queries with different join_buffer_size.
When we have a smaller join_buffer_size in that case we disallow using join buffering for a table (t3 here)

Output of ANALYZE with join_buffer_size= 2097152

ANALYZE FORMAT=JSON

SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2)

WHERE ( i2, pk1 )  IN ( SELECT i3, i4 FROM t3 A, t4 B);

ANALYZE

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 0.4434,

    "table": {

      "table_name": "B",

      "access_type": "system",

      "r_loops": 0,

      "rows": 1,

      "r_rows": null,

      "filtered": 100,

      "r_filtered": null

},

    "table": {

      "table_name": "t1",

      "access_type": "const",

      "possible_keys": ["PRIMARY"],

      "key": "PRIMARY",

      "key_length": "4",

      "used_key_parts": ["pk1"],

      "ref": ["const"],

      "r_loops": 0,

      "rows": 1,

      "r_rows": null,

      "filtered": 100,

      "r_filtered": null

},

    "table": {

      "table_name": "t2",

      "access_type": "ALL",

      "r_loops": 1,

      "rows": 2,

      "r_rows": 2,

      "r_total_time_ms": 0.0201,

      "filtered": 100,

      "r_filtered": 100,

      "attached_condition": "t2.i2 is not null"

},

    "duplicates_removal": {

      "block-nl-join": {

        "table": {

          "table_name": "A",

          "access_type": "hash_index",

          "possible_keys": ["i3"],

          "key": "#hash#i3:i3",

          "key_length": "5:5",

          "used_key_parts": ["i3"],

          "ref": ["test.t2.i2"],

          "r_loops": 1,

          "rows": 3,

          "r_rows": 3,

          "r_total_time_ms": 0.055,

          "filtered": 66.667,

          "r_filtered": 100,

          "using_index": true

},

        "buffer_type": "flat",

        "buffer_size": "2048Kb",

        "join_type": "BNLH",

        "r_filtered": 100

},

    "table": {

      "table_name": "t3",

      "access_type": "hash_ALL",

      "possible_keys": ["c3"],

      "key": "#hash#c3",

      "key_length": "2",

      "used_key_parts": ["c3"],

      "ref": ["test.t2.c2"],

      "r_loops": 2,

      "rows": 3,

      "r_rows": 3,

      "r_total_time_ms": 0.0289,

      "filtered": 33.333,

      "r_filtered": 100,

      "attached_condition": "trigcond(trigcond(t2.c2 is not null))"

Output for ANALYZE FORMAT=JSON with join_buffer_size= 262144 (gives correct result)

ANALYZE FORMAT=JSON

SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2)

WHERE ( i2, pk1 )  IN ( SELECT i3, i4 FROM t3 A, t4 B);

ANALYZE

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 0.3191,

    "table": {

      "table_name": "B",

      "access_type": "system",

      "r_loops": 0,

      "rows": 1,

      "r_rows": null,

      "filtered": 100,

      "r_filtered": null

},

    "table": {

      "table_name": "t1",

      "access_type": "const",

      "possible_keys": ["PRIMARY"],

      "key": "PRIMARY",

      "key_length": "4",

      "used_key_parts": ["pk1"],

      "ref": ["const"],

      "r_loops": 0,

      "rows": 1,

      "r_rows": null,

      "filtered": 100,

      "r_filtered": null

},

    "table": {

      "table_name": "t2",

      "access_type": "ALL",

      "r_loops": 1,

      "rows": 2,

      "r_rows": 2,

      "r_total_time_ms": 0.0172,

      "filtered": 100,

      "r_filtered": 100,

      "attached_condition": "t2.i2 is not null"

},

    "duplicates_removal": {

      "block-nl-join": {

        "table": {

          "table_name": "A",

          "access_type": "hash_index",

          "possible_keys": ["i3"],

          "key": "#hash#i3:i3",

          "key_length": "5:5",

          "used_key_parts": ["i3"],

          "ref": ["test.t2.i2"],

          "r_loops": 1,

          "rows": 3,

          "r_rows": 3,

          "r_total_time_ms": 0.0422,

          "filtered": 66.667,

          "r_filtered": 100,

          "using_index": true

},

        "buffer_type": "flat",

        "buffer_size": "256Kb",

        "join_type": "BNLH",

        "r_filtered": 100

},

    "block-nl-join": {

      "table": {

        "table_name": "t3",

        "access_type": "hash_ALL",

        "possible_keys": ["c3"],

        "key": "#hash#c3",

        "key_length": "2",

        "used_key_parts": ["c3"],

        "ref": ["test.t2.c2"],

        "r_loops": 1,

        "rows": 3,

        "r_rows": 3,

        "r_total_time_ms": 0.0132,

        "filtered": 33.333,

        "r_filtered": 100

},

      "buffer_type": "flat",

      "buffer_size": "256Kb",

      "join_type": "BNLH",

      "attached_condition": "trigcond(trigcond(t2.c2 is not null))",

      "r_filtered": null

The problem with the first output (with smaller join buffer), it is still using hash key for table t3 which is incorrect. I think instead of using hash-key we could use ref access on key (c3) here instead. Also by not being able to do the lookup the ON condition is not evaluated (c3= c2) and hence all rows pass the where clause and we return more results.

Varun Gupta (Inactive) added a comment - 2020-10-21 08:47 - edited OK the plans look similar but they are not exactly the same, lets analyze the output of ANALYZE FORMAT=JSON for the queries with different join_buffer_size. When we have a smaller join_buffer_size in that case we disallow using join buffering for a table (t3 here) Output of ANALYZE with join_buffer_size= 2097152 ANALYZE FORMAT=JSON SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2) WHERE ( i2, pk1 ) IN ( SELECT i3, i4 FROM t3 A, t4 B); ANALYZE { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 0.4434, "table": { "table_name": "B", "access_type": "system", "r_loops": 0, "rows": 1, "r_rows": null, "filtered": 100, "r_filtered": null }, "table": { "table_name": "t1", "access_type": "const", "possible_keys": ["PRIMARY"], "key": "PRIMARY", "key_length": "4", "used_key_parts": ["pk1"], "ref": ["const"], "r_loops": 0, "rows": 1, "r_rows": null, "filtered": 100, "r_filtered": null }, "table": { "table_name": "t2", "access_type": "ALL", "r_loops": 1, "rows": 2, "r_rows": 2, "r_total_time_ms": 0.0201, "filtered": 100, "r_filtered": 100, "attached_condition": "t2.i2 is not null" }, "duplicates_removal": { "block-nl-join": { "table": { "table_name": "A", "access_type": "hash_index", "possible_keys": ["i3"], "key": "#hash#i3:i3", "key_length": "5:5", "used_key_parts": ["i3"], "ref": ["test.t2.i2"], "r_loops": 1, "rows": 3, "r_rows": 3, "r_total_time_ms": 0.055, "filtered": 66.667, "r_filtered": 100, "using_index": true }, "buffer_type": "flat", "buffer_size": "2048Kb", "join_type": "BNLH", "r_filtered": 100 } }, "table": { "table_name": "t3", "access_type": "hash_ALL", "possible_keys": ["c3"], "key": "#hash#c3", "key_length": "2", "used_key_parts": ["c3"], "ref": ["test.t2.c2"], "r_loops": 2, "rows": 3, "r_rows": 3, "r_total_time_ms": 0.0289, "filtered": 33.333, "r_filtered": 100, "attached_condition": "trigcond(trigcond(t2.c2 is not null))" } } } Output for ANALYZE FORMAT=JSON with join_buffer_size= 262144 (gives correct result) ANALYZE FORMAT=JSON SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2) WHERE ( i2, pk1 ) IN ( SELECT i3, i4 FROM t3 A, t4 B); ANALYZE { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 0.3191, "table": { "table_name": "B", "access_type": "system", "r_loops": 0, "rows": 1, "r_rows": null, "filtered": 100, "r_filtered": null }, "table": { "table_name": "t1", "access_type": "const", "possible_keys": ["PRIMARY"], "key": "PRIMARY", "key_length": "4", "used_key_parts": ["pk1"], "ref": ["const"], "r_loops": 0, "rows": 1, "r_rows": null, "filtered": 100, "r_filtered": null }, "table": { "table_name": "t2", "access_type": "ALL", "r_loops": 1, "rows": 2, "r_rows": 2, "r_total_time_ms": 0.0172, "filtered": 100, "r_filtered": 100, "attached_condition": "t2.i2 is not null" }, "duplicates_removal": { "block-nl-join": { "table": { "table_name": "A", "access_type": "hash_index", "possible_keys": ["i3"], "key": "#hash#i3:i3", "key_length": "5:5", "used_key_parts": ["i3"], "ref": ["test.t2.i2"], "r_loops": 1, "rows": 3, "r_rows": 3, "r_total_time_ms": 0.0422, "filtered": 66.667, "r_filtered": 100, "using_index": true }, "buffer_type": "flat", "buffer_size": "256Kb", "join_type": "BNLH", "r_filtered": 100 } }, "block-nl-join": { "table": { "table_name": "t3", "access_type": "hash_ALL", "possible_keys": ["c3"], "key": "#hash#c3", "key_length": "2", "used_key_parts": ["c3"], "ref": ["test.t2.c2"], "r_loops": 1, "rows": 3, "r_rows": 3, "r_total_time_ms": 0.0132, "filtered": 33.333, "r_filtered": 100 }, "buffer_type": "flat", "buffer_size": "256Kb", "join_type": "BNLH", "attached_condition": "trigcond(trigcond(t2.c2 is not null))", "r_filtered": null } } } The problem with the first output (with smaller join buffer), it is still using hash key for table t3 which is incorrect. I think instead of using hash-key we could use ref access on key (c3) here instead. Also by not being able to do the lookup the ON condition is not evaluated (c3= c2) and hence all rows pass the where clause and we return more results.

Sergei Petrunia added a comment - 2019-05-28 14:46

Reproducible without the ORDER BY clause.

Sergei Petrunia added a comment - 2019-05-28 14:46 Reproducible without the ORDER BY clause.

Sergei Petrunia added a comment - 2019-05-28 14:42

If run the query with semijoin=off, I get two rows. I assume two rows is the correct output.

The EXPLAIN is the same with both join_buffer_size settings:

+------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+

| id   | select_type | table | type       | possible_keys | key         | key_len | ref      | rows | Extra                                                                            |

+------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+

|    1 | PRIMARY     | t4    | system     | NULL          | NULL        | NULL    | NULL     |    1 | Using temporary; Using filesort                                                  |

|    1 | PRIMARY     | t1    | const      | PRIMARY       | PRIMARY     | 4       | const    |    1 |                                                                                  |

|    1 | PRIMARY     | t2    | ALL        | NULL          | NULL        | NULL    | NULL     |    2 | Using where                                                                      |

|    1 | PRIMARY     | t3    | hash_index | i3            | #hash#i3:i3 | 5:5     | j2.t2.i2 |    3 | Using index; Start temporary; End temporary; Using join buffer (flat, BNLH join) |

|    1 | PRIMARY     | t3    | hash_ALL   | c3            | #hash#c3    | 2       | j2.t2.c2 |    3 | Using where; Using join buffer (flat, BNLH join)                                 |

+------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+

Sergei Petrunia added a comment - 2019-05-28 14:42 If run the query with semijoin=off , I get two rows. I assume two rows is the correct output. The EXPLAIN is the same with both join_buffer_size settings: +------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+ | 1 | PRIMARY | t4 | system | NULL | NULL | NULL | NULL | 1 | Using temporary; Using filesort | | 1 | PRIMARY | t1 | const | PRIMARY | PRIMARY | 4 | const | 1 | | | 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 2 | Using where | | 1 | PRIMARY | t3 | hash_index | i3 | #hash#i3:i3 | 5:5 | j2.t2.i2 | 3 | Using index; Start temporary; End temporary; Using join buffer (flat, BNLH join) | | 1 | PRIMARY | t3 | hash_ALL | c3 | #hash#c3 | 2 | j2.t2.c2 | 3 | Using where; Using join buffer (flat, BNLH join) | +------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+

People

Assignee:: Igor Babaev

Reporter:: Michael Widenius

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2019-05-28 13:30

Updated:: 2024-09-10 15:05

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server