Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19620

Changing join_buffer_size causes different results

Details

    • Bug
    • Status: Stalled (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11
    • 10.5, 10.6
    • Optimizer

    Description

      This test case is originally subselect_sj.test

      --echo # 
      --echo # MDEV-5059: Wrong result (missing row) wih semijoin, join_cache_level > 2, LEFT JOIN, ORDER BY
      --echo # 
       
      set join_buffer_size= 1024*1024*2;
      SET join_cache_level = 3;
      set optimizer_search_depth=62;
      set join_buffer_space_limit=2097152;
       
      set optimizer_switch="index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on";
      

      CREATE TABLE t1 (pk1 INT PRIMARY KEY, i1 INT) ENGINE=MyISAM;
      INSERT INTO t1 VALUES (1,4),(2,5);
       
      CREATE TABLE t2 (i2 INT, c2 CHAR(1)) ENGINE=MyISAM;
      INSERT INTO t2 VALUES (1,'v'),(7,'s');
       
      CREATE TABLE t3 (pk3 INT PRIMARY KEY, i3 INT, c3 CHAR(1), INDEX(i3), INDEX(c3,i3)) ENGINE=MyISAM;
      INSERT INTO t3 VALUES (1,7,'g'),(2,4,'p'),(3,1,'q');
       
      CREATE TABLE t4 (i4 INT) ENGINE=MyISAM;
      INSERT INTO t4 VALUES (1);
      

      SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2) 
      WHERE ( i2, pk1 )  IN ( SELECT i3, 1 FROM t3, t4 ) ORDER BY i1, c2;
       
      set join_buffer_size=262144;
       
      SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2) 
      WHERE ( i2, pk1 )  IN ( SELECT i3, 1 FROM t3, t4 ) ORDER BY i1, c2;
       
      DROP TABLE t1,t2,t3,t4;
      

      -----------
      The last SELECT produces two different results:

      SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2) 
      WHERE ( i2, pk1 )  IN ( SELECT i3, 1 FROM t3, t4 ) ORDER BY i1, c2;
      pk1     i1      i2      c2      pk3     i3      c3
      1       4       7       s       1       7       g
      1       4       7       s       2       4       p
      1       4       7       s       3       1       q
      1       4       1       v       1       7       g
      1       4       1       v       2       4       p
      1       4       1       v       3       1       q
      

      set join_buffer_size=262144;
      SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2) 
      WHERE ( i2, pk1 )  IN ( SELECT i3, 1 FROM t3, t4 ) ORDER BY i1, c2;
      pk1     i1      i2      c2      pk3     i3      c3
      1       4       7       s       NULL    NULL    NULL
      1       4       1       v       NULL    NULL    NULL
      

      Attachments

        Activity

          If Igor says it's an area he should be reviewing, perhaps this is also an area he should be making the fix for?

          psergei Sergei Petrunia added a comment - If Igor says it's an area he should be reviewing, perhaps this is also an area he should be making the fix for?
          varun Varun Gupta (Inactive) added a comment - - edited Made the changes http://lists.askmonty.org/pipermail/commits/2020-December/014429.html

          Review: https://lists.launchpad.net/maria-developers/msg12491.html

          (also, do not forget to replace an assert with assignment, as discussed on Slack)

          psergei Sergei Petrunia added a comment - Review: https://lists.launchpad.net/maria-developers/msg12491.html (also, do not forget to replace an assert with assignment, as discussed on Slack)
          varun Varun Gupta (Inactive) added a comment - - edited

          OK the plans look similar but they are not exactly the same, lets analyze the output of ANALYZE FORMAT=JSON for the queries with different join_buffer_size.
          When we have a smaller join_buffer_size in that case we disallow using join buffering for a table (t3 here)

          Output of ANALYZE with join_buffer_size= 2097152

          ANALYZE FORMAT=JSON
          SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2)
          WHERE ( i2, pk1 )  IN ( SELECT i3, i4 FROM t3 A, t4 B);
          ANALYZE
          {
            "query_block": {
              "select_id": 1,
              "r_loops": 1,
              "r_total_time_ms": 0.4434,
              "table": {
                "table_name": "B",
                "access_type": "system",
                "r_loops": 0,
                "rows": 1,
                "r_rows": null,
                "filtered": 100,
                "r_filtered": null
              },
              "table": {
                "table_name": "t1",
                "access_type": "const",
                "possible_keys": ["PRIMARY"],
                "key": "PRIMARY",
                "key_length": "4",
                "used_key_parts": ["pk1"],
                "ref": ["const"],
                "r_loops": 0,
                "rows": 1,
                "r_rows": null,
                "filtered": 100,
                "r_filtered": null
              },
              "table": {
                "table_name": "t2",
                "access_type": "ALL",
                "r_loops": 1,
                "rows": 2,
                "r_rows": 2,
                "r_total_time_ms": 0.0201,
                "filtered": 100,
                "r_filtered": 100,
                "attached_condition": "t2.i2 is not null"
              },
              "duplicates_removal": {
                "block-nl-join": {
                  "table": {
                    "table_name": "A",
                    "access_type": "hash_index",
                    "possible_keys": ["i3"],
                    "key": "#hash#i3:i3",
                    "key_length": "5:5",
                    "used_key_parts": ["i3"],
                    "ref": ["test.t2.i2"],
                    "r_loops": 1,
                    "rows": 3,
                    "r_rows": 3,
                    "r_total_time_ms": 0.055,
                    "filtered": 66.667,
                    "r_filtered": 100,
                    "using_index": true
                  },
                  "buffer_type": "flat",
                  "buffer_size": "2048Kb",
                  "join_type": "BNLH",
                  "r_filtered": 100
                }
              },
              "table": {
                "table_name": "t3",
                "access_type": "hash_ALL",
                "possible_keys": ["c3"],
                "key": "#hash#c3",
                "key_length": "2",
                "used_key_parts": ["c3"],
                "ref": ["test.t2.c2"],
                "r_loops": 2,
                "rows": 3,
                "r_rows": 3,
                "r_total_time_ms": 0.0289,
                "filtered": 33.333,
                "r_filtered": 100,
                "attached_condition": "trigcond(trigcond(t2.c2 is not null))"
              }
            }
          }
          

          Output for ANALYZE FORMAT=JSON with join_buffer_size= 262144 (gives correct result)

          ANALYZE FORMAT=JSON
          SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2)
          WHERE ( i2, pk1 )  IN ( SELECT i3, i4 FROM t3 A, t4 B);
          ANALYZE
          {
            "query_block": {
              "select_id": 1,
              "r_loops": 1,
              "r_total_time_ms": 0.3191,
              "table": {
                "table_name": "B",
                "access_type": "system",
                "r_loops": 0,
                "rows": 1,
                "r_rows": null,
                "filtered": 100,
                "r_filtered": null
              },
              "table": {
                "table_name": "t1",
                "access_type": "const",
                "possible_keys": ["PRIMARY"],
                "key": "PRIMARY",
                "key_length": "4",
                "used_key_parts": ["pk1"],
                "ref": ["const"],
                "r_loops": 0,
                "rows": 1,
                "r_rows": null,
                "filtered": 100,
                "r_filtered": null
              },
              "table": {
                "table_name": "t2",
                "access_type": "ALL",
                "r_loops": 1,
                "rows": 2,
                "r_rows": 2,
                "r_total_time_ms": 0.0172,
                "filtered": 100,
                "r_filtered": 100,
                "attached_condition": "t2.i2 is not null"
              },
              "duplicates_removal": {
                "block-nl-join": {
                  "table": {
                    "table_name": "A",
                    "access_type": "hash_index",
                    "possible_keys": ["i3"],
                    "key": "#hash#i3:i3",
                    "key_length": "5:5",
                    "used_key_parts": ["i3"],
                    "ref": ["test.t2.i2"],
                    "r_loops": 1,
                    "rows": 3,
                    "r_rows": 3,
                    "r_total_time_ms": 0.0422,
                    "filtered": 66.667,
                    "r_filtered": 100,
                    "using_index": true
                  },
                  "buffer_type": "flat",
                  "buffer_size": "256Kb",
                  "join_type": "BNLH",
                  "r_filtered": 100
                }
              },
              "block-nl-join": {
                "table": {
                  "table_name": "t3",
                  "access_type": "hash_ALL",
                  "possible_keys": ["c3"],
                  "key": "#hash#c3",
                  "key_length": "2",
                  "used_key_parts": ["c3"],
                  "ref": ["test.t2.c2"],
                  "r_loops": 1,
                  "rows": 3,
                  "r_rows": 3,
                  "r_total_time_ms": 0.0132,
                  "filtered": 33.333,
                  "r_filtered": 100
                },
                "buffer_type": "flat",
                "buffer_size": "256Kb",
                "join_type": "BNLH",
                "attached_condition": "trigcond(trigcond(t2.c2 is not null))",
                "r_filtered": null
              }
            }
          }
          
          

          The problem with the first output (with smaller join buffer), it is still using hash key for table t3 which is incorrect. I think instead of using hash-key we could use ref access on key (c3) here instead. Also by not being able to do the lookup the ON condition is not evaluated (c3= c2) and hence all rows pass the where clause and we return more results.

          varun Varun Gupta (Inactive) added a comment - - edited OK the plans look similar but they are not exactly the same, lets analyze the output of ANALYZE FORMAT=JSON for the queries with different join_buffer_size. When we have a smaller join_buffer_size in that case we disallow using join buffering for a table (t3 here) Output of ANALYZE with join_buffer_size= 2097152 ANALYZE FORMAT=JSON SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2) WHERE ( i2, pk1 ) IN ( SELECT i3, i4 FROM t3 A, t4 B); ANALYZE { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 0.4434, "table": { "table_name": "B", "access_type": "system", "r_loops": 0, "rows": 1, "r_rows": null, "filtered": 100, "r_filtered": null }, "table": { "table_name": "t1", "access_type": "const", "possible_keys": ["PRIMARY"], "key": "PRIMARY", "key_length": "4", "used_key_parts": ["pk1"], "ref": ["const"], "r_loops": 0, "rows": 1, "r_rows": null, "filtered": 100, "r_filtered": null }, "table": { "table_name": "t2", "access_type": "ALL", "r_loops": 1, "rows": 2, "r_rows": 2, "r_total_time_ms": 0.0201, "filtered": 100, "r_filtered": 100, "attached_condition": "t2.i2 is not null" }, "duplicates_removal": { "block-nl-join": { "table": { "table_name": "A", "access_type": "hash_index", "possible_keys": ["i3"], "key": "#hash#i3:i3", "key_length": "5:5", "used_key_parts": ["i3"], "ref": ["test.t2.i2"], "r_loops": 1, "rows": 3, "r_rows": 3, "r_total_time_ms": 0.055, "filtered": 66.667, "r_filtered": 100, "using_index": true }, "buffer_type": "flat", "buffer_size": "2048Kb", "join_type": "BNLH", "r_filtered": 100 } }, "table": { "table_name": "t3", "access_type": "hash_ALL", "possible_keys": ["c3"], "key": "#hash#c3", "key_length": "2", "used_key_parts": ["c3"], "ref": ["test.t2.c2"], "r_loops": 2, "rows": 3, "r_rows": 3, "r_total_time_ms": 0.0289, "filtered": 33.333, "r_filtered": 100, "attached_condition": "trigcond(trigcond(t2.c2 is not null))" } } } Output for ANALYZE FORMAT=JSON with join_buffer_size= 262144 (gives correct result) ANALYZE FORMAT=JSON SELECT * FROM t1, t2 LEFT JOIN t3 ON (c3 = c2) WHERE ( i2, pk1 ) IN ( SELECT i3, i4 FROM t3 A, t4 B); ANALYZE { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 0.3191, "table": { "table_name": "B", "access_type": "system", "r_loops": 0, "rows": 1, "r_rows": null, "filtered": 100, "r_filtered": null }, "table": { "table_name": "t1", "access_type": "const", "possible_keys": ["PRIMARY"], "key": "PRIMARY", "key_length": "4", "used_key_parts": ["pk1"], "ref": ["const"], "r_loops": 0, "rows": 1, "r_rows": null, "filtered": 100, "r_filtered": null }, "table": { "table_name": "t2", "access_type": "ALL", "r_loops": 1, "rows": 2, "r_rows": 2, "r_total_time_ms": 0.0172, "filtered": 100, "r_filtered": 100, "attached_condition": "t2.i2 is not null" }, "duplicates_removal": { "block-nl-join": { "table": { "table_name": "A", "access_type": "hash_index", "possible_keys": ["i3"], "key": "#hash#i3:i3", "key_length": "5:5", "used_key_parts": ["i3"], "ref": ["test.t2.i2"], "r_loops": 1, "rows": 3, "r_rows": 3, "r_total_time_ms": 0.0422, "filtered": 66.667, "r_filtered": 100, "using_index": true }, "buffer_type": "flat", "buffer_size": "256Kb", "join_type": "BNLH", "r_filtered": 100 } }, "block-nl-join": { "table": { "table_name": "t3", "access_type": "hash_ALL", "possible_keys": ["c3"], "key": "#hash#c3", "key_length": "2", "used_key_parts": ["c3"], "ref": ["test.t2.c2"], "r_loops": 1, "rows": 3, "r_rows": 3, "r_total_time_ms": 0.0132, "filtered": 33.333, "r_filtered": 100 }, "buffer_type": "flat", "buffer_size": "256Kb", "join_type": "BNLH", "attached_condition": "trigcond(trigcond(t2.c2 is not null))", "r_filtered": null } } } The problem with the first output (with smaller join buffer), it is still using hash key for table t3 which is incorrect. I think instead of using hash-key we could use ref access on key (c3) here instead. Also by not being able to do the lookup the ON condition is not evaluated (c3= c2) and hence all rows pass the where clause and we return more results.

          Reproducible without the ORDER BY clause.

          psergei Sergei Petrunia added a comment - Reproducible without the ORDER BY clause.

          If run the query with semijoin=off, I get two rows. I assume two rows is the correct output.

          The EXPLAIN is the same with both join_buffer_size settings:

          +------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+
          | id   | select_type | table | type       | possible_keys | key         | key_len | ref      | rows | Extra                                                                            |
          +------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+
          |    1 | PRIMARY     | t4    | system     | NULL          | NULL        | NULL    | NULL     |    1 | Using temporary; Using filesort                                                  |
          |    1 | PRIMARY     | t1    | const      | PRIMARY       | PRIMARY     | 4       | const    |    1 |                                                                                  |
          |    1 | PRIMARY     | t2    | ALL        | NULL          | NULL        | NULL    | NULL     |    2 | Using where                                                                      |
          |    1 | PRIMARY     | t3    | hash_index | i3            | #hash#i3:i3 | 5:5     | j2.t2.i2 |    3 | Using index; Start temporary; End temporary; Using join buffer (flat, BNLH join) |
          |    1 | PRIMARY     | t3    | hash_ALL   | c3            | #hash#c3    | 2       | j2.t2.c2 |    3 | Using where; Using join buffer (flat, BNLH join)                                 |
          +------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+
          

          psergei Sergei Petrunia added a comment - If run the query with semijoin=off , I get two rows. I assume two rows is the correct output. The EXPLAIN is the same with both join_buffer_size settings: +------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+ | 1 | PRIMARY | t4 | system | NULL | NULL | NULL | NULL | 1 | Using temporary; Using filesort | | 1 | PRIMARY | t1 | const | PRIMARY | PRIMARY | 4 | const | 1 | | | 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 2 | Using where | | 1 | PRIMARY | t3 | hash_index | i3 | #hash#i3:i3 | 5:5 | j2.t2.i2 | 3 | Using index; Start temporary; End temporary; Using join buffer (flat, BNLH join) | | 1 | PRIMARY | t3 | hash_ALL | c3 | #hash#c3 | 2 | j2.t2.c2 | 3 | Using where; Using join buffer (flat, BNLH join) | +------+-------------+-------+------------+---------------+-------------+---------+----------+------+----------------------------------------------------------------------------------+

          People

            igor Igor Babaev
            monty Michael Widenius
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.