Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17752

Plan changes from hash_index_merge to index_merge with new optimizer defaults

Details

    Description

      The mtr test case

      set @save_optimizer_switch=@@optimizer_switch;
      set @@optimizer_switch='optimize_join_buffer_size=on';
      set optimizer_switch='semijoin=on,firstmatch=on,loosescan=on';
      set @@optimizer_switch='semijoin_with_cache=on';
      set @@optimizer_switch='outer_join_with_cache=on';
      set optimizer_switch='mrr=on,mrr_sort_keys=on,index_condition_pushdown=on';
      set @@use_stat_tables= PREFERABLY;
      set @@optimizer_use_condition_selectivity=4;
      set @local_join_cache_test_optimizer_switch_default=@@optimizer_switch;
      set names utf8;
       
      CREATE DATABASE world;
       
      use world;
       
      --source include/world_schema1.inc
       
      --disable_query_log
      --disable_result_log
      --disable_warnings
      --source include/world.inc
      --enable_warnings
      --enable_result_log
      --enable_query_log
       
      SELECT COUNT(*) FROM Country;
      SELECT COUNT(*) FROM City;
      SELECT COUNT(*) FROM CountryLanguage;
       
      show variables like 'join_buffer_size';
       
      set join_cache_level=4;
       
      CREATE INDEX City_Population ON City(Population);
      CREATE INDEX City_Name ON City(Name);
       
      --disable_result_log
      ANALYZE TABLE City;
      --enable_result_log
       
      EXPLAIN
      SELECT Country.Name, Country.Population, City.Name, City.Population 
        FROM Country LEFT JOIN City
             ON City.Country=Country.Code AND City.Population > 5000000
         WHERE Country.Name LIKE 'C%' AND Country.Population > 10000000;
       
      EXPLAIN
      SELECT Country.Name, Country.Population, City.Name, City.Population
        FROM Country LEFT JOIN City 
             ON City.Country=Country.Code AND
                (City.Population > 5000000 OR City.Name LIKE 'Za%')
        WHERE Country.Name LIKE 'C%' AND Country.Population > 10000000;
      

      EXPLAIN
      SELECT Country.Name, Country.Population, City.Name, City.Population
      FROM Country LEFT JOIN City 
      ON City.Country=Country.Code AND
      (City.Population > 5000000 OR City.Name LIKE 'Za%')
      WHERE Country.Name LIKE 'C%' AND Country.Population > 10000000;
      id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
      1	SIMPLE	Country	ALL	NULL	NULL	NULL	NULL	239	Using where
      1	SIMPLE	City	index_merge	City_Population,City_Name	City_Population,City_Name	4,35	NULL	96	Using sort_union(City_Population,City_Name); Using where; Using join buffer (flat, BNL join)
      

      Attachments

        Issue Links

          Activity

            If I just run the explain without the other query

            EXPLAIN
            SELECT Country.Name, Country.Population, City.Name, City.Population
            FROM Country LEFT JOIN City 
            ON City.Country=Country.Code AND
            (City.Population > 5000000 OR City.Name LIKE 'Za%')
            WHERE Country.Name LIKE 'C%' AND Country.Population > 10000000;
            id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            1	SIMPLE	Country	ALL	NULL	NULL	NULL	NULL	239	Using where
            1	SIMPLE	City	hash_index_merge	City_Population,City_Name	#hash#$hj:City_Population,City_Name	3:4,35world.Country.Code	96	Using sort_union(City_Population,City_Name); Using where; Using join buffer (flat, BNLH join)
            

            varun Varun Gupta (Inactive) added a comment - If I just run the explain without the other query EXPLAIN SELECT Country.Name, Country.Population, City.Name, City.Population FROM Country LEFT JOIN City ON City.Country=Country.Code AND (City.Population > 5000000 OR City.Name LIKE 'Za%' ) WHERE Country.Name LIKE 'C%' AND Country.Population > 10000000 ; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE Country ALL NULL NULL NULL NULL 239 Using where 1 SIMPLE City hash_index_merge City_Population,City_Name #hash#$hj:City_Population,City_Name 3 : 4 ,35world.Country.Code 96 Using sort_union(City_Population,City_Name); Using where; Using join buffer (flat, BNLH join)
            varun Varun Gupta (Inactive) added a comment - Patch http://lists.askmonty.org/pipermail/commits/2018-November/013112.html
            psergei Sergei Petrunia added a comment - - edited

            Debugging the original query plan (with hash_index_merge)

            MariaDB [world]> set use_stat_tables=never;
            Query OK, 0 rows affected (0.001 sec)
             
            MariaDB [world]> flush tables;
            Query OK, 0 rows affected (0.026 sec)
             
            MariaDB [world]> EXPLAIN SELECT Country.Name, Country.Population, City.Name, City.Population   FROM Country LEFT JOIN City         ON City.Country=Country.Code AND           (City.Population > 5000000 OR City.Name LIKE 'Za%')   WHERE Country.Name LIKE 'C%' AND Country.Population > 10000000;
            +------+-------------+---------+------------------+---------------------------+-------------------------------------+---------+--------------------+------+-----------------------------------------------------------------------------------------------+
            | id   | select_type | table   | type             | possible_keys             | key                                 | key_len | ref                | rows | Extra                                                                                         |
            +------+-------------+---------+------------------+---------------------------+-------------------------------------+---------+--------------------+------+-----------------------------------------------------------------------------------------------+
            |    1 | SIMPLE      | Country | ALL              | NULL                      | NULL                                | NULL    | NULL               |  239 | Using where                                                                                   |
            |    1 | SIMPLE      | City    | hash_index_merge | City_Population,City_Name | #hash#$hj:City_Population,City_Name | 3:4,35  | world.Country.Code |   43 | Using sort_union(City_Population,City_Name); Using where; Using join buffer (flat, BNLH join) |
            +------+-------------+---------+------------------+---------------------------+-------------------------------------+---------+--------------------+------+-----------------------------------------------------------------------------------------------+
            

            Re-running the above query under debugger, in best_access_path(idx=1, table=City) :

            The first line of the condition computes to true:

              if ((records >= s->found_records || best > s->read_time) &&            // (1)
            

            (gdb) p (records >= s->found_records || best > s->read_time)
              $21 = true
            

            Then

                  !(s->quick && best_key && s->quick->index == best_key->key &&      // (2)
                    best_max_key_part >= s->table->quick_key_parts[best_key->key]) &&// (2)
            

            (gdb) p s->quick
              $14 = (QUICK_INDEX_MERGE_SELECT *) 0x7fff54075e90
            (gdb) p best_key
              $15 = (KEYUSE *) 0x7fff54036398  // this is hj_start_key.
            (gdb) p  s->quick->index == best_key->key
              $16 = true
            (gdb) p  best_max_key_part
              $22 = 0
            (gdb) p best_key->key
              $23 = 64
            

            So, the first line computes to true:

            (gdb) p s->quick && best_key && s->quick->index == best_key->key 
              $25 = true
            

            and we are about to compute s->table->quick_key_parts[best_key->key]. This will read beyond the end of the quick_key_parts array:

              uint		quick_key_parts[MAX_KEY];
              uint		quick_n_ranges[MAX_KEY];
            

            There's no crash potential because of the next array, but this is still bad.

            ... and this is why Varun observes that the query plan changes depending on whether there was another query ran before it. best_max_key_part >= s->table->quick_key_parts[best_key->key] evaluates to true when there was no previous query (and we get hash_index_merge). When there was a previous query, it evaluates to false.

            psergei Sergei Petrunia added a comment - - edited Debugging the original query plan (with hash_index_merge) MariaDB [world]> set use_stat_tables=never; Query OK, 0 rows affected (0.001 sec)   MariaDB [world]> flush tables; Query OK, 0 rows affected (0.026 sec)   MariaDB [world]> EXPLAIN SELECT Country.Name, Country.Population, City.Name, City.Population FROM Country LEFT JOIN City ON City.Country=Country.Code AND (City.Population > 5000000 OR City.Name LIKE 'Za%') WHERE Country.Name LIKE 'C%' AND Country.Population > 10000000; +------+-------------+---------+------------------+---------------------------+-------------------------------------+---------+--------------------+------+-----------------------------------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+---------+------------------+---------------------------+-------------------------------------+---------+--------------------+------+-----------------------------------------------------------------------------------------------+ | 1 | SIMPLE | Country | ALL | NULL | NULL | NULL | NULL | 239 | Using where | | 1 | SIMPLE | City | hash_index_merge | City_Population,City_Name | #hash#$hj:City_Population,City_Name | 3:4,35 | world.Country.Code | 43 | Using sort_union(City_Population,City_Name); Using where; Using join buffer (flat, BNLH join) | +------+-------------+---------+------------------+---------------------------+-------------------------------------+---------+--------------------+------+-----------------------------------------------------------------------------------------------+ Re-running the above query under debugger, in best_access_path(idx=1, table=City) : The first line of the condition computes to true: if ((records >= s->found_records || best > s->read_time) && // (1) (gdb) p (records >= s->found_records || best > s->read_time) $21 = true Then !(s->quick && best_key && s->quick->index == best_key->key && // (2) best_max_key_part >= s->table->quick_key_parts[best_key->key]) && // (2) (gdb) p s->quick $14 = (QUICK_INDEX_MERGE_SELECT *) 0x7fff54075e90 (gdb) p best_key $15 = (KEYUSE *) 0x7fff54036398 // this is hj_start_key. (gdb) p s->quick->index == best_key->key $16 = true (gdb) p best_max_key_part $22 = 0 (gdb) p best_key->key $23 = 64 So, the first line computes to true: (gdb) p s->quick && best_key && s->quick->index == best_key->key $25 = true and we are about to compute s->table->quick_key_parts [best_key->key]. This will read beyond the end of the quick_key_parts array: uint quick_key_parts[MAX_KEY]; uint quick_n_ranges[MAX_KEY]; There's no crash potential because of the next array, but this is still bad. ... and this is why Varun observes that the query plan changes depending on whether there was another query ran before it. best_max_key_part >= s->table->quick_key_parts [best_key->key] evaluates to true when there was no previous query (and we get hash_index_merge). When there was a previous query, it evaluates to false.

            Now, looking at the patch. It disables taking the if-branch if hash join is being used:

               if ((records >= s->found_records || best > s->read_time) &&            // (1)
            +      !(best_key && best_key->key == MAX_KEY) &&                         // (2)
                   !(s->quick && best_key && s->quick->index == best_key->key &&      // (2)
            

            This was not my intent. My intent was as follows:

            Disable the heuristic marked as "(2)" if we are using hash join.

            But maybe what the patch does would be better? Perhaps, but in that case the code inside the if-branch should not check "best_key->is_for_hash_join()" like it currently does.

            I would like to check with Igor, does the execution need to enter the if-branch if hash join is used.

            psergei Sergei Petrunia added a comment - Now, looking at the patch. It disables taking the if-branch if hash join is being used: if ((records >= s->found_records || best > s->read_time) && // (1) + !(best_key && best_key->key == MAX_KEY) && // (2) !(s->quick && best_key && s->quick->index == best_key->key && // (2) This was not my intent. My intent was as follows: Disable the heuristic marked as "(2)" if we are using hash join. But maybe what the patch does would be better? Perhaps, but in that case the code inside the if-branch should not check "best_key->is_for_hash_join()" like it currently does. I would like to check with Igor, does the execution need to enter the if-branch if hash join is used.

            ... varun, on the question of loading the "world" database again - why is it not possible to add the test into the join_cache.test where this database is loaded already?

            psergei Sergei Petrunia added a comment - ... varun , on the question of loading the "world" database again - why is it not possible to add the test into the join_cache.test where this database is loaded already?
            varun Varun Gupta (Inactive) added a comment - Patch http://lists.askmonty.org/pipermail/commits/2019-May/013793.html

            Ok to push into 10.1

            igor Igor Babaev (Inactive) added a comment - Ok to push into 10.1

            People

              varun Varun Gupta (Inactive)
              varun Varun Gupta (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.