Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23878

Wrong result with semi-join and splittable derived table

Details

    Description

      When the number of conditions inside the "IN" clause matches the value of in_predicate_conversion_threshold , query behavior changes and returned result is incorrect for query.

      Attaching file for case reproduction:

      CREATE DATABASE subscriber_data;
      mysql subscriber_data < subscriber_data.sql

      Execute:

      pager;
      set in_predicate_conversion_threshold = 51;
      pager md5sum;
       
      SELECT object, ind, data FROM `subscriber_data`.`dev_config` AS live JOIN (SELECT object, ind, MAX(collection_time) AS collection_time FROM `subscriber_data`.`dev_config` WHERE collection_time <= '2020-09-24' AND ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51) GROUP BY object, ind) AS latest USING (object, ind, collection_time) WHERE ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51);
      

      Result is
      13 rows in set (0.039 sec)

      Otherwise setting set in_predicate_conversion_threshold = 1000; (value bigger than number of clauses inside "IN" ):

      pager;
      set in_predicate_conversion_threshold = 1000;
      pager md5sum;
       
      SELECT object, ind, data FROM `subscriber_data`.`dev_config` AS live JOIN (SELECT object, ind, MAX(collection_time) AS collection_time FROM `subscriber_data`.`dev_config` WHERE collection_time <= '2020-09-24' AND ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51) GROUP BY object, ind) AS latest USING (object, ind, collection_time) WHERE ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51);
      

      Returns
      700 rows in set (0.038 sec)

      The returned result when bug triggers (in_predicate_conversion_threshold = 51) is:
      -------------------------------------------

      object ind data

      -------------------------------------------

      1 51 51
      2 51 0
      3 51 us-east-1d
      4 51 ebd81398742e75f2fafb5511e06720c3
      5 51 0
      6 51 0
      7 51 0
      8 51 9b5d0e93ea552fb8a86420e87410ae21
      10 51 NULL
      11 51 0
      12 51 0
      13 51 0
      14 51 0

      -------------------------------------------
      13 rows in set (0.046 sec)

      But if I reorder the values inside the query IN clause and set "1" at the end instead of 51, result is:
      -------------------------------------------

      object ind data

      -------------------------------------------

      1 1 1
      2 1 0
      3 1 test_by_vcug_device
      4 1 6dc8e300391a4730609c89d9bf5e7564
      5 1 0
      6 1 NULL
      7 1 0
      8 1 8d4ffb6e30134a25b5297975eb3c0155
      10 1 NULL
      11 1 0
      12 1 0
      13 1 0
      14 1 0

      -------------------------------------------
      13 rows in set (0.040 sec)

      It can be seen that on the group by "ind" , the result "ind" value displayed is only the one last used in the "IN" conditions, and there seems to is a "UNION ALL" missing to merge all results when in_predicate_conversion_threshold kicks in.

      Attachments

        Activity

          Starting from MariaDB version 11.0 the test case provided in my previous comment starts working correctly.
          The plan for this test case differs from the plan in the earlier versions: the column of the transformed IN-predicate is not used in the lateral derived.

          set in_predicate_conversion_threshold = 2;
           
          SELECT COUNT(*) AS cnt FROM t1
          JOIN
          (
          SELECT groupId, id
          FROM t1
          WHERE id IN (1,2,3,4,5,6)
          GROUP BY groupId, id
          ) AS t2
          USING (groupId, id)
          WHERE id IN (1,2,3,4,5,6,7,8);
          +-----+
          | cnt |
          +-----+
          |   6 |
          +-----+
           
          EXPLAIN
          SELECT COUNT(*) AS cnt FROM t1
          JOIN
          (
            SELECT groupId, id
            FROM t1
            WHERE id IN (1,2,3,4,5,6)
            GROUP BY groupId, id
          ) AS t2
          USING (groupId, id)
          WHERE id IN (1,2,3,4,5,6,7,8);
          +------+-----------------+-------------+--------+---------------+--------------+---------+----------------------------+------+----------------+
          | id   | select_type     | table       | type   | possible_keys | key          | key_len | ref                        | rows | Extra          |
          +------+-----------------+-------------+--------+---------------+--------------+---------+----------------------------+------+----------------+
          |    1 | PRIMARY         | t1          | ALL    | PRIMARY       | NULL         | NUL     | NULL                       | 19   |                |
          |    1 | PRIMARY         | <derived3>  | ref    | key0          | key0         | 8       | test.t1.groupId,test.t1.id | 1    |                |
          |    1 | PRIMARY         | <derived5>  | eq_ref | distinct_key  | distinct_key | 4       | test.t1.id                 | 1    | Using where    |
          |    5 | DERIVED         | NULL        | NULL   | NULL          | NULL         | NULL    | NULL                       | NULL | No tables used |
          |    3 | LATERAL DERIVED | t1          | eq_ref | PRIMARY       | PRIMARY      | 8       | test.t1.groupId,test.t1.id | 1    |                |
          |    3 | LATERAL DERIVED | <derived7>  | eq_ref | distinct_key  | distinct_key | 4       | test.t1.id                 | 1    | Using where    |
          |    7 | DERIVED         | NULL        | NULL   | NULL          | NULL         | NULL    | NULL                       | NULL | No tables used |
          +------+-----------------+-------------+--------+---------------+--------------+---------+----------------------------+------+----------------+
          

          shagalla Galina Shalygina (Inactive) added a comment - - edited Starting from MariaDB version 11.0 the test case provided in my previous comment starts working correctly. The plan for this test case differs from the plan in the earlier versions: the column of the transformed IN-predicate is not used in the lateral derived. set in_predicate_conversion_threshold = 2;   SELECT COUNT (*) AS cnt FROM t1 JOIN ( SELECT groupId, id FROM t1 WHERE id IN (1,2,3,4,5,6) GROUP BY groupId, id ) AS t2 USING (groupId, id) WHERE id IN (1,2,3,4,5,6,7,8); + -----+ | cnt | + -----+ | 6 | + -----+   EXPLAIN SELECT COUNT (*) AS cnt FROM t1 JOIN ( SELECT groupId, id FROM t1 WHERE id IN (1,2,3,4,5,6) GROUP BY groupId, id ) AS t2 USING (groupId, id) WHERE id IN (1,2,3,4,5,6,7,8); + ------+-----------------+-------------+--------+---------------+--------------+---------+----------------------------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + ------+-----------------+-------------+--------+---------------+--------------+---------+----------------------------+------+----------------+ | 1 | PRIMARY | t1 | ALL | PRIMARY | NULL | NUL | NULL | 19 | | | 1 | PRIMARY | <derived3> | ref | key0 | key0 | 8 | test.t1.groupId,test.t1.id | 1 | | | 1 | PRIMARY | <derived5> | eq_ref | distinct_key | distinct_key | 4 | test.t1.id | 1 | Using where | | 5 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 3 | LATERAL DERIVED | t1 | eq_ref | PRIMARY | PRIMARY | 8 | test.t1.groupId,test.t1.id | 1 | | | 3 | LATERAL DERIVED | <derived7> | eq_ref | distinct_key | distinct_key | 4 | test.t1.id | 1 | Using where | | 7 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | + ------+-----------------+-------------+--------+---------------+--------------+---------+----------------------------+------+----------------+

          We see a different join order for the top join in 11.0: t1 > t2 -> semi-join nest. In 10.4 the join order is t1>semi-join nest->t2.
          shagalla: It would be good to construct a test case with same join order for both 10.4 and 11.0.

          igor Igor Babaev (Inactive) added a comment - We see a different join order for the top join in 11.0: t1 > t2 -> semi-join nest. In 10.4 the join order is t1 >semi-join nest->t2. shagalla : It would be good to construct a test case with same join order for both 10.4 and 11.0.

          When the function JOIN::fix_all_splittings_in_plan() is called it's not guaranteed that the value of JOIN::sjm_lookup_tables contains the bitmap of all semi-join tables which were included into materialized nests. For example when executing the reported query this value is equal to 0 when JOIN::fix_all_splittings_in_plan() is called. As result the call of reset_validity_vars_for_keyuses() in the function JOIN_TAB::fix_splitting() does not work as expected.
          This problem can be fixed with the following patch:

          --- a/sql/opt_split.cc
          +++ b/sql/opt_split.cc
          @@ -1334,6 +1334,7 @@ bool JOIN::fix_all_splittings_in_plan()
           {
             table_map prev_tables= 0;
             table_map all_tables= (table_map(1) << table_count) - 1;
          +  table_map prev_sjm_lookup_tables= 0;
             for (uint tablenr= 0; tablenr < table_count; tablenr++)
             {
               POSITION *cur_pos= &best_positions[tablenr];
          @@ -1342,7 +1343,7 @@ bool JOIN::fix_all_splittings_in_plan()
               {
                 SplM_plan_info *spl_plan= cur_pos->spl_plan;
                 table_map excluded_tables= (all_tables & ~prev_tables) |
          -                                 sjm_lookup_tables;
          +                                  prev_sjm_lookup_tables;
                                              ;
                 if (spl_plan)
                 {
          @@ -1360,6 +1361,8 @@ bool JOIN::fix_all_splittings_in_plan()
                     return true;
               }
               prev_tables|= tab->table->map;
          +    if (cur_pos->sj_strategy == SJ_OPT_MATERIALIZE)
          +      prev_sjm_lookup_tables|= tab->table->map;
             }
             return false;
           }
          

          igor Igor Babaev (Inactive) added a comment - When the function JOIN::fix_all_splittings_in_plan() is called it's not guaranteed that the value of JOIN::sjm_lookup_tables contains the bitmap of all semi-join tables which were included into materialized nests. For example when executing the reported query this value is equal to 0 when JOIN::fix_all_splittings_in_plan() is called. As result the call of reset_validity_vars_for_keyuses() in the function JOIN_TAB::fix_splitting() does not work as expected. This problem can be fixed with the following patch: --- a/sql/opt_split.cc +++ b/sql/opt_split.cc @@ -1334,6 +1334,7 @@ bool JOIN::fix_all_splittings_in_plan() { table_map prev_tables= 0; table_map all_tables= (table_map(1) << table_count) - 1; + table_map prev_sjm_lookup_tables= 0; for (uint tablenr= 0; tablenr < table_count; tablenr++) { POSITION *cur_pos= &best_positions[tablenr]; @@ -1342,7 +1343,7 @@ bool JOIN::fix_all_splittings_in_plan() { SplM_plan_info *spl_plan= cur_pos->spl_plan; table_map excluded_tables= (all_tables & ~prev_tables) | - sjm_lookup_tables; + prev_sjm_lookup_tables; ; if (spl_plan) { @@ -1360,6 +1361,8 @@ bool JOIN::fix_all_splittings_in_plan() return true; } prev_tables|= tab->table->map; + if (cur_pos->sj_strategy == SJ_OPT_MATERIALIZE) + prev_sjm_lookup_tables|= tab->table->map; } return false; }

          Ok to push into 10.5

          igor Igor Babaev (Inactive) added a comment - Ok to push into 10.5

          Pushed in 10.5

          shagalla Galina Shalygina (Inactive) added a comment - Pushed in 10.5

          People

            shagalla Galina Shalygina (Inactive)
            ctutte carlos tutte
            Votes:
            1 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.