Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16225

wrong resultset from query with semijoin=on

Details

    Description

      Please check the following test case kindly supplied by a customer of ours.
      The 2nd query produces wrong resultset (should list all rows , it only lists those
      explicited by the second condition).
      If optimizer switch semijoin is turned off, it will produce correct results.

      This bug is not present in Oracle MySQL 5.6.

      CREATE TABLE `store_loan` (
        `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
        `local_name` varchar(64) NOT NULL,
        PRIMARY KEY (`id`)
      ) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=latin1;
       
      insert  into `store_loan`(`id`,`local_name`) values 
      (1,'Cash Advance'),
      (2,'Cash Advance'),
      (3,'Rollover'),
      (4,'AL Installment'),
      (5,'AL Installment'),
      (6,'AL Installment'),
      (7,'AL Installment'),
      (8,'AL Installment'),
      (9,'AL Installment'),
      (10,'Internet Payday'),
      (11,'Rollover - Internet Payday'),
      (12,'AL Monthly Installment'),
      (13,'AL Semi-Monthly Installment');
       
      ## Make sure @@Optimizer_switch semijoin=on is set
      ## current settings in my env is: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
      ##
      ## I would expect both queries below to return the same result set since the sub-queries in query 1 returns the same data s the IN() clause in query 2
       
      #Query 1
      SELECT SQL_NO_CACHE sl.id 
      FROM store_loan sl
      WHERE (
      	sl.id IN (SELECT A.id FROM store_loan AS A WHERE A.local_name IN (SELECT B.local_name FROM store_loan AS B WHERE B.id IN (0,4,12,13,1,10,3,11)))  
      	OR 
      	(sl.id IN (0,4,12,13,1,10,3,11))
      );
      +----+
      | id |
      +----+
      |  1 |
      |  3 |
      |  4 |
      | 10 |
      | 11 |
      | 12 |
      | 13 |
      +----+
      7 rows in set (0.01 sec)
       
      #Query 2
      SELECT SQL_NO_CACHE sl.id 
      FROM store_loan sl
      WHERE (
       	sl.id IN (1,2,3,4,5,6,7,8,9,10,11,12,13)
      	OR 
      	(sl.id IN (0,4,12,13,1,10,3,11))
      );
      +----+
      | id |
      +----+
      |  1 |
      |  2 |
      |  3 |
      |  4 |
      |  5 |
      |  6 |
      |  7 |
      |  8 |
      |  9 |
      | 10 |
      | 11 |
      | 12 |
      | 13 |
      +----+
      13 rows in set (0.00 sec)
      

      Attachments

        Activity

          Output for analyze format=json

          ANALYZE
          {
            "query_block": {
              "select_id": 1,
              "r_loops": 1,
              "r_total_time_ms": 6.5435,
              "table": {
                "table_name": "sl",
                "access_type": "index",
                "possible_keys": ["PRIMARY"],
                "key": "PRIMARY",
                "key_length": "4",
                "used_key_parts": ["id"],
                "r_loops": 1,
                "rows": 13,
                "r_rows": 13,
                "r_total_time_ms": 0.8685,
                "filtered": 100,
                "r_filtered": 53.846,
                "attached_condition": "<in_optimizer>(sl.`id`,sl.`id` in (subquery#2)) or sl.`id` in (0,4,12,13,1,10,3,11)",
                "using_index": true
              },
              "subqueries": [
                {
                  "query_block": {
                    "select_id": 2,
                    "r_loops": 1,
                    "r_total_time_ms": 5.5768,
                    "table": {
                      "table_name": "<subquery3>",
                      "access_type": "ALL",
                      "possible_keys": ["distinct_key"],
                      "r_loops": 1,
                      "rows": 8,
                      "r_rows": 7,
                      "r_total_time_ms": 0.0975,
                      "filtered": 100,
                      "r_filtered": 100,
                      "materialized": {
                        "unique": 1,
                        "query_block": {
                          "select_id": 3,
                          "table": {
                            "table_name": "B",
                            "access_type": "ALL",
                            "possible_keys": ["PRIMARY"],
                            "r_loops": 1,
                            "rows": 13,
                            "r_rows": 13,
                            "r_total_time_ms": 4.2749,
                            "filtered": 61.538,
                            "r_filtered": 53.846,
                            "attached_condition": "b.`id` in (0,4,12,13,1,10,3,11)"
                          }
                        }
                      }
                    },
                    "table": {
                      "table_name": "A",
                      "access_type": "eq_ref",
                      "possible_keys": ["PRIMARY"],
                      "key": "PRIMARY",
                      "key_length": "4",
                      "used_key_parts": ["id"],
                      "ref": ["func"],
                      "r_loops": 7,
                      "rows": 1,
                      "r_rows": 1,
                      "r_total_time_ms": 0.5449,
                      "filtered": 100,
                      "r_filtered": 14.286,
                      "attached_condition": "a.local_name = b.local_name"
                    }
                  }
                }
              ]
            }
          }
          
          

          varun Varun Gupta (Inactive) added a comment - Output for analyze format=json ANALYZE { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 6.5435, "table": { "table_name": "sl", "access_type": "index", "possible_keys": ["PRIMARY"], "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "r_loops": 1, "rows": 13, "r_rows": 13, "r_total_time_ms": 0.8685, "filtered": 100, "r_filtered": 53.846, "attached_condition": "<in_optimizer>(sl.`id`,sl.`id` in (subquery#2)) or sl.`id` in (0,4,12,13,1,10,3,11)", "using_index": true }, "subqueries": [ { "query_block": { "select_id": 2, "r_loops": 1, "r_total_time_ms": 5.5768, "table": { "table_name": "<subquery3>", "access_type": "ALL", "possible_keys": ["distinct_key"], "r_loops": 1, "rows": 8, "r_rows": 7, "r_total_time_ms": 0.0975, "filtered": 100, "r_filtered": 100, "materialized": { "unique": 1, "query_block": { "select_id": 3, "table": { "table_name": "B", "access_type": "ALL", "possible_keys": ["PRIMARY"], "r_loops": 1, "rows": 13, "r_rows": 13, "r_total_time_ms": 4.2749, "filtered": 61.538, "r_filtered": 53.846, "attached_condition": "b.`id` in (0,4,12,13,1,10,3,11)" } } } }, "table": { "table_name": "A", "access_type": "eq_ref", "possible_keys": ["PRIMARY"], "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "ref": ["func"], "r_loops": 7, "rows": 1, "r_rows": 1, "r_total_time_ms": 0.5449, "filtered": 100, "r_filtered": 14.286, "attached_condition": "a.local_name = b.local_name" } } } ] } }

          So after discussion with psergey , we concluded that the eq_ref access on table A should not happen when we have non-semi-join materialisation.

          This eq_ref access is expected for the IN->EXISTS strategy for the optimisation of non-semi-join subquery.

          So there is somewhere an A.id = sl.id getting injected and because of that we are made to do the ref access

          varun Varun Gupta (Inactive) added a comment - So after discussion with psergey , we concluded that the eq_ref access on table A should not happen when we have non-semi-join materialisation. This eq_ref access is expected for the IN->EXISTS strategy for the optimisation of non-semi-join subquery. So there is somewhere an A.id = sl.id getting injected and because of that we are made to do the ref access

          So a way by which the equality can be injected is when we do the IN -> EXIST transformation.
          For non-semi join subqueries we have 2 strategies

          • Materialisation
          • IN -> EXIST transformation

          We have a cost based approach to decide which technique should be used.
          So trying to see if the equality is actually injected by IN -> EXIST transformation but somehow we forget to remove it when we pick up the materialisation strategy.

          Lets try to run the query with in_to_exists = off

          MariaDB [test]> set optimizer_switch='in_to_exists=off';
          Query OK, 0 rows affected (0.000 sec)
           
          MariaDB [test]> SELECT SQL_NO_CACHE sl.id 
              -> FROM store_loan sl
              -> WHERE (
              -> sl.id IN (SELECT A.id FROM store_loan AS A WHERE A.local_name IN (SELECT B.local_name FROM store_loan AS B WHERE B.id IN (0,4,12,13,1,10,3,11)))  
              -> OR 
              -> (sl.id IN (0,4,12,13,1,10,3,11))
              -> );
          +----+
          | id |
          +----+
          |  1 |
          |  2 |
          |  3 |
          |  4 |
          |  5 |
          |  6 |
          |  7 |
          |  8 |
          |  9 |
          | 10 |
          | 11 |
          | 12 |
          | 13 |
          +----+
          13 rows in set (0.004 sec)
          
          

          The results are as expected, so IN->EXIST strategy injects A.id = sl.id and then it is not removed when we pick materialisation as the optimum strategy.

          varun Varun Gupta (Inactive) added a comment - So a way by which the equality can be injected is when we do the IN -> EXIST transformation. For non-semi join subqueries we have 2 strategies Materialisation IN -> EXIST transformation We have a cost based approach to decide which technique should be used. So trying to see if the equality is actually injected by IN -> EXIST transformation but somehow we forget to remove it when we pick up the materialisation strategy. Lets try to run the query with in_to_exists = off MariaDB [test]> set optimizer_switch='in_to_exists=off'; Query OK, 0 rows affected (0.000 sec)   MariaDB [test]> SELECT SQL_NO_CACHE sl.id -> FROM store_loan sl -> WHERE ( -> sl.id IN (SELECT A.id FROM store_loan AS A WHERE A.local_name IN (SELECT B.local_name FROM store_loan AS B WHERE B.id IN (0,4,12,13,1,10,3,11))) -> OR -> (sl.id IN (0,4,12,13,1,10,3,11)) -> ); +----+ | id | +----+ | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | | 8 | | 9 | | 10 | | 11 | | 12 | | 13 | +----+ 13 rows in set (0.004 sec) The results are as expected, so IN->EXIST strategy injects A.id = sl.id and then it is not removed when we pick materialisation as the optimum strategy.
          varun Varun Gupta (Inactive) added a comment - Patch http://lists.askmonty.org/pipermail/commits/2018-May/012576.html

          Pushed to 5.5

          varun Varun Gupta (Inactive) added a comment - Pushed to 5.5

          People

            varun Varun Gupta (Inactive)
            rpizzi Rick Pizzi (Inactive)
            Votes:
            2 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.