[MDEV-16225] wrong resultset from query with semijoin=on Created: 2018-05-21  Updated: 2020-08-25  Resolved: 2018-06-02

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Server
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3
Fix Version/s: 5.5.61, 10.0.36, 10.1.34, 10.2.16, 10.3.8

Type: Bug Priority: Major
Reporter: Rick Pizzi Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 2
Labels: semi-join, upstream-not-affected


 Description   

Please check the following test case kindly supplied by a customer of ours.
The 2nd query produces wrong resultset (should list all rows , it only lists those
explicited by the second condition).
If optimizer switch semijoin is turned off, it will produce correct results.

This bug is not present in Oracle MySQL 5.6.

CREATE TABLE `store_loan` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `local_name` varchar(64) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=latin1;
 
insert  into `store_loan`(`id`,`local_name`) values 
(1,'Cash Advance'),
(2,'Cash Advance'),
(3,'Rollover'),
(4,'AL Installment'),
(5,'AL Installment'),
(6,'AL Installment'),
(7,'AL Installment'),
(8,'AL Installment'),
(9,'AL Installment'),
(10,'Internet Payday'),
(11,'Rollover - Internet Payday'),
(12,'AL Monthly Installment'),
(13,'AL Semi-Monthly Installment');
 
## Make sure @@Optimizer_switch semijoin=on is set
## current settings in my env is: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
##
## I would expect both queries below to return the same result set since the sub-queries in query 1 returns the same data s the IN() clause in query 2
 
#Query 1
SELECT SQL_NO_CACHE sl.id 
FROM store_loan sl
WHERE (
	sl.id IN (SELECT A.id FROM store_loan AS A WHERE A.local_name IN (SELECT B.local_name FROM store_loan AS B WHERE B.id IN (0,4,12,13,1,10,3,11)))  
	OR 
	(sl.id IN (0,4,12,13,1,10,3,11))
);
+----+
| id |
+----+
|  1 |
|  3 |
|  4 |
| 10 |
| 11 |
| 12 |
| 13 |
+----+
7 rows in set (0.01 sec)
 
#Query 2
SELECT SQL_NO_CACHE sl.id 
FROM store_loan sl
WHERE (
 	sl.id IN (1,2,3,4,5,6,7,8,9,10,11,12,13)
	OR 
	(sl.id IN (0,4,12,13,1,10,3,11))
);
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
|  6 |
|  7 |
|  8 |
|  9 |
| 10 |
| 11 |
| 12 |
| 13 |
+----+
13 rows in set (0.00 sec)



 Comments   
Comment by Varun Gupta (Inactive) [ 2018-05-22 ]

Output for analyze format=json

ANALYZE
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 6.5435,
    "table": {
      "table_name": "sl",
      "access_type": "index",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "4",
      "used_key_parts": ["id"],
      "r_loops": 1,
      "rows": 13,
      "r_rows": 13,
      "r_total_time_ms": 0.8685,
      "filtered": 100,
      "r_filtered": 53.846,
      "attached_condition": "<in_optimizer>(sl.`id`,sl.`id` in (subquery#2)) or sl.`id` in (0,4,12,13,1,10,3,11)",
      "using_index": true
    },
    "subqueries": [
      {
        "query_block": {
          "select_id": 2,
          "r_loops": 1,
          "r_total_time_ms": 5.5768,
          "table": {
            "table_name": "<subquery3>",
            "access_type": "ALL",
            "possible_keys": ["distinct_key"],
            "r_loops": 1,
            "rows": 8,
            "r_rows": 7,
            "r_total_time_ms": 0.0975,
            "filtered": 100,
            "r_filtered": 100,
            "materialized": {
              "unique": 1,
              "query_block": {
                "select_id": 3,
                "table": {
                  "table_name": "B",
                  "access_type": "ALL",
                  "possible_keys": ["PRIMARY"],
                  "r_loops": 1,
                  "rows": 13,
                  "r_rows": 13,
                  "r_total_time_ms": 4.2749,
                  "filtered": 61.538,
                  "r_filtered": 53.846,
                  "attached_condition": "b.`id` in (0,4,12,13,1,10,3,11)"
                }
              }
            }
          },
          "table": {
            "table_name": "A",
            "access_type": "eq_ref",
            "possible_keys": ["PRIMARY"],
            "key": "PRIMARY",
            "key_length": "4",
            "used_key_parts": ["id"],
            "ref": ["func"],
            "r_loops": 7,
            "rows": 1,
            "r_rows": 1,
            "r_total_time_ms": 0.5449,
            "filtered": 100,
            "r_filtered": 14.286,
            "attached_condition": "a.local_name = b.local_name"
          }
        }
      }
    ]
  }
}

Comment by Varun Gupta (Inactive) [ 2018-05-22 ]

So after discussion with psergey , we concluded that the eq_ref access on table A should not happen when we have non-semi-join materialisation.

This eq_ref access is expected for the IN->EXISTS strategy for the optimisation of non-semi-join subquery.

So there is somewhere an A.id = sl.id getting injected and because of that we are made to do the ref access

Comment by Varun Gupta (Inactive) [ 2018-05-22 ]

So a way by which the equality can be injected is when we do the IN -> EXIST transformation.
For non-semi join subqueries we have 2 strategies

  • Materialisation
  • IN -> EXIST transformation

We have a cost based approach to decide which technique should be used.
So trying to see if the equality is actually injected by IN -> EXIST transformation but somehow we forget to remove it when we pick up the materialisation strategy.

Lets try to run the query with in_to_exists = off

MariaDB [test]> set optimizer_switch='in_to_exists=off';
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]> SELECT SQL_NO_CACHE sl.id 
    -> FROM store_loan sl
    -> WHERE (
    -> sl.id IN (SELECT A.id FROM store_loan AS A WHERE A.local_name IN (SELECT B.local_name FROM store_loan AS B WHERE B.id IN (0,4,12,13,1,10,3,11)))  
    -> OR 
    -> (sl.id IN (0,4,12,13,1,10,3,11))
    -> );
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
|  6 |
|  7 |
|  8 |
|  9 |
| 10 |
| 11 |
| 12 |
| 13 |
+----+
13 rows in set (0.004 sec)

The results are as expected, so IN->EXIST strategy injects A.id = sl.id and then it is not removed when we pick materialisation as the optimum strategy.

Comment by Varun Gupta (Inactive) [ 2018-05-23 ]

Patch
http://lists.askmonty.org/pipermail/commits/2018-May/012576.html

Comment by Varun Gupta (Inactive) [ 2018-06-02 ]

Pushed to 5.5

Generated at Thu Feb 08 08:27:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.