[MDEV-15454] Nested SELECT IN returns wrong results Created: 2018-03-02  Updated: 2018-07-25  Resolved: 2018-07-25

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Subquery
Affects Version/s: 5.5, 10.0, 10.1, 10.1.26, 10.2
Fix Version/s: 5.5.61

Type: Bug Priority: Major
Reporter: Gary Scholten Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 1
Labels: SELECT
Environment:

Debian 9 Stretch


Attachments: Text File analyze.txt     Zip Archive mysql.zip     PNG File output1.png     PNG File output2.png     Text File tables.txt     Text File variables.txt    
Issue Links:
Relates
relates to MDEV-15982 Incorrect results when subquery is ma... Closed

 Description   

Upgraded Debian 8 with MySql 5.5.59 to Debian 9 with MariaDB 10.1.26 and noticed different results (auto-installed by dist-upgrade)

SELECT p.`id_promo`, p.`name`
FROM `ps_promo` p
WHERE p.`id_promo` IN (
SELECT pg.`id_promo` FROM `ps_promo_group` pg WHERE pg.`id_group` IN (
SELECT cg.`id_group` FROM `ps_customer_group` cg WHERE cg.`id_customer`= 727)
)
AND (p.`date_effective` <= CURRENT_DATE AND p.`date_expiry` >= CURRENT_DATE)
AND p.`position_essential` > 0
ORDER BY p.`name` ASC LIMIT 30;

It returns too many results!!. If I run the inner part first and save that into my buffer and paste that in the outer SELECT, I get correct results (2). Now I get 7 results, 5 are incorrect.

Innerpart:
SELECT pg.`id_promo` FROM `ps_promo_group` pg WHERE pg.`id_group` IN (
SELECT cg.`id_group` FROM `ps_customer_group` cg WHERE cg.`id_customer`= 727)

I checked MDEV-13694 and some other similar issues, however my optimizer_switch=orderby_uses_equalities=off is my current set up and should return correct results, but it doesn't.



 Comments   
Comment by Alice Sherepa [ 2018-03-05 ]

Please add an output of

SHOW CREATE TABLE  ps_promo;
SHOW CREATE TABLE  ps_promo_group;
SHOW CREATE TABLE  ps_customer_group;

and your .cnf file(s)

Comment by Gary Scholten [ 2018-03-05 ]

On our website, customers can create accounts. When the account is created customers is only added to 1 group and only has limited access to services and limited sets of links are displayed for them (promo-items). Customer 727 registered and should only get 2 promo links, but got 7 links instead.
Then, we added some additional services to this customer (meaning, customer had now relations to 3 groups) and some other tables got changed too (we have 370 tables for this website). Then we revoked the groups and customer is now only in 2 groups, which should only give access to the 2 links again. Now it seems to work as only 2 rows get returned (the correct ones).
Then, we added another customer: 728 and the same thing happens again. The output of the query with customer 727 (now correct) and 728 (incorrect) is attached in the output1.png

Then, I ran the inside query with the inner 2 selects, sorted by id_promo and it did NOT show the id_promo's 12 and 13.
Then, I ran the complete query with 3 selects, sorted by id_promo again and now I do get id_promo's 12 and 13 which are incorrect. The output is attached in output2.png

Comment by Gary Scholten [ 2018-03-05 ]

By the way, we are running on Google Compute Engine with 1vcpu, 1.7GB mem

Comment by Alice Sherepa [ 2018-03-08 ]

I can not reproduce it so far...
Maybe it will be suitable for you to upload a dump to ftp.askmonty.org/private
Also please add the output of

SHOW VARIABLES;
ANALYZE format=json #query

Comment by Gary Scholten [ 2018-03-08 ]

Hi,
I uploaded the variables and the json output.
Regarding the dump. I guess you mean a mysqldump. As the used database is a copy of our production database, it contains many user passwords and some other sensitive data. So, the answer is No.
I could strip it first, but that will take some time.
Thx.
Gary

Comment by Alice Sherepa [ 2018-03-08 ]

Thanks for the report!
It is repeatable on 5.5-10.2, with InnoDB and MyIsam.
As a temprorary workaround, please use
SET optimizer_switch='materialization=off';

CREATE TABLE t1 ( a int NOT NULL PRIMARY KEY ) ENGINE=myisam;
INSERT INTO  t1 SELECT seq FROM seq_1_to_1000;
 
CREATE TABLE t2 ( a int(5) NOT NULL,b int(3) NOT NULL) ENGINE=myisam;
INSERT INTO t2  VALUES (7878, 96),(3465, 96),(1403, 96),(4189, 96),(8732, 96), (5,96);
INSERT INTO t2 SELECT seq,56 FROM seq_1_to_1000;
 
CREATE TABLE t3 (
  c int(10) unsigned NOT NULL,
  b int(10) unsigned NOT NULL,
  PRIMARY KEY (c,b)
) ENGINE=myisam;
 
INSERT INTO t3 (c, b) VALUES (27, 96);
INSERT INTO t3 SELECT seq,seq FROM seq_30_to_1000;
 
SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27);
 
SET optimizer_switch='materialization=off';
 
SELECT t1.a FROM t1 
WHERE t1.a IN (SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27)) LIMIT 5;
 
SET optimizer_switch='materialization=on';
 
SELECT t1.a FROM t1 
WHERE t1.a IN (SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27)) LIMIT 5;

--------------
SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27)
--------------
+------+
| a    |
+------+
| 7878 |
| 3465 |
| 1403 |
| 4189 |
| 8732 |
|    5 |
+------+
6 rows in set (0.00 sec)
 
--------------
SET optimizer_switch='materialization=off'
--------------
Query OK, 0 rows affected (0.00 sec)
 
--------------
SELECT t1.a FROM t1 
WHERE t1.a IN (SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27)) LIMIT 5
--------------
+---+
| a |
+---+
| 5 |
+---+
1 row in set (0.00 sec)
 
--------------
explain extended SELECT t1.a FROM t1 
WHERE t1.a IN (SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27)) LIMIT 5
--------------
+------+-------------+-------+--------+---------------+---------+---------+-----------+------+----------+-------------------------------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | filtered | Extra                                           |
+------+-------------+-------+--------+---------------+---------+---------+-----------+------+----------+-------------------------------------------------+
|    1 | PRIMARY     | t3    | ref    | PRIMARY       | PRIMARY | 4       | const     |    1 |   100.00 | Using index; Start temporary                    |
|    1 | PRIMARY     | t2    | ALL    | NULL          | NULL    | NULL    | NULL      | 1006 |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    1 | PRIMARY     | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | test.t2.a |    1 |   100.00 | Using index; End temporary                      |
+------+-------------+-------+--------+---------------+---------+---------+-----------+------+----------+-------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
 
Note (Code 1003): select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t3` join `test`.`t2`) where `test`.`t3`.`c` = 27 and `test`.`t1`.`a` = `test`.`t2`.`a` and `test`.`t2`.`b` = `test`.`t3`.`b` limit 5
--------------
SET optimizer_switch='materialization=on'
--------------
Query OK, 0 rows affected (0.00 sec)
 
--------------
SELECT t1.a FROM t1 
WHERE t1.a IN (SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27)) LIMIT 5
--------------
+---+
| a |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+---+
5 rows in set (0.00 sec)
 
--------------
explain extended SELECT t1.a FROM t1 
WHERE t1.a IN (SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27)) LIMIT 5
--------------
+------+--------------+-------------+--------+---------------+--------------+---------+-------+------+----------+------------------------------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref   | rows | filtered | Extra                              |
+------+--------------+-------------+--------+---------------+--------------+---------+-------+------+----------+------------------------------------+
|    1 | PRIMARY      | t1          | index  | PRIMARY       | PRIMARY      | 4       | NULL  | 1000 |   100.00 | Using index                        |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func  |    1 |   100.00 |                                    |
|    2 | MATERIALIZED | t3          | ref    | PRIMARY       | PRIMARY      | 4       | const |    1 |   100.00 | Using index                        |
|    2 | MATERIALIZED | t2          | ALL    | NULL          | NULL         | NULL    | NULL  | 1006 |   100.00 | Using join buffer (flat, BNL join) |
+------+--------------+-------------+--------+---------------+--------------+---------+-------+------+----------+------------------------------------+
4 rows in set, 1 warning (0.00 sec)
 
Note (Code 1003): select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t3` join `test`.`t2`) where `test`.`t3`.`c` = 27 and `test`.`t2`.`b` = `test`.`t3`.`b` limit 5

Comment by Alice Sherepa [ 2018-03-09 ]

mtr test:

CREATE TABLE t1 ( a int NOT NULL PRIMARY KEY);
 
CREATE TABLE t2 ( a int, b int );
INSERT INTO t2  VALUES (7878, 96),(3465, 96),(1403, 96),(4189, 96),(8732, 96), (5,96);
 
CREATE TABLE t3 (c int unsigned NOT NULL, b int unsigned NOT NULL, PRIMARY KEY (c,b));
INSERT INTO t3 (c, b) VALUES (27, 96);
 
DELIMITER $$;
CREATE PROCEDURE prepare_data()
BEGIN
  DECLARE i INT DEFAULT 1;
 
  WHILE i < 1000 DO
    INSERT INTO t1 (a) VALUES (i);
    INSERT INTO t2 (a,b) VALUES (i,56);
    INSERT INTO t3 (c,b) VALUES (i,i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;$$
 
CALL prepare_data();
 
SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27);
 
SET optimizer_switch='materialization=off';
 
SELECT t1.a FROM t1 
WHERE t1.a IN (SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27)) LIMIT 5;
 
SET optimizer_switch='materialization=on';
 
SELECT t1.a FROM t1 
WHERE t1.a IN (SELECT t2.a FROM t2 WHERE t2.b IN (SELECT t3.b FROM t3 WHERE t3.c= 27)) LIMIT 5;
 
drop procedure prepare_data;
drop table t1,t2,t3;

Comment by Varun Gupta (Inactive) [ 2018-05-10 ]

The main issue here is the type difference in the subqueries where clause

t2.b is defined as INT
t3.b is defined as INT unsigned

One is signed and the other is unsigned , so we are not able to create multiple equalities for t2.b = t3.b.

The issue here is we are not having the condition t2.b= t3.b attached to the BNL join, so materialised table is containing way more rows than expected.

Comment by Varun Gupta (Inactive) [ 2018-05-10 ]

Patch
http://lists.askmonty.org/pipermail/commits/2018-May/012522.html

Comment by Sergei Petrunia [ 2018-07-25 ]

Review input provided over email. Ok to push after it is addressed.

Generated at Thu Feb 08 08:21:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.