[MCOL-4641] IN/NOT IN subquery returns incorrect results for some cases Created: 2021-03-26  Updated: 2023-12-15

Status: Open
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 5.5.1
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Gagan Goel (Inactive) Assignee: Roman
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Following queries involving IN/NOT IN subqueries are not producing the correct results in ColumnStore:

create table cs1 (a int)engine=columnstore;
insert into cs1 values (1), (2), (3), (4), (null);
 
create table cs2 (b int, c int)engine=columnstore;
insert into cs2 values (1, 100), (1, 101), (2, 200),
(3, 300), (3, 301), (3, 302), (null, null);
 
MariaDB [test]> select * from cs1 where hex(a*10) not in (select hex(b*10) from cs2);
+------+
| a    |
+------+
|    4 |
+------+
1 row in set (0.073 sec)
 
MariaDB [test]> select * from cs1 where a not in (select b from cs2 where b not in (select a from cs1 where a in (2,3)));
+------+
| a    |
+------+
|    2 |
|    3 |
|    4 |
| NULL |
+------+
4 rows in set (0.029 sec)
 
MariaDB [test]> select * from cs1 where a not in (select b from cs2 where b not in (select a from cs1 where a not in (2,3)));
Empty set (0.030 sec)
 
MariaDB [test]> select * from cs1 where a not in (select b from cs2 where b=3);
+------+
| a    |
+------+
|    1 |
|    2 |
|    4 |
| NULL |
+------+
4 rows in set (0.017 sec)
 
MariaDB [test]> select * from cs1 where a not in (select count(*) over (partition by b) from cs2 where b is null);
+------+
| a    |
+------+
|    2 |
|    3 |
|    4 |
| NULL |
+------+
4 rows in set (0.017 sec)
 
MariaDB [test]> select * from cs1 where a not in (select c from cs2 where b is not null);
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
| NULL |
+------+
5 rows in set (0.019 sec)
 
MariaDB [test]> select * from cs1 where (a+a) not in (select (b+b) from cs2 where b is not null);
+------+
| a    |
+------+
|    4 |
| NULL |
+------+
2 rows in set (0.024 sec)
 
MariaDB [test]> select * from cs1 where (a+1) not in (select b from cs2 where b is not null);
+------+
| a    |
+------+
|    3 |
|    4 |
| NULL |
+------+
3 rows in set (0.018 sec)
 
MariaDB [test]> select * from cs1 where a not in (select b from cs2 where b not in (select a from cs1 where a in (2,3) and a is not null) and b is not null);
+------+
| a    |
+------+
|    2 |
|    3 |
|    4 |
| NULL |
+------+
4 rows in set (0.034 sec)

The correct output for these queries should be as follows (note that cs1 and cs2 below are same tables as above, but in InnoDB):

MariaDB [test]> select * from cs1 where hex(a*10) not in (select hex(b*10) from cs2);
Empty set (0.001 sec)
 
MariaDB [test]> select * from cs1 where a not in (select b from cs2 where b not in (select a from cs1 where a in (2,3)));
+------+
| a    |
+------+
|    2 |
|    3 |
|    4 |
+------+
3 rows in set (0.001 sec)
 
MariaDB [test]> select * from cs1 where a not in (select b from cs2 where b not in (select a from cs1 where a not in (2,3)));
+------+
| a    |
+------+
|    1 |
|    4 |
+------+
2 rows in set (0.001 sec)
 
MariaDB [test]> select * from cs1 where a not in (select b from cs2 where b=3);
+------+
| a    |
+------+
|    1 |
|    2 |
|    4 |
+------+
3 rows in set (0.000 sec)
 
MariaDB [test]> select * from cs1 where a not in (select count(*) over (partition by b) from cs2 where b is null);
+------+
| a    |
+------+
|    2 |
|    3 |
|    4 |
+------+
3 rows in set (0.001 sec)
 
MariaDB [test]> select * from cs1 where a not in (select c from cs2 where b is not null);
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
+------+
4 rows in set (0.000 sec)
 
MariaDB [test]> select * from cs1 where (a+a) not in (select (b+b) from cs2 where b is not null);
+------+
| a    |
+------+
|    4 |
+------+
1 row in set (0.000 sec)
 
MariaDB [test]> select * from cs1 where (a+1) not in (select b from cs2 where b is not null);
+------+
| a    |
+------+
|    3 |
|    4 |
+------+
2 rows in set (0.000 sec)
 
MariaDB [test]> select * from cs1 where a not in (select b from cs2 where b not in (select a from cs1 where a in (2,3) and a is not null) and b is not null);
+------+
| a    |
+------+
|    2 |
|    3 |
|    4 |
+------+
3 rows in set (0.001 sec)



 Comments   
Comment by Roman [ 2021-04-14 ]

The case

select * from cs1 where hex(a*10) not in (select hex(b*10) from cs2);

looks like a bug in MDB b/c hex(40) doesn't exist in the subquery.

Generated at Thu Feb 08 02:51:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.