[MCOL-4505] Regression: DISTINCT with sub query gets wrong results Created: 2021-01-19  Updated: 2021-01-25  Resolved: 2021-01-25

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 5.5.2
Fix Version/s: 5.5.2

Type: Bug Priority: Blocker
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MCOL-4455 "SELECT DISTINCT col FROM table LIMIT... Closed
Sprint: 2021-1, 2021-2

 Description   

See regression/mysql/queries/working_tpch1_compareLogOnly/view/mts_view.sql:

CREATE TABLE t1 (lid int, name char(10)) engine=columnstore;
INSERT INTO t1 (lid, name) VALUES (1, 'YES'), (2, 'NO');
 
CREATE TABLE t2 (  id int, gid int, lid int, dt date) engine=columnstore;
INSERT INTO t2 (id, gid, lid, dt) VALUES
 (1, 1, 1, '2007-01-01'),(2, 1, 2, '2007-01-02'),
 (3, 2, 2, '2007-02-01'),(4, 2, 1, '2007-02-02');
 
SELECT DISTINCT t2.gid,(SELECT t1.name FROM t1, t2 WHERE t1.lid  = t2.lid AND t2.gid = t1.lid and t1.lid =1 ) as clid FROM t2 order by 1,2;

+------+------+
| gid  | clid |
+------+------+
|    1 | YES  |
|    2 | YES  |
+------+------+

However, if you strip off the order by, you get

SELECT DISTINCT t2.gid,(SELECT t1.name FROM t1, t2 WHERE t1.lid  = t2.lid AND t2.gid = t1.lid and t1.lid =1 ) as clid FROM t2;

+------+------+
| gid  | clid |
+------+------+
|    1 | YES  |
|    1 | YES  |
|    2 | YES  |
|    2 | YES  |
+------+------+

There should only be two rows returned



 Comments   
Comment by David Hall (Inactive) [ 2021-01-19 ]

This bug is introduced by commit 0181c6f122865e5bf68b14f8faadc4fe34cdcdc3
PR – MCOL-4455 Fix ignoring LIMIT offset

Comment by Alexander Barkov [ 2021-01-19 ]

The problem is also repeatable with the INT data type instead of CHAR(10):

DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
CREATE TABLE t1 (lid int, name INT) engine=columnstore;
INSERT INTO t1 (lid, name) VALUES (1, 1000), (2, 1001);
CREATE TABLE t2 (  id int, gid int, lid int, dt date) engine=columnstore;
INSERT INTO t2 (id, gid, lid, dt) VALUES
 (1, 1, 1, '2007-01-01'),(2, 1, 2, '2007-01-02'),
 (3, 2, 2, '2007-02-01'),(4, 2, 1, '2007-02-02');
SELECT DISTINCT t2.gid,(SELECT t1.name FROM t1, t2 WHERE t1.lid  = t2.lid AND t2.gid = t1.lid and t1.lid =1 ) as clid FROM t2 order by 1,2;

+------+------+
| gid  | clid |
+------+------+
|    1 | 1000 |
|    2 | 1000 |
+------+------+

SELECT DISTINCT t2.gid,(SELECT t1.name FROM t1, t2 WHERE t1.lid  = t2.lid AND t2.gid = t1.lid and t1.lid =1 ) as clid   FROM t2;

+------+------+
| gid  | clid |
+------+------+
|    1 | 1000 |
|    1 | 1000 |
|    2 | 1000 |
|    2 | 1000 |
+------+------+

Comment by Daniel Lee (Inactive) [ 2021-01-25 ]

Build verified: 5.5.2 (b1552), develop b1546

The fixed also checked into the develop branch

Generated at Thu Feb 08 02:50:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.