[MCOL-3448] Wrong results for (col1, col2) in (val1, val2) queries Created: 2019-08-22  Updated: 2021-04-19  Resolved: 2019-11-18

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr, MDB Plugin
Affects Version/s: 1.2.5, 1.4.1
Fix Version/s: 1.2.6, 1.4.1

Type: Bug Priority: Major
Reporter: Valerii Kravchuk Assignee: Bharath Bokka (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MCOL-3612 Merge develop-1.2 into develop Closed
Sprint: 2019-06

 Description   

Consider the following simple table:

openxs@ao756:~$ mcsmysql -uroot test
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 19
Server version: 10.3.16-MariaDB-log Columnstore 1.2.5-1
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [test]> create table dim_date(semaine varchar(10), date_jour date, val int) engine=columnstore;
Query OK, 0 rows affected (5,804 sec)
 
MariaDB [test]> insert into dim_date values ('2019-S08','2019-02-18',1);        
Query OK, 1 row affected (4,658 sec)
 
MariaDB [test]> insert into dim_date values ('2019-S08','2019-02-19',2);
Query OK, 1 row affected (0,845 sec)
 
MariaDB [test]> insert into dim_date values ('2019-S08','2019-02-20',3);
Query OK, 1 row affected (0,770 sec)
 
MariaDB [test]> insert into dim_date values ('2019-S07','2019-02-15',0);
Query OK, 1 row affected (1,028 sec)
 
MariaDB [test]> insert into dim_date values ('2019-S07','2019-02-14',5);
Query OK, 1 row affected (0,926 sec)
 
MariaDB [test]> insert into dim_date values ('2019-S01','2019-01-01',5);
Query OK, 1 row affected (1,056 sec)
 
MariaDB [test]> SELECT *
    -> FROM dim_date
    -> WHERE
    -> SEMAINE IN ('2019-S07')
    -> OR
    -> (
    -> SEMAINE = '2019-S08'
    -> AND DATE_JOUR IN ('2019-02-18','2019-02-19')
    -> ) ;
+----------+------------+------+
| semaine  | date_jour  | val  |
+----------+------------+------+
| 2019-S08 | 2019-02-18 |    1 |
| 2019-S08 | 2019-02-19 |    2 |
| 2019-S07 | 2019-02-15 |    0 |
| 2019-S07 | 2019-02-14 |    5 |
+----------+------------+------+
4 rows in set (3,374 sec)

The query returns expected results. Now, consider this semantically equivalent (supposedly) query that compares tuples:

MariaDB [test]> SELECT *
    -> FROM dim_date
    -> WHERE
    -> SEMAINE IN ('2019-S07')
    -> OR
    -> (SEMAINE,DATE_JOUR) IN
    -> (
    -> ('2019-S08','2019-02-18'),
    -> ('2019-S08','2019-02-19')
    -> )
    -> ORDER BY 1;
+----------+------------+------+
| semaine  | date_jour  | val  |
+----------+------------+------+
| 2019-S07 | 2019-02-15 |    0 |
| 2019-S07 | 2019-02-14 |    5 |
+----------+------------+------+
2 rows in set (0,658 sec)
 
MariaDB [test]> SELECT * FROM dim_date WHERE SEMAINE IN ('2019-S07') OR (SEMAINE,DATE_JOUR) IN ( ('2019-S08','2019-02-18'), ('2019-S08','2019-02-19') );
+----------+------------+------+
| semaine  | date_jour  | val  |
+----------+------------+------+
| 2019-S07 | 2019-02-15 |    0 |
| 2019-S07 | 2019-02-14 |    5 |
+----------+------------+------+
2 rows in set (0,432 sec)
 
MariaDB [test]> show variables like '%vtable%';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| infinidb_vtable_mode | 1     |
+----------------------+-------+
1 row in set (0,002 sec)
 
MariaDB [test]> set infinidb_vtable_mode = 0;
Query OK, 0 rows affected (0,000 sec)
 
MariaDB [test]> SELECT * FROM dim_date WHERE SEMAINE IN ('2019-S07') OR (SEMAINE,DATE_JOUR) IN ( ('2019-S08','2019-02-18'), ('2019-S08','2019-02-19') );
+----------+------------+------+
| semaine  | date_jour  | val  |
+----------+------------+------+
| 2019-S07 | 2019-02-15 |    0 |
| 2019-S07 | 2019-02-14 |    5 |
+----------+------------+------+
2 rows in set (0,106 sec)

As you can see removing ORDER BY or changing infinidb_vtable_mode does not help.

For InnoDB table with the same data we get correct, expected result:

MariaDB [test]> create table dim_date_inno(semaine varchar(10), date_jour date, val int) engine=innodb;
Query OK, 0 rows affected (1,802 sec)
 
MariaDB [test]> insert into dim_date_inno select * from dim_date;
Query OK, 6 rows affected (1,367 sec)
Records: 6  Duplicates: 0  Warnings: 0
 
MariaDB [test]> SELECT * FROM dim_date_inno WHERE SEMAINE IN ('2019-S07') OR (SEMAINE,DATE_JOUR) IN ( ('2019-S08','2019-02-18'), ('2019-S08','2019-02-19') );
+----------+------------+------+
| semaine  | date_jour  | val  |
+----------+------------+------+
| 2019-S08 | 2019-02-18 |    1 |
| 2019-S08 | 2019-02-19 |    2 |
| 2019-S07 | 2019-02-15 |    0 |
| 2019-S07 | 2019-02-14 |    5 |
+----------+------------+------+
4 rows in set (0,001 sec)



 Comments   
Comment by David Hall (Inactive) [ 2019-11-05 ]

The error appears to be associated with the OR in combination with a row-based IN.

A row based IN by itself works:
MariaDB [dhall]> SELECT * FROM dim_date WHERE (SEMAINE,DATE_JOUR) IN (('2019-S08','2019-02-18'),('2019-S08','2019-02-19'));
------------------------

semaine date_jour val

------------------------

2019-S08 2019-02-18 1
2019-S08 2019-02-19 2

------------------------
2 rows in set (2.434 sec)

OR with either of the columns from the ROW based IN works as well:
MariaDB [dhall]> SELECT * FROM dim_date WHERE (DATE_JOUR) IN ('2019-02-18','2019-02-19') OR SEMAINE IN ('2019-S07');
------------------------

semaine date_jour val

------------------------

2019-S08 2019-02-18 1
2019-S08 2019-02-19 2
2019-S07 2019-02-15 0
2019-S07 2019-02-14 5

------------------------
4 rows in set (4.458 sec)

MariaDB [dhall]> SELECT * FROM dim_date WHERE (SEMAINE) IN ('2019-S08','2019-S08') OR SEMAINE IN ('2019-S07');
------------------------

semaine date_jour val

------------------------

2019-S08 2019-02-18 1
2019-S08 2019-02-19 2
2019-S08 2019-02-20 3
2019-S07 2019-02-15 0
2019-S07 2019-02-14 5

------------------------
5 rows in set (3.179 sec)

But combine OR with ROW based IN and we get a wrong answer:
MariaDB [dhall]> SELECT * FROM dim_date WHERE (SEMAINE,DATE_JOUR) IN (('2019-S08','2019-02-18'),('2019-S08','2019-02-19')) OR SEMAINE IN ('2019-S07');
------------------------

semaine date_jour val

------------------------

2019-S07 2019-02-15 0
2019-S07 2019-02-14 5

------------------------
2 rows in set (2.555 sec)

Comment by David Hall (Inactive) [ 2019-11-05 ]

The problem turns out that the wrong datatype was being grabbed for one of the resulting PredicateOperators. The reason it doesn't break when there is no OR is that, in that case, the particular predicate in question is optimized into a join, which follows a different path.

Corrected the datatype.

Comment by David Hall (Inactive) [ 2019-11-05 ]

regression test #159

Comment by Roman [ 2019-11-06 ]

Plz see the comments on github.

Comment by Bharath Bokka (Inactive) [ 2019-11-13 ]

Build tested-
1.2.6-1
[bbokka@cs-tst-02 centos7]$ cat gitversionInfo.txt
engine commit:
30a15a1

The query which failed in the description works fine on the above build.

MariaDB [test]> SELECT * FROM dim_date WHERE SEMAINE IN ('2019-S07') OR (SEMAINE,DATE_JOUR) IN ( ('2019-S08','2019-02-18'), ('2019-S08','2019-02-19') );
------------------------

semaine date_jour val

------------------------

2019-S08 2019-02-18 1
2019-S08 2019-02-19 2
2019-S07 2019-02-15 0
2019-S07 2019-02-14 5

------------------------
4 rows in set (0.103 sec)

Also tested on,
Build-
1.4.1
[bbokka@cs-tst-02 centos7]$ cat gitversionInfo.txt
engine commit:
dea1dec

This issue isn't fixed on the latest 1.4.1. I think we should reopen, please suggest.

Comment by Roman [ 2019-11-13 ]

That's true. The fix was pushed into 1.2 and will be upmerged into 1.4 before we give 1.4.2 to QA.

Comment by Daniel Lee (Inactive) [ 2019-11-18 ]

Build verfied: 1.4.1-1 github source

Server
commit 77a245fe5658b8d6d937620586ecd802b3432a78
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date: Sun Nov 17 20:04:11 2019 +0200

Engine
commit 7c6a086cfb54b8bbd500efb41f34c9fa1ed03ca1
Merge: f291d88 1a94d53
Author: Roman Nozdrin <drrtuy@gmail.com>
Date: Mon Nov 18 12:11:30 2019 +0300

Generated at Thu Feb 08 02:42:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.