[MCOL-4776] Create initial architectural component for query rewrite and implement 2 use cases Created: 2021-06-24  Updated: 2023-07-01

Status: Open
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 23.10

Type: Task Priority: Major
Reporter: Gregory Dorman (Inactive) Assignee: Roman
Resolution: Unresolved Votes: 0
Labels: tech_debt


 Description   

We appear to lack a usual component of database optimization - an ability to internally transform (rewrite) queries into the shape best suited for the specifics of the database engine. Without it, a number of commercially critical query shapes cannot be executed in Columnstore, including 2 from TPC-H (3 queries fail to parse as a result).

Type 1. Certain correlated subqueries.

MariaDB [dbt3]> drop table if exists t1,t2;
Query OK, 0 rows affected (0.778 sec)
 
MariaDB [dbt3]> create table t1 (i int) engine=columnstore;
Query OK, 0 rows affected (0.667 sec)
 
MariaDB [dbt3]> create table t2 (i int) engine=columnstore;
Query OK, 0 rows affected (0.640 sec)
 
-- case A
 
MariaDB [dbt3]> select count(*) from t1, t2 where t1.i=t2.i and t2.j = (select min(j) from t2 where t1.i=i);
ERROR 1815 (HY000): Internal error: IDB-3012: Scalar filter and semi join are not from the same pair of tables.
 
case B
 
MariaDB [dbt3]> select count(*) from t1, t2 where t1.i=t2.i and t2.j in (select j from t2 where t1.i=i);
ERROR 1815 (HY000): Internal error: IDB-1003: Circular joins are not supported.

Fundamentally, this would require an internal rewrite procedure usually called de-correlation:

 
-- case A
 
MariaDB [dbt3]> select count(*) from t1, t2,(select min(j) j, i from t2 group by i) t3  where t1.i=t2.i and t2.i = t3.i and t2.j=t3.j;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.052 sec)
 
-- case B
 
MariaDB [dbt3]> select count(*) from t1, t2 where t1.i=t2.i and exists (select i,j from t2 where t2.i = i and t2.j=j) ;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.068 sec)

Limited Case - TPC-H Query 19

MariaDB [dbt3]> select count(*) from t1, t2 where (t1.i=t2.i and t2.j=1) or (t1.i=t2.i and t2.j=2);
ERROR 1815 (HY000): Internal error: IDB-1000: 't1' and 't2' are not joined.

The pattern here is somewhat specific: a bunch of OR clauses with the join predicate(s) repeated in each. Rewriter should pull common predicates out...

CREATE TABLE a (
id integer,
x integer,
p integer
);
 
CREATE TABLE b (
id integer,
x integer,
q integer
);
 
INSERT INTO a (x, p) VALUES
(1, 1),
(1, 1),
(2, 1);
 
INSERT INTO b (x, q) VALUES
(1, 3),
(2, 3);
 
SELECT a.x, a.p, b.q FROM a, b WHERE (a.x = b.x and a.p = 1) OR (a.x = b.x and b.q = 3);


Generated at Thu Feb 08 02:52:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.