Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4776

Create initial architectural component for query rewrite and implement 2 use cases

    XMLWordPrintable

Details

    • Task
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • 23.10
    • None

    Description

      We appear to lack a usual component of database optimization - an ability to internally transform (rewrite) queries into the shape best suited for the specifics of the database engine. Without it, a number of commercially critical query shapes cannot be executed in Columnstore, including 2 from TPC-H (3 queries fail to parse as a result).

      Type 1. Certain correlated subqueries.

      MariaDB [dbt3]> drop table if exists t1,t2;
      Query OK, 0 rows affected (0.778 sec)
       
      MariaDB [dbt3]> create table t1 (i int) engine=columnstore;
      Query OK, 0 rows affected (0.667 sec)
       
      MariaDB [dbt3]> create table t2 (i int) engine=columnstore;
      Query OK, 0 rows affected (0.640 sec)
       
      -- case A
       
      MariaDB [dbt3]> select count(*) from t1, t2 where t1.i=t2.i and t2.j = (select min(j) from t2 where t1.i=i);
      ERROR 1815 (HY000): Internal error: IDB-3012: Scalar filter and semi join are not from the same pair of tables.
       
      case B
       
      MariaDB [dbt3]> select count(*) from t1, t2 where t1.i=t2.i and t2.j in (select j from t2 where t1.i=i);
      ERROR 1815 (HY000): Internal error: IDB-1003: Circular joins are not supported.
      

      Fundamentally, this would require an internal rewrite procedure usually called de-correlation:

       
      -- case A
       
      MariaDB [dbt3]> select count(*) from t1, t2,(select min(j) j, i from t2 group by i) t3  where t1.i=t2.i and t2.i = t3.i and t2.j=t3.j;
      +----------+
      | count(*) |
      +----------+
      |        0 |
      +----------+
      1 row in set (0.052 sec)
       
      -- case B
       
      MariaDB [dbt3]> select count(*) from t1, t2 where t1.i=t2.i and exists (select i,j from t2 where t2.i = i and t2.j=j) ;
      +----------+
      | count(*) |
      +----------+
      |        0 |
      +----------+
      1 row in set (0.068 sec)
      
      

      Limited Case - TPC-H Query 19

      MariaDB [dbt3]> select count(*) from t1, t2 where (t1.i=t2.i and t2.j=1) or (t1.i=t2.i and t2.j=2);
      ERROR 1815 (HY000): Internal error: IDB-1000: 't1' and 't2' are not joined.
      

      The pattern here is somewhat specific: a bunch of OR clauses with the join predicate(s) repeated in each. Rewriter should pull common predicates out...

      CREATE TABLE a (
      id integer,
      x integer,
      p integer
      );
       
      CREATE TABLE b (
      id integer,
      x integer,
      q integer
      );
       
      INSERT INTO a (x, p) VALUES
      (1, 1),
      (1, 1),
      (2, 1);
       
      INSERT INTO b (x, q) VALUES
      (1, 3),
      (2, 3);
       
      SELECT a.x, a.p, b.q FROM a, b WHERE (a.x = b.x and a.p = 1) OR (a.x = b.x and b.q = 3);
      
      

      Attachments

        Activity

          People

            drrtuy Roman
            gdorman Gregory Dorman (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.