Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35357

Non-recursive CTE is materialized as many times as it is used

    XMLWordPrintable

Details

    Description

      Consider a CTE that is not recursive, not mergeable, and used multiple times in the query:

      create table ten(a int primary key);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table t10 (a int);
      insert into t10 select T.a from ten T, ten T2, ten T3;
      

      explain
      with T as (
      select distinct a from t10
      )
      select * from T as T1, T as T2 where T1.a+T2.a>1000;
      

      EXPLAIN shows that each use of the CTE is completely independent, as if one has copy-pasted its definition:

      id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
      1       PRIMARY <derived2>      ALL     NULL    NULL    NULL    NULL    1000
      1       PRIMARY <derived3>      ALL     NULL    NULL    NULL    NULL    1000    Using where; Using join buffer (flat, BNL join)
      3       DERIVED t10     ALL     NULL    NULL    NULL    NULL    1000    Using temporary
      2       DERIVED t10     ALL     NULL    NULL    NULL    NULL    1000    Using temporary
      

      This can cause poor performance when the CTE in question costs a lot to compute.

      Note that MySQL would re-use the CTE in this case:

      +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
      | id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
      +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
      |  1 | PRIMARY     | <derived2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1000 |   100.00 | NULL                                       |
      |  1 | PRIMARY     | <derived2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1000 |   100.00 | Using where; Using join buffer (hash join) |
      |  2 | DERIVED     | t10        | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1000 |   100.00 | Using temporary                            |
      +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
      

      It's interesting that if one defines a VIEW instead of the CTE and uses it multiple times, both MariaDB and MySQL will materialize the view as many times as it is used.

      One can say that sharing the CTE may interfere with other optimizations, like derived condition pushdown or derived_with_keys, or Split-Materialized.

      A suggestion off the top of the head: if neither of these optimizations were employed, can we still re-use the CTE?

      Attachments

        Activity

          People

            Unassigned Unassigned
            psergei Sergei Petrunia
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.