Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13731

Too many CTEs limit

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • None
    • N/A
    • Optimizer - CTE
    • None

    Description

      Hi,

      I am currently testing on a tool to translate machine learning models (built with scikit-learn python library) into a SQL code that can produce the model predictions when executed. The tool works by translating the model into a "sequence" of CTEs. It supports the main relational databases available on the market (sqlite, postgresql, MSSql, Oracle). Some demo is available here (https://github.com/antoinecarme/sklearn2sql-demo/) with sample SQL codes.

      More background here : https://jira.mariadb.org/browse/MDEV-12440 where a previous issue was successfully fixed.

      I am filing this Jira after analyzing the SQL code quality for various databases. MariaDB seems to be the only database to not support more than 64 CTEs (error message 4003, 'Too many WITH elements in WITH clause'). My goal here is to add the possibility to control/increase the limit on the number of CTEs.

      The tool generates a sequence of CTEs based on the definition of the machine learning model. A decision tree for example is translated as a set of three CTEs. A rnadom forest is a set of decision trees whose number is defined by the end user model. It can then generate enough CTEs while still being understandable for the model builder.

      The rationale behind the choice of sequences of CTEs is clarity and debugging possibilities.

      The reason behind MariaDB not supporting more than 64 CTEs may be related to the same limitation for physical tables (to be confirmed by an expert). The CTEs I am generating are all semantically designed according to the machine learning model. The whole SQL code works on a single physical table and generates too many CTEs. CTEs are added only when needed.

      Is it possible to add the possibility to control the maximum number of CTEs in a select statement, or remove completely this limit which is the case for other databases that do not have it and run an equivalent SQL code in an acceptable time (all take less than 3 minutes according to the tests).

      One can find a sample SQL code here :
      https://raw.githubusercontent.com/antoinecarme/sklearn2sql-demo/master/sample_outputs_tuning_round_1/GradientBoostingClassifier/iris/mysql/demo2_GradientBoostingClassifier_mysql.sql

      This select statement is based on a unique table called iris , than can be defined this way:

      create table `iris` AS
      (
      select 1 as `KEY`, 0.0 as `Feature_0`, 1.0 as `Feature_1`, 2.0 as `Feature_2`, 3.0 as `Feature_3`
      )

      Your help is welcome.

      Sorry for the long message. I tried to give a full bug report with the most details.

      Thanks in advance

      Antoine

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              AntoineC CARME Antoine
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.