Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Not a Bug
    • 10.2(EOL)
    • N/A
    • Optimizer - CTE
    • None

    Description

      Hi,

      I am currently testing on a tool to translate machine learning models (built with scikit-learn python library) into a SQL code that can produce the model predictions when executed. The tool works by translating the model into a "sequence" of CTEs. It supports the main relational databases available on the market (sqlite, postgresql, MSSql, Oracle). Some demo is available here (https://github.com/antoinecarme/sklearn2sql-demo/) with sample SQL codes.

      More background here : MDEV-12440 where a previous issue was successfully fixed.

      I am filing this Jira after analyzing the SQL code quality for various databases. MariaDB seems to be the only database to not support more than 64 CTEs (error message 4003, 'Too many WITH elements in WITH clause'). My goal here is to add the possibility to control/increase the limit on the number of CTEs.

      The tool generates a sequence of CTEs based on the definition of the machine learning model. A decision tree for example is translated as a set of three CTEs. A rnadom forest is a set of decision trees whose number is defined by the end user model. It can then generate enough CTEs while still being understandable for the model builder.

      The rationale behind the choice of sequences of CTEs is clarity and debugging possibilities.

      The reason behind MariaDB not supporting more than 64 CTEs may be related to the same limitation for physical tables (to be confirmed by an expert). The CTEs I am generating are all semantically designed according to the machine learning model. The whole SQL code works on a single physical table and generates too many CTEs. CTEs are added only when needed.

      Is it possible to add the possibility to control the maximum number of CTEs in a select statement, or remove completely this limit which is the case for other databases that do not have it and run an equivalent SQL code in an acceptable time (all take less than 3 minutes according to the tests).

      One can find a sample SQL code here :
      https://raw.githubusercontent.com/antoinecarme/sklearn2sql-demo/master/sample_outputs_tuning_round_1/GradientBoostingClassifier/iris/mysql/demo2_GradientBoostingClassifier_mysql.sql

      This select statement is based on a unique table called iris , than can be defined this way:

      create table `iris` AS
      (
      select 1 as `KEY`, 0.0 as `Feature_0`, 1.0 as `Feature_1`, 2.0 as `Feature_2`, 3.0 as `Feature_3`
      )
      

      Your help is welcome.

      Sorry for the long message. I tried to give a full bug report with the most details.

      Thanks in advance

      Antoine

      Attachments

        Issue Links

          Activity

            AntoineC CARME Antoine added a comment - - edited

            Igor,

            I am progressing and am now able to share with you some progress (see https://github.com/antoinecarme/sklearn2sql-demo/tree/master/tests/mariadb/MDEV-13730).

            I was able to reduce the total number of CTE by scope by nesting with clauses. I applied the same change to all databases and it works on postgresql and sqlite (oracle, db2 and mssql do not support nesting WITH clauses).

            For mysql, you can see the result by comparing the following SQL codes :

            https://raw.githubusercontent.com/antoinecarme/sklearn2sql-demo/master/tests/mariadb/MDEV-13730/old/GradientBoostingClassifier/iris/mysql/demo2_GradientBoostingClassifier_mysql.sql

            https://raw.githubusercontent.com/antoinecarme/sklearn2sql-demo/master/tests/mariadb/MDEV-13730/new/GradientBoostingClassifier/iris/mysql/demo2_GradientBoostingClassifier_mysql.sql

            The SQL codes are large (too many CTEs but the code is clear enough to make it easy to identify the change points (with a graphical diff).

            The first code executes with an error message

            4003, 'Too many WITH elements in WITH clause'

            while the second , unfortunately, has a new glitch and outputs the error:

            1146, "Table 'db.DT_node_lookup' doesn't exist"

            At first, the last error seems to be a CTE resolution issue when CTEs are defined inside CTEs.

            Your help is welcome.

            Thanks in advance

            Antoine

            AntoineC CARME Antoine added a comment - - edited Igor, I am progressing and am now able to share with you some progress (see https://github.com/antoinecarme/sklearn2sql-demo/tree/master/tests/mariadb/MDEV-13730 ). I was able to reduce the total number of CTE by scope by nesting with clauses. I applied the same change to all databases and it works on postgresql and sqlite (oracle, db2 and mssql do not support nesting WITH clauses). For mysql, you can see the result by comparing the following SQL codes : https://raw.githubusercontent.com/antoinecarme/sklearn2sql-demo/master/tests/mariadb/MDEV-13730/old/GradientBoostingClassifier/iris/mysql/demo2_GradientBoostingClassifier_mysql.sql https://raw.githubusercontent.com/antoinecarme/sklearn2sql-demo/master/tests/mariadb/MDEV-13730/new/GradientBoostingClassifier/iris/mysql/demo2_GradientBoostingClassifier_mysql.sql The SQL codes are large (too many CTEs but the code is clear enough to make it easy to identify the change points (with a graphical diff). The first code executes with an error message 4003, 'Too many WITH elements in WITH clause' while the second , unfortunately, has a new glitch and outputs the error: 1146, "Table 'db.DT_node_lookup' doesn't exist" At first, the last error seems to be a CTE resolution issue when CTEs are defined inside CTEs. Your help is welcome. Thanks in advance Antoine

            This not a bug. This is a limitation of the current implementation of CTEs of the same kind as the limit of the number of tables used in a join,

            igor Igor Babaev (Inactive) added a comment - This not a bug. This is a limitation of the current implementation of CTEs of the same kind as the limit of the number of tables used in a join,
            AntoineC CARME Antoine added a comment -

            Thanks a lot Igor for looking at this. I understand that this can be a limitation.

            I don't know however if you are talking of the 4003 error or the new 1146 error.

            The last error is talking of a missing table while a CTE by that name exists.

            Regards

            Antoine

            AntoineC CARME Antoine added a comment - Thanks a lot Igor for looking at this. I understand that this can be a limitation. I don't know however if you are talking of the 4003 error or the new 1146 error. The last error is talking of a missing table while a CTE by that name exists. Regards Antoine

            Antoine,
            Could you please report the second problem in a separate entry?

            igor Igor Babaev (Inactive) added a comment - Antoine, Could you please report the second problem in a separate entry?
            AntoineC CARME Antoine added a comment - Thanks Igor. Added https://jira.mariadb.org/browse/MDEV-13780

            People

              igor Igor Babaev (Inactive)
              AntoineC CARME Antoine
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.