[MDEV-13730] Too many CTEs limit Created: 2017-09-04 Updated: 2017-10-31 Resolved: 2017-09-10 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer - CTE |
| Affects Version/s: | 10.2 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | CARME Antoine | Assignee: | Igor Babaev |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Description |
|
Hi, I am currently testing on a tool to translate machine learning models (built with scikit-learn python library) into a SQL code that can produce the model predictions when executed. The tool works by translating the model into a "sequence" of CTEs. It supports the main relational databases available on the market (sqlite, postgresql, MSSql, Oracle). Some demo is available here (https://github.com/antoinecarme/sklearn2sql-demo/) with sample SQL codes. More background here : I am filing this Jira after analyzing the SQL code quality for various databases. MariaDB seems to be the only database to not support more than 64 CTEs (error message 4003, 'Too many WITH elements in WITH clause'). My goal here is to add the possibility to control/increase the limit on the number of CTEs. The tool generates a sequence of CTEs based on the definition of the machine learning model. A decision tree for example is translated as a set of three CTEs. A rnadom forest is a set of decision trees whose number is defined by the end user model. It can then generate enough CTEs while still being understandable for the model builder. The rationale behind the choice of sequences of CTEs is clarity and debugging possibilities. The reason behind MariaDB not supporting more than 64 CTEs may be related to the same limitation for physical tables (to be confirmed by an expert). The CTEs I am generating are all semantically designed according to the machine learning model. The whole SQL code works on a single physical table and generates too many CTEs. CTEs are added only when needed. Is it possible to add the possibility to control the maximum number of CTEs in a select statement, or remove completely this limit which is the case for other databases that do not have it and run an equivalent SQL code in an acceptable time (all take less than 3 minutes according to the tests). One can find a sample SQL code here : This select statement is based on a unique table called iris , than can be defined this way:
Your help is welcome. Sorry for the long message. I tried to give a full bug report with the most details. Thanks in advance Antoine |
| Comments |
| Comment by Igor Babaev [ 2017-09-04 ] |
|
Antoine, |
| Comment by CARME Antoine [ 2017-09-04 ] |
|
Igor, Thanks a lot for the quick answer. 1. Unfortunately the limitation of 64 CTEs attached to a query cannot be lifted in the current code. There is no emergency in getting the fix in the current code. I don't want to disturb the valuable work of your team. Thanks already a lot for making MariaDB available. If the goal is to get rid completely of this limitation or set the limit high enough, I will wait. 2. Yet the specification of a CTE is another query that can itself contain up to 64 CTEs whose scope is the parent CTE though Interesting. Do you mean that the limit is recursively defined ? I mean, a CTE can contain refernces to 64 tables that contain other references ? I will test that. If this is something particular for MySQL implementation, I may probably use a dichotomic approach to solve this, To be more precise, when dealing with a random forest model (set of 64 trees, a tree = one or mote CTE), I can work on two or more sub-forests of 32 or 16 trees each ? Will the limitation apply to smaller forests ? All this inside a single select statement. 3. If your query refers to more than 64 CTEs then it refers to more 64 tables and you come the limitation for a query without CTEs. My query is referencing the same table (iris) more than 64 times. Don't know if the limitation is on the table or its reference. Help is welcome. I also do not understand why the limitation was set on ordinary tables (historical reasons). |
| Comment by Igor Babaev [ 2017-09-04 ] |
|
Antoine, |
| Comment by CARME Antoine [ 2017-09-04 ] |
|
Igor, Thanks a lot for this discussion. I now have a much better understanding of this issue. Changing the optimizer to support more than 64 CTEs is the best fix as it will not need any change of my side I am trying to see if I can isolate some SQL code that describes this issue, So far, I have an experiment where I can perform a change in the SQL code (using nested CTEs, With inside With I will share my results as soon as I have something. |
| Comment by CARME Antoine [ 2017-09-08 ] |
|
Igor, I am progressing and am now able to share with you some progress (see https://github.com/antoinecarme/sklearn2sql-demo/tree/master/tests/mariadb/MDEV-13730). I was able to reduce the total number of CTE by scope by nesting with clauses. I applied the same change to all databases and it works on postgresql and sqlite (oracle, db2 and mssql do not support nesting WITH clauses). For mysql, you can see the result by comparing the following SQL codes : The SQL codes are large (too many CTEs The first code executes with an error message 4003, 'Too many WITH elements in WITH clause' while the second , unfortunately, has a new glitch and outputs the error: 1146, "Table 'db.DT_node_lookup' doesn't exist" At first, the last error seems to be a CTE resolution issue when CTEs are defined inside CTEs. Your help is welcome. Thanks in advance Antoine |
| Comment by Igor Babaev [ 2017-09-10 ] |
|
This not a bug. This is a limitation of the current implementation of CTEs of the same kind as the limit of the number of tables used in a join, |
| Comment by CARME Antoine [ 2017-09-10 ] |
|
Thanks a lot Igor for looking at this. I understand that this can be a limitation. I don't know however if you are talking of the 4003 error or the new 1146 error. The last error is talking of a missing table while a CTE by that name exists. Regards Antoine |
| Comment by Igor Babaev [ 2017-09-10 ] |
|
Antoine, |
| Comment by CARME Antoine [ 2017-09-11 ] |
|
Thanks Igor. |