Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.2.4
    • 10.2.6
    • Optimizer - CTE
    • None
    • debian sid package on x64

    Description

      Dear all,

      I am currently working on a tool to translate machine learning models (built with scikit-learn python library) into a SQL code that can produce the model predictions when executed. The tool works by translating the model into a "sequence" of CTEs. It supports the main relational databases available on the market (sqlite, postgresql, MSSql, Oracle). Some demo is available here (https://github.com/antoinecarme/sklearn2sql-demo/) with sample SQL codes.

      MySQL used to not support CTEs. I see that MariaDB is adding support for CTEs and I will be glad to use it and help with examples. This is very good news!!!

      I am giving it a try. I use the debian sid (x64) 10.2.4 version and already read some the bug reports filed recently about CTEs.

      I use a set of automatic tests that run on all supported databases. I will only report when MariaDB is the only failing case.

      So far, a lot of the tested models generate SQL that executes OK on MariaDB which is great!! However, I still have some issues with some complex models (>10 CTEs), I can provide a list.

      For example, the AdaBoostClassifier models produce some SQL code that works on postgresql/oracle/MSSQL/SQlite but not on MariaDB. I get a message

      "Table \'db.AdaBoost_AVG_Scores\' doesn\'t exist"

      AdaBoost_AVG_Scores is a CTE defined in the query. It seems to be a lookup issue for CTEs. the SQL code is available here :
      https://github.com/antoinecarme/sklearn2sql-demo/blob/master/sample_outputs_round_4/AdaBoostClassifier/iris/mysql/demo1_AdaBoostClassifier_mysql.sql

      the SQL code is based on a table INPUT_DATA that can be defined this way :

      create table `INPUT_DATA` AS
      (
      select 1 as `KEY`, 0.0 as `Feature_0`, 1.0 as `Feature_1`, 2.0 as `Feature_2`, 3.0 as `Feature_3`
      )

      Your help is welcome.

      Sorry for the long message. I tried to give a full bug report with the most details.

      Thanks in advance

      Antoine

      Attachments

        Issue Links

          Activity

            igor Igor Babaev added a comment - - edited

            The following simple test case demonstrates this problem:

            create table t1 (a int, b  varchar(32));
            insert into t1 values
              (4,'aaaa' ), (7,'bb'), (1,'ccc'), (4,'dd');  
             
            with
            cte1 as (select * from t1 where  b >= 'c'),
            cte2 as (select * from cte1 where b < 'b'),
            cte3 as (select * from cte2 where a > 1 union select * from cte2 where a > 1)
            select * from ct3;
            
            

            As in the reported query the same CTE in used twice in another CTE and the server returns:

            MariaDB [test]> with
                -> cte1 as (select * from t1 where  b >= 'c'),
                -> cte2 as (select * from cte1 where b < 'b'),
                -> cte3 as (select * from cte2 where a > 1 union select * from cte2 where a > 1)
                -> select * from cte3;
            ERROR 1146 (42S02): Table 'test.cte1' doesn't exist
            

            Here's another query where the same CTE is used twice: In the main query and in a CTE.

            ariaDB [test]> with
                -> cte1 as (select * from t1 where  b >= 'c'),
                -> cte2 as (select * from cte1 where a < 7),
                -> cte3 as (select * from cte2 where a > 1)
                -> select * from cte2, cte3 where cte2.a = cte3.a;
            ERROR 1146 (42S02): Table 'test.cte1' doesn't exist
            

            igor Igor Babaev added a comment - - edited The following simple test case demonstrates this problem: create table t1 (a int, b varchar(32)); insert into t1 values (4,'aaaa' ), (7,'bb'), (1,'ccc'), (4,'dd');   with cte1 as (select * from t1 where b >= 'c'), cte2 as (select * from cte1 where b < 'b'), cte3 as (select * from cte2 where a > 1 union select * from cte2 where a > 1) select * from ct3; As in the reported query the same CTE in used twice in another CTE and the server returns: MariaDB [test]> with -> cte1 as (select * from t1 where b >= 'c'), -> cte2 as (select * from cte1 where b < 'b'), -> cte3 as (select * from cte2 where a > 1 union select * from cte2 where a > 1) -> select * from cte3; ERROR 1146 (42S02): Table 'test.cte1' doesn't exist Here's another query where the same CTE is used twice: In the main query and in a CTE. ariaDB [test]> with -> cte1 as (select * from t1 where b >= 'c'), -> cte2 as (select * from cte1 where a < 7), -> cte3 as (select * from cte2 where a > 1) -> select * from cte2, cte3 where cte2.a = cte3.a; ERROR 1146 (42S02): Table 'test.cte1' doesn't exist
            igor Igor Babaev added a comment -

            The fix for this bug was pushed into the 10.2 tree.

            igor Igor Babaev added a comment - The fix for this bug was pushed into the 10.2 tree.
            igor Igor Babaev added a comment -

            Hi Antoine,

            I executed your query after having applied my fix. It returned some result set, but I could not verify the result, because I did know what to expect.

            igor Igor Babaev added a comment - Hi Antoine, I executed your query after having applied my fix. It returned some result set, but I could not verify the result, because I did know what to expect.
            AntoineC CARME Antoine added a comment -

            YEEEES!

            That's excellent. Thanks a lot Igor Babaev.

            Can't wait or testing the 10.2.6 release !!!!

            AntoineC CARME Antoine added a comment - YEEEES! That's excellent. Thanks a lot Igor Babaev. Can't wait or testing the 10.2.6 release !!!!
            AntoineC CARME Antoine added a comment -

            Igor,

            I can do some tests and give you my feedback (in particular, I compare all supported databases result sets, should be the same).

            AntoineC CARME Antoine added a comment - Igor, I can do some tests and give you my feedback (in particular, I compare all supported databases result sets, should be the same).
            AntoineC CARME Antoine added a comment - - edited

            Igor,
            Just a message to tell you that I made some tests with 10.2.6 release, and this is a full success. Now I get a test report like this

            ??
            TEST_STATUS AdaBoostClassifier_('sqltm', 'sqlite://')_None success None
            TEST_STATUS AdaBoostClassifier_('mssql', 'mssql+pymssql://sa:Micro123@SklearnTestDB/db?charset=utf8&timeout=180')_None success None
            TEST_STATUS AdaBoostClassifier_('pgsql', 'postgresql://db:db@localhost/db?port=5432')_None success None
            TEST_STATUS AdaBoostClassifier_('mysql', 'mysql://db:db@localhost/db')_None success None
            TEST_STATUS AdaBoostClassifier_('oracle', 'oracle+cx_oracle://db:dba@xe')_None success None
            ??

            instead of this

            ??
            TEST_STATUS AdaBoostClassifier_('sqltm', 'sqlite://')_None success None
            TEST_STATUS AdaBoostClassifier_('mssql', 'mssql+pymssql://sa:Micro123@SklearnTestDB/db?charset=utf8')_None success None
            TEST_STATUS AdaBoostClassifier_('pgsql', 'postgresql://db:db@localhost/db?port=5432')_None success None
            TEST_STATUS AdaBoostClassifier_('mysql', 'mysql://db:db@localhost/db')_None failure ProgrammingError_mysql_exceptions.ProgrammingError) (1146, "Table 'db.AdaBoost_AVG_Scores' doesn't exist") [SQL: "WITH `DT_node_lookup` AS \n(SELECT `ADS`.`index` AS `index`, CASE WHEN (`ADS`.`XD_10` <= -0.751862764
            TEST_STATUS AdaBoostClassifier_('oracle', 'oracle+cx_oracle://db:db@xe')_None success None
            ??

            Good job, guys. You made a happy MariaDB user.

            Cheers
            Antoine

            AntoineC CARME Antoine added a comment - - edited Igor, Just a message to tell you that I made some tests with 10.2.6 release, and this is a full success. Now I get a test report like this ?? TEST_STATUS AdaBoostClassifier_('sqltm', 'sqlite://')_None success None TEST_STATUS AdaBoostClassifier_('mssql', 'mssql+pymssql://sa:Micro123@SklearnTestDB/db?charset=utf8&timeout=180')_None success None TEST_STATUS AdaBoostClassifier_('pgsql', 'postgresql://db:db@localhost/db?port=5432')_None success None TEST_STATUS AdaBoostClassifier_('mysql', 'mysql://db:db@localhost/db')_None success None TEST_STATUS AdaBoostClassifier_('oracle', 'oracle+cx_oracle://db:dba@xe')_None success None ?? instead of this ?? TEST_STATUS AdaBoostClassifier_('sqltm', 'sqlite://')_None success None TEST_STATUS AdaBoostClassifier_('mssql', 'mssql+pymssql://sa:Micro123@SklearnTestDB/db?charset=utf8')_None success None TEST_STATUS AdaBoostClassifier_('pgsql', 'postgresql://db:db@localhost/db?port=5432')_None success None TEST_STATUS AdaBoostClassifier_('mysql', 'mysql://db:db@localhost/db')_None failure ProgrammingError _mysql_exceptions.ProgrammingError) (1146, "Table 'db.AdaBoost_AVG_Scores' doesn't exist") [SQL: "WITH `DT_node_lookup` AS \n(SELECT `ADS`.`index` AS `index`, CASE WHEN (`ADS`.`XD_10` <= -0.751862764 TEST_STATUS AdaBoostClassifier_('oracle', 'oracle+cx_oracle://db:db@xe')_None success None ?? Good job, guys. You made a happy MariaDB user. Cheers Antoine

            People

              igor Igor Babaev
              AntoineC CARME Antoine
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.