Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23886

Reusing CTE inside a function fails with table doesn't exist

Details

    Description

      When I try to use a single CTE multiple times inside a stored function, the function fails with
      ERROR 1146 (42S02): Table 'test.testtab' doesn't exist

      DROP TABLE IF EXISTS testtab;
      CREATE TABLE testtab (c1 int(11) );
      INSERT INTO testtab VALUES (1),(2),(6);
       
      DROP FUNCTION IF EXISTS TESTFUN;
      DELIMITER |
      CREATE FUNCTION TESTFUN() RETURNS INT DETERMINISTIC BEGIN
      RETURN (
          WITH
              cte1 as (select c1 from testtab)
      -- here
          SELECT SUM(c1) FROM ( SELECT * FROM cte1 UNION ALL SELECT * FROM cte1 ) as u1
      );
      END|
      DELIMITER ;
      SELECT TESTFUN();

      Returns -> ERROR 1146 (42S02): Table 'test.testtab' doesn't exist
      Expected -> 18

      Running the same query without wrapping it in a function works fine:

          WITH
              cte1 as (select c1 from testtab)
      -- here
          SELECT SUM(c1) FROM ( SELECT * FROM cte1 UNION ALL SELECT * FROM cte1 ) as u1;
      

      Returns -> 18

      The same error occurs when:

      • CTE is self-joined in a query - inside a stored function
      • multiple branches of a CASE ... END use it in tests: WHEN EXISTS (SELECT NULL FROM CTE WHERE X=Y) THEN "result 1" WHEN EXISTS ( SELECT NULL FROM CTE ) THEN "result 2" END - inside a stored function

      Attachments

        Issue Links

          Activity

            szatox Szatox created issue -
            szatox Szatox made changes -
            Field Original Value New Value
            Description When I try to use a single CTE multiple times inside a stored function, the function fails with
            ERROR 1146 (42S02): Table 'test.testtab' doesn't exist

            {code}DROP TABLE IF EXISTS testtab;
            CREATE TABLE testtab (c1 int(11) );
            INSERT INTO testtab VALUES (1),(2),(6);
             
            DROP FUNCTION IF EXISTS TESTFUN;
            DELIMITER |
            CREATE FUNCTION TESTFUN() RETURNS INT DETERMINISTIC BEGIN
            RETURN (
                WITH
                    cte1 as (select c1 from testtab)
            -- here
                SELECT SUM(c1) FROM ( SELECT * FROM cte1 UNION ALL SELECT * FROM cte1 ) as u1
            );
            END|
            DELIMITER ;
            SELECT TESTFUN();{code}
            Returns -> ERROR 1146 (42S02): Table 'test.testtab' doesn't exist
            Expected -> 18

            Running the same query _without_ wrapping it in a function works fine:
            {code} WITH
                    cte1 as (select c1 from testtab)
            -- here
                SELECT SUM(c1) FROM ( SELECT * FROM cte1 UNION ALL SELECT * FROM cte1 ) as u1;
            {code}
            Returns -> 18
            When I try to use a single CTE multiple times inside a stored function, the function fails with
            ERROR 1146 (42S02): Table 'test.testtab' doesn't exist

            {code}DROP TABLE IF EXISTS testtab;
            CREATE TABLE testtab (c1 int(11) );
            INSERT INTO testtab VALUES (1),(2),(6);
             
            DROP FUNCTION IF EXISTS TESTFUN;
            DELIMITER |
            CREATE FUNCTION TESTFUN() RETURNS INT DETERMINISTIC BEGIN
            RETURN (
                WITH
                    cte1 as (select c1 from testtab)
            -- here
                SELECT SUM(c1) FROM ( SELECT * FROM cte1 UNION ALL SELECT * FROM cte1 ) as u1
            );
            END|
            DELIMITER ;
            SELECT TESTFUN();{code}
            Returns -> ERROR 1146 (42S02): Table 'test.testtab' doesn't exist
            Expected -> 18

            Running the same query _without_ wrapping it in a function works fine:
            {code} WITH
                    cte1 as (select c1 from testtab)
            -- here
                SELECT SUM(c1) FROM ( SELECT * FROM cte1 UNION ALL SELECT * FROM cte1 ) as u1;
            {code}
            Returns -> 18


            The same error occurs when:
            * CTE is self-joined in a query - inside a stored function
            * multiple branches of a CASE ... END use it in tests: WHEN EXISTS (SELECT NULL FROM CTE WHERE X=Y) THEN "result 1" WHEN EXISTS ( SELECT NULL FROM CTE ) THEN "result 2" END - inside a stored function
            alice Alice Sherepa made changes -
            Component/s Optimizer - CTE [ 13513 ]
            alice Alice Sherepa made changes -
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5 [ 23123 ]
            alice Alice Sherepa made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            alice Alice Sherepa made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            alice Alice Sherepa made changes -
            Assignee Igor Babaev [ igor ]
            alice Alice Sherepa added a comment -

            Thanks a lot! Reproducible on 10.2-10.5:

            create table t1 (c1 int);
            insert into t1 values (1),(2),(6);
            create function f1() returns int return 
            (with cte1 as (select c1 from t1) select sum(c1) from
            ( select * from cte1 union all select * from cte1) u1);
            select f1();
            main.1_my                                [ fail ]
                    Test ended at 2020-10-05 10:44:13
             
            CURRENT_TEST: main.1_my
            mysqltest: At line 9: query 'select f1()' failed: 1146: Table 'test.t1' doesn't exist
            

            alice Alice Sherepa added a comment - Thanks a lot! Reproducible on 10.2-10.5: create table t1 (c1 int); insert into t1 values (1),(2),(6); create function f1() returns int return (with cte1 as (select c1 from t1) select sum(c1) from ( select * from cte1 union all select * from cte1) u1); select f1(); main.1_my [ fail ] Test ended at 2020-10-05 10:44:13   CURRENT_TEST: main.1_my mysqltest: At line 9: query 'select f1()' failed: 1146: Table 'test.t1' doesn't exist
            igor Igor Babaev (Inactive) made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]

            This bug practically prevents using queries with more than one reference to the same CTE in stored procedures / stored functions. That's why I marked it as 'Critical'.
            The problem is with a special optimization if prelocking mechanism used for SP/SF. This optimization makes impossible to resolve CTE when opening the tables. One of the solution would be elimination of all references to CTE at the very end of the parsing phase for each query rather than in open_and_process_table() as it's done now. This would be a good architectural solution as well. In this case the references to CTE would be substituted for CTE specifications when PREPARE command was executed. Yet this would require a recursive call of the parser when parsing SF/SP.

            igor Igor Babaev (Inactive) added a comment - This bug practically prevents using queries with more than one reference to the same CTE in stored procedures / stored functions. That's why I marked it as 'Critical'. The problem is with a special optimization if prelocking mechanism used for SP/SF. This optimization makes impossible to resolve CTE when opening the tables. One of the solution would be elimination of all references to CTE at the very end of the parsing phase for each query rather than in open_and_process_table() as it's done now. This would be a good architectural solution as well. In this case the references to CTE would be substituted for CTE specifications when PREPARE command was executed. Yet this would require a recursive call of the parser when parsing SF/SP.
            igor Igor Babaev (Inactive) made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            Roel Roel Van de Paar made changes -
            alice Alice Sherepa made changes -
            alice Alice Sherepa added a comment -

            one more test case (MDEV-25464)

            create table t1 (id1 int);
            insert into t1 values (1), (2);
             
            create function f1() returns int return 
            (with cte1 as (select t1.id1 from t1), cte2 as (select 1 from cte1)
            	select 1 from cte1 limit 1);
             
            select f1(); 
            

            alice Alice Sherepa added a comment - one more test case ( MDEV-25464 ) create table t1 (id1 int); insert into t1 values (1), (2);   create function f1() returns int return (with cte1 as (select t1.id1 from t1), cte2 as (select 1 from cte1) select 1 from cte1 limit 1);   select f1();
            igor Igor Babaev (Inactive) made changes -
            igor Igor Babaev (Inactive) added a comment - - edited

            In the current code binding of a table reference to the specification of the corresponding CTE happens in open_and_process_table(). If the table reference is not the first in the query the specification is cloned in the same way as it is cloned for any view reference. This works fine for standalone queries, but does not work for stored procedures / functions for the following reason.
            When the first call of a stored procedure/ function SP is processed the body of SP is parsed. When a query of SP is parsed the info on each encountered table reference is put into a TABLE_LIST object linked into a global chain associated with the query. When parsing of the query is finished the basic info on the table references from this chain except table references to derived tables and information schema tables is put in one hash table associated with SP.
            When parsing of the body of SP is finished this hash table is used to construct TABLE_LIST objects for all table references mentioned in the table and link them into the list of such objects passed to a pre-locking process that calls open_and_process_table() for each table from the list.
            When a TABLE_LIST for a view is encountered the view is opened and its specification is parsed. For any table reference occurred in the specification a new TABLE_LIST object is created to be included into the list for pre-locking. After all objects in the pre-locking have been looked through the tables mentioned in the list are locked. Note that the objects referenced CTEs are just skipped here as it is impossible to resolve these references without any info on the context where they occur. Now the statements from the body of SP are executed one by one that. At the very beginning of the execution of a query the tables used in the query are opened and open_and_process_table() now is called for each table reference mentioned in the list of TABLE_LIST objects associated with the query that was built when the query was parsed.
            For each table reference first it is checked against CTEs definitions in whose scope it occurred. If such definition is found the reference is considered resolved and if this is not the first reference to the found CTE the the specification of the CTE is re-parsed and the result of the parsing is added to the parsing tree of the query as a sub-tree. If this sub-tree contains table references to other tables they are added the list of TABLE_LIST objects associated with the query in order the referenced tables to be opened. When the procedure that opens the tables comes to the TABLE_LIST object created for a non-first reference to a CTE it discovers that the referenced table instance is not locked and reports an error.
            Thus processing non-first table references to a CTE similar to how references to view are processed does not work for queries used in stored procedures / functions. And the main problem is that the current pre-locking mechanism employed for stored procedures / functions does not allow to save the context in which a CTE reference occur.
            It's not trivial to save the info about the context where a CTE reference occurs while the resolution of the table reference cannot be done without this context and consequentially the specification for the table reference cannot be determined.

            igor Igor Babaev (Inactive) added a comment - - edited In the current code binding of a table reference to the specification of the corresponding CTE happens in open_and_process_table(). If the table reference is not the first in the query the specification is cloned in the same way as it is cloned for any view reference. This works fine for standalone queries, but does not work for stored procedures / functions for the following reason. When the first call of a stored procedure/ function SP is processed the body of SP is parsed. When a query of SP is parsed the info on each encountered table reference is put into a TABLE_LIST object linked into a global chain associated with the query. When parsing of the query is finished the basic info on the table references from this chain except table references to derived tables and information schema tables is put in one hash table associated with SP. When parsing of the body of SP is finished this hash table is used to construct TABLE_LIST objects for all table references mentioned in the table and link them into the list of such objects passed to a pre-locking process that calls open_and_process_table() for each table from the list. When a TABLE_LIST for a view is encountered the view is opened and its specification is parsed. For any table reference occurred in the specification a new TABLE_LIST object is created to be included into the list for pre-locking. After all objects in the pre-locking have been looked through the tables mentioned in the list are locked. Note that the objects referenced CTEs are just skipped here as it is impossible to resolve these references without any info on the context where they occur. Now the statements from the body of SP are executed one by one that. At the very beginning of the execution of a query the tables used in the query are opened and open_and_process_table() now is called for each table reference mentioned in the list of TABLE_LIST objects associated with the query that was built when the query was parsed. For each table reference first it is checked against CTEs definitions in whose scope it occurred. If such definition is found the reference is considered resolved and if this is not the first reference to the found CTE the the specification of the CTE is re-parsed and the result of the parsing is added to the parsing tree of the query as a sub-tree. If this sub-tree contains table references to other tables they are added the list of TABLE_LIST objects associated with the query in order the referenced tables to be opened. When the procedure that opens the tables comes to the TABLE_LIST object created for a non-first reference to a CTE it discovers that the referenced table instance is not locked and reports an error. Thus processing non-first table references to a CTE similar to how references to view are processed does not work for queries used in stored procedures / functions. And the main problem is that the current pre-locking mechanism employed for stored procedures / functions does not allow to save the context in which a CTE reference occur. It's not trivial to save the info about the context where a CTE reference occurs while the resolution of the table reference cannot be done without this context and consequentially the specification for the table reference cannot be determined.

            Let's see how exactly we come to a bogus error message for a call of the function:

            create function f1() returns int return
            ( with cte1 as (select c1 from t1)
            select sum(c1) from
            (select * from cte1 as s union all select * from cte1 as t) dt
            );
            

            The function contains 4 table references with aliases t1, s,t, dt.
            When the query

            select f1();
            

            is processed for the first time the function open_and_process_routine() is called that in its turn calls sp_cache_routine() starting parsing of the body of the function f1. The only query of the body is parsed and the chain of 4 TABLE_LIST objects for table references t1,s,t,dt is attached to the query. At the very end of the parsing process for the query the function sp_head::merge_table_list looks through this chain and puts some basic info about the references to t1,s,t into a hash table. When parsing of f1 is finished the control returns to open_and_process_routine() and the function DML_prelocking_strategy::handle_routine is called. The latter invokes the function sp_head::add_used_tables_to_table_list() that looks through the hash table mentioned above and using the info from the elements in table construct a chain of TABLE_LIST objects for t,s,t1 These objects are qualified as placeholders and contain only basic info for locking t,s,t1. Then pre-locking for t,s,t1 is performed. The process tries to open t and fails. Also it fails with opening s. Finally it succeeds with opening t1. Then the control comes to the lock_tables(). The function forms an array of pointers to the TABLE structures for t,s,t1. Only the pointer to t1 is not NULL. After this the function mysql_lock_tables() is called. Now we come to the point where the only query of f1 is processed. The function open_tables() is called for the list of TABLE_LIST objects created for t1,s,t,dt that was attached to the query at the parsing stage. For each TABLE_LIST object the function open_and_process_table() is called. When it's called the TABLE_LIST object created for t the function finds out that t refers to cte1 and it's the second reference to cte1. The parser is called the specification of cte1 that inserts afrer TABLE_LIST object for t a new TABLE_LIST object for the second instance of t1. After this open_and_process_table() is called for this instance. The function sees that the instance has not been locked and an error message is reported.

            igor Igor Babaev (Inactive) added a comment - Let's see how exactly we come to a bogus error message for a call of the function: create function f1() returns int return ( with cte1 as ( select c1 from t1) select sum (c1) from ( select * from cte1 as s union all select * from cte1 as t) dt ); The function contains 4 table references with aliases t1, s,t, dt. When the query select f1(); is processed for the first time the function open_and_process_routine() is called that in its turn calls sp_cache_routine() starting parsing of the body of the function f1. The only query of the body is parsed and the chain of 4 TABLE_LIST objects for table references t1,s,t,dt is attached to the query. At the very end of the parsing process for the query the function sp_head::merge_table_list looks through this chain and puts some basic info about the references to t1,s,t into a hash table. When parsing of f1 is finished the control returns to open_and_process_routine() and the function DML_prelocking_strategy::handle_routine is called. The latter invokes the function sp_head::add_used_tables_to_table_list() that looks through the hash table mentioned above and using the info from the elements in table construct a chain of TABLE_LIST objects for t,s,t1 These objects are qualified as placeholders and contain only basic info for locking t,s,t1. Then pre-locking for t,s,t1 is performed. The process tries to open t and fails. Also it fails with opening s. Finally it succeeds with opening t1. Then the control comes to the lock_tables(). The function forms an array of pointers to the TABLE structures for t,s,t1. Only the pointer to t1 is not NULL. After this the function mysql_lock_tables() is called. Now we come to the point where the only query of f1 is processed. The function open_tables() is called for the list of TABLE_LIST objects created for t1,s,t,dt that was attached to the query at the parsing stage. For each TABLE_LIST object the function open_and_process_table() is called. When it's called the TABLE_LIST object created for t the function finds out that t refers to cte1 and it's the second reference to cte1. The parser is called the specification of cte1 that inserts afrer TABLE_LIST object for t a new TABLE_LIST object for the second instance of t1. After this open_and_process_table() is called for this instance. The function sees that the instance has not been locked and an error message is reported.
            igor Igor Babaev (Inactive) made changes -
            Assignee Igor Babaev [ igor ] Oleksandr Byelkin [ sanja ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            OK to push

            sanja Oleksandr Byelkin added a comment - OK to push
            sanja Oleksandr Byelkin made changes -
            Assignee Oleksandr Byelkin [ sanja ] Igor Babaev [ igor ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            igor and sanja, this was apparently pushed to 10.2.

            I failed to find a 10.3 or 10.4 version of this. On merge to 10.3 I had to resolve numerous conflicts, but the tests passed for me. On an attempted merge to 10.4, I was less lucky, and 3 tests fail. Please fix.

            marko Marko Mäkelä added a comment - igor and sanja , this was apparently pushed to 10.2. I failed to find a 10.3 or 10.4 version of this. On merge to 10.3 I had to resolve numerous conflicts, but the tests passed for me. On an attempted merge to 10.4 , I was less lucky, and 3 tests fail. Please fix.

            This was pushed separately to 10.4. It looks like some adjustment for 10.5 is needed too.

            marko Marko Mäkelä added a comment - This was pushed separately to 10.4 . It looks like some adjustment for 10.5 is needed too.

            This was pushed separately to 10.5 and finally merged to 10.6.
            A follow-up fix seems to be necessary. A DBUG_ASSERT(false) statement was removed from the following function, but the comment was not adjusted to say when the member function is expected to be invoked:

            double Item_in_subselect::val_real()
            {
              /*
                As far as Item_in_subselect called only from Item_in_optimizer this
                method should not be used
              */
            

            marko Marko Mäkelä added a comment - This was pushed separately to 10.5 and finally merged to 10.6 . A follow-up fix seems to be necessary. A DBUG_ASSERT(false) statement was removed from the following function, but the comment was not adjusted to say when the member function is expected to be invoked: double Item_in_subselect::val_real() { /* As far as Item_in_subselect called only from Item_in_optimizer this method should not be used */

            A fix for this bug was pushed into 10.2. Then an adjusted fix was pushed into 10.4. Another variant was pushed into 10.5.

            igor Igor Babaev (Inactive) added a comment - A fix for this bug was pushed into 10.2. Then an adjusted fix was pushed into 10.4. Another variant was pushed into 10.5.
            igor Igor Babaev (Inactive) made changes -
            Fix Version/s 10.6.2 [ 25800 ]
            Fix Version/s 10.2.39 [ 25731 ]
            Fix Version/s 10.3.30 [ 25732 ]
            Fix Version/s 10.4.20 [ 25733 ]
            Fix Version/s 10.5.11 [ 25734 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            elenst Elena Stepanova made changes -
            alice Alice Sherepa made changes -
            alice Alice Sherepa made changes -
            alice Alice Sherepa made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 114183 ] MariaDB v4 [ 158437 ]

            People

              igor Igor Babaev (Inactive)
              szatox Szatox
              Votes:
              1 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.