Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31995

Bogus error executing PS for query using CTE with renaming of columns

Details

    Description

      create table t1 (a int, b int);
       
      with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 group by col1) select * from cte;
      

      so far so good.

      prepare st from 'with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 group by col1) select * from cte';
      execute st;
      

      ERROR 1054 (42S22): Unknown column 'col1' in 'group statement'

      swapping the original column name back in...

      prepare st from 'with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 group by a) select * from cte';
      execute st;
      

      is OK

      order by clause is similar

      with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 order by col1) select * from cte;
      

      is OK.

      prepare st from 'with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 order by col1) select * from cte';
      execute st;
      

      ERROR 1054 (42S22): Unknown column 'col1' in 'order clause'

      interestingly (by design) the where clause is treated differently

      select a as col1, sum(b) as col2 from t1 where col1 is not null;
      

      ERROR 1054 (42S22): Unknown column 'col1' in 'where clause'

      Attachments

        Issue Links

          Activity

            Johnston Rex Johnston added a comment - - edited

            The attached commit addresses column naming issues with CTEs in the use of prepared statements and procedures.
            Usage of either prepared statements or procedures with Common Table Expressions and column renaming may be affected.

            There are three related but different issues addressed here.

            1) First execution issue. Consider the following

            with cte (col1, col2) as (select a as c1, b as c2 from t order by c1) select col1, col2 from cte;
            

            During parsing, the Item representing the first column in the derived table (cte) created, then has it's name set to c1.
            When elements in the order by column are resolved during context analysis, this Item is found from it's name. Later in context analysis st_select_lex_unit::prepare(), this Item is renamed col1, for use in a more outer statement (in the above example select col1, col2 from cte).

            When run as a prepared statement, context analysis is run again before statement execution, so name resolution on elements in order by will no longer find an Item with the name of c1 and will produce an error
            ERROR 1054 (42S22): Unknown column 'c1' in 'order clause'

            Columns of the first select in the unit representing any potential union determine the names of the resultant set, so only Items in this select are renamed to names specified in the with clause.
            It would be reasonable to only restore these names to those after parsing, but doing so will lead to another problem.

            2) Second execution issue. Consider this example

            prepare st from "with cte (c1,c2) as
                    (select a as col1, sum(b) as col2 from t1 where a > 0 group by col1
                      union select a as col3, sum(b) as col4 from t2 where b > 2 group by col3)
                    select * from cte where c1=1";
            

            and what happens to Items in the second select in the union. Parsing sets the name of the first Item to col3, which is then used in resolving the group by clause.
            During first execution, the name is left as col3, which is again used to resolve the group by clause.

            The optimizer will then potentially push the clause "c1=1" into BOTH selects in the union and in doing so will rename all Items in all selects in the union in that position to c1 to make the clause valid.

            During the next execution, the group by clause will no longer be resolvable.
            ERROR 1054 (42S22): Unknown column 'col3' in 'group statement'

            Examining the JSON output from an explain command on the above query produces

                        "query_specifications": [
                          {
                            "query_block": {
                              "select_id": 2,
                              "table": {
                                "table_name": "t1",
                                "access_type": "ALL",
                                "rows": 5,
                                "filtered": 100,
                                "attached_condition": "t1.a = 1"
                              }
                            }
                          },
                          {
                            "query_block": {
                              "select_id": 3,
                              "operation": "UNION",
                              "table": {
                                "table_name": "t2",
                                "access_type": "ALL",
                                "rows": 5,
                                "filtered": 100,
                                "attached_condition": "t2.a = 1 and t2.b > 2"
            

            but with set @@optimizer_switch="condition_pushdown_for_derived=off";

                        "query_specifications": [
                          {
                            "query_block": {
                              "select_id": 2,
                              "filesort": {
                                "sort_key": "t1.a",
                                "temporary_table": {
                                  "table": {
                                    "table_name": "t1",
                                    "access_type": "ALL",
                                    "rows": 5,
                                    "filtered": 100,
                                    "attached_condition": "t1.a > 0"
                                  }
                                }
                              }
                            }
                          },
                          {
                            "query_block": {
                              "select_id": 3,
                              "operation": "UNION",
                              "filesort": {
                                "sort_key": "t2.a",
                                "temporary_table": {
                                  "table": {
                                    "table_name": "t2",
                                    "access_type": "ALL",
                                    "rows": 5,
                                    "filtered": 100,
                                    "attached_condition": "t2.b > 2"
            

            we can see this condition (t1.a=1) pushdown in action.

            3) Memory Leak. During parsing, Item names are allocated in statement memory. During context analysis before each execution, Item renaming allocated more memory on statement memory. For statements that have a high number of executions, this would be a noticable memory leak.

            A recently introduced compile time flag, activated with the cmake option -DWITH_PROTECT_STATEMENT_MEMROOT:BOOL=ON triggers a failing assert on second execution

            prepare stmt from "
            with cte(c) as (select ? ) select r.c, s.c+10  from cte as r, cte as s;
            ";
            set @a=2;
            execute stmt using @a;
            c	s.c+10
            2	12
            set @a=5;
            execute stmt using @a;
            

            #6  0x0000557e017dcb5e in alloc_root (mem_root=0x7fceec20a538, length=4) at /home/rex/src/mariadb/server.10.4-MDEV-24961/mysys/my_alloc.c:225
            #7  0x0000557e00a22163 in Query_arena::alloc (this=0x7fceec000db8, size=4) at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/sql_class.h:1121
            #8  0x0000557e00ee4652 in make_name (thd=0x7fceec000da0, str=0x7fceec1f80c8 "c", length=1, cs=0x557e024f5300 <my_charset_utf8_general_ci>, max_octet_length=255)
                at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/item.cc:1153
            #9  0x0000557e00ee4989 in Item::set_name (this=0x7fceec1f85d0, thd=0x7fceec000da0, str=0x7fceec1f80c8 "c", length=1, cs=0x557e024f5300 <my_charset_utf8_general_ci>)
                at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/item.cc:1217
            #10 0x0000557e00dd132f in With_element::rename_columns_of_derived_unit (this=0x7fceec1f8fe0, thd=0x7fceec000da0, unit=0x7fceec1f87b0)
            

            Johnston Rex Johnston added a comment - - edited The attached commit addresses column naming issues with CTEs in the use of prepared statements and procedures. Usage of either prepared statements or procedures with Common Table Expressions and column renaming may be affected. There are three related but different issues addressed here. 1) First execution issue. Consider the following with cte (col1, col2) as ( select a as c1, b as c2 from t order by c1) select col1, col2 from cte; During parsing, the Item representing the first column in the derived table (cte) created, then has it's name set to c1 . When elements in the order by column are resolved during context analysis, this Item is found from it's name. Later in context analysis st_select_lex_unit::prepare() , this Item is renamed col1 , for use in a more outer statement (in the above example select col1, col2 from cte ). When run as a prepared statement, context analysis is run again before statement execution, so name resolution on elements in order by will no longer find an Item with the name of c1 and will produce an error ERROR 1054 (42S22): Unknown column 'c1' in 'order clause' Columns of the first select in the unit representing any potential union determine the names of the resultant set, so only Items in this select are renamed to names specified in the with clause. It would be reasonable to only restore these names to those after parsing, but doing so will lead to another problem. 2) Second execution issue. Consider this example prepare st from "with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 where a > 0 group by col1 union select a as col3, sum(b) as col4 from t2 where b > 2 group by col3) select * from cte where c1=1" ; and what happens to Items in the second select in the union. Parsing sets the name of the first Item to col3 , which is then used in resolving the group by clause. During first execution, the name is left as col3 , which is again used to resolve the group by clause. The optimizer will then potentially push the clause "c1=1" into BOTH selects in the union and in doing so will rename all Items in all selects in the union in that position to c1 to make the clause valid. During the next execution, the group by clause will no longer be resolvable. ERROR 1054 (42S22): Unknown column 'col3' in 'group statement' Examining the JSON output from an explain command on the above query produces "query_specifications": [ { "query_block": { "select_id": 2, "table": { "table_name": "t1", "access_type": "ALL", "rows": 5, "filtered": 100, "attached_condition": "t1.a = 1" } } }, { "query_block": { "select_id": 3, "operation": "UNION", "table": { "table_name": "t2", "access_type": "ALL", "rows": 5, "filtered": 100, "attached_condition": "t2.a = 1 and t2.b > 2" but with set @@optimizer_switch="condition_pushdown_for_derived=off"; "query_specifications": [ { "query_block": { "select_id": 2, "filesort": { "sort_key": "t1.a", "temporary_table": { "table": { "table_name": "t1", "access_type": "ALL", "rows": 5, "filtered": 100, "attached_condition": "t1.a > 0" } } } } }, { "query_block": { "select_id": 3, "operation": "UNION", "filesort": { "sort_key": "t2.a", "temporary_table": { "table": { "table_name": "t2", "access_type": "ALL", "rows": 5, "filtered": 100, "attached_condition": "t2.b > 2" we can see this condition (t1.a=1) pushdown in action. 3) Memory Leak. During parsing, Item names are allocated in statement memory. During context analysis before each execution, Item renaming allocated more memory on statement memory. For statements that have a high number of executions, this would be a noticable memory leak. A recently introduced compile time flag, activated with the cmake option -DWITH_PROTECT_STATEMENT_MEMROOT:BOOL=ON triggers a failing assert on second execution prepare stmt from " with cte(c) as (select ? ) select r.c, s.c+10 from cte as r, cte as s; " ; set @a=2; execute stmt using @a; c s.c+10 2 12 set @a=5; execute stmt using @a; #6 0x0000557e017dcb5e in alloc_root (mem_root=0x7fceec20a538, length=4) at /home/rex/src/mariadb/server.10.4-MDEV-24961/mysys/my_alloc.c:225 #7 0x0000557e00a22163 in Query_arena::alloc (this=0x7fceec000db8, size=4) at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/sql_class.h:1121 #8 0x0000557e00ee4652 in make_name (thd=0x7fceec000da0, str=0x7fceec1f80c8 "c", length=1, cs=0x557e024f5300 <my_charset_utf8_general_ci>, max_octet_length=255) at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/item.cc:1153 #9 0x0000557e00ee4989 in Item::set_name (this=0x7fceec1f85d0, thd=0x7fceec000da0, str=0x7fceec1f80c8 "c", length=1, cs=0x557e024f5300 <my_charset_utf8_general_ci>) at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/item.cc:1217 #10 0x0000557e00dd132f in With_element::rename_columns_of_derived_unit (this=0x7fceec1f8fe0, thd=0x7fceec000da0, unit=0x7fceec1f87b0)
            Johnston Rex Johnston added a comment -

            Thanks Igor.

            Johnston Rex Johnston added a comment - Thanks Igor.
            igor Igor Babaev added a comment -

            ok to push into 10.4

            igor Igor Babaev added a comment - ok to push into 10.4
            marko Marko Mäkelä added a comment - The 10.5 version of this was pushed as a normal commit , not as a merge of the 10.4 commit . I just merged this to 10.6 .

            Suggested line for the changelog:
            "Fixed several issues that could be observed when using Prepared Statements that use CTEs and rename the CTE's columns. The issues manifested themselves as bugs error message and/or memory leak".

            psergei Sergei Petrunia added a comment - Suggested line for the changelog: "Fixed several issues that could be observed when using Prepared Statements that use CTEs and rename the CTE's columns. The issues manifested themselves as bugs error message and/or memory leak".

            People

              Johnston Rex Johnston
              Johnston Rex Johnston
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.