[MDEV-31995] Bogus error executing PS for query using CTE with renaming of columns - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.4.31, 11.2.1
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2
Component/s: Optimizer - CTE, Server
Labels:
None

Description

create table t1 (a int, b int);

with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 group by col1) select * from cte;

so far so good.

prepare st from 'with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 group by col1) select * from cte';

execute st;

ERROR 1054 (42S22): Unknown column 'col1' in 'group statement'

swapping the original column name back in...

prepare st from 'with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 group by a) select * from cte';

execute st;

is OK

order by clause is similar

with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 order by col1) select * from cte;

is OK.

prepare st from 'with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 order by col1) select * from cte';

execute st;

ERROR 1054 (42S22): Unknown column 'col1' in 'order clause'

interestingly (by design) the where clause is treated differently

select a as col1, sum(b) as col2 from t1 where col1 is not null;

ERROR 1054 (42S22): Unknown column 'col1' in 'where clause'

Attachments

Issue Links

relates to

MDEV-32183 Incorrect name resolution in CTE

Open

MDEV-32708 Assertion Failed at /mariadb-11.3.0/sql/filesort.cc:1194

Closed

MDEV-32829 Crash when executing PS for query with eliminated subquery using view

Closed

Activity

Ascending order - Click to sort in descending order

Rex Johnston added a comment - 2023-09-01 03:31 - edited

The attached commit addresses column naming issues with CTEs in the use of prepared statements and procedures.
Usage of either prepared statements or procedures with Common Table Expressions and column renaming may be affected.

There are three related but different issues addressed here.

1) First execution issue. Consider the following

with cte (col1, col2) as (select a as c1, b as c2 from t order by c1) select col1, col2 from cte;

During parsing, the Item representing the first column in the derived table (cte) created, then has it's name set to c1.
When elements in the order by column are resolved during context analysis, this Item is found from it's name. Later in context analysis st_select_lex_unit::prepare(), this Item is renamed col1, for use in a more outer statement (in the above example select col1, col2 from cte).

When run as a prepared statement, context analysis is run again before statement execution, so name resolution on elements in order by will no longer find an Item with the name of c1 and will produce an error
ERROR 1054 (42S22): Unknown column 'c1' in 'order clause'

Columns of the first select in the unit representing any potential union determine the names of the resultant set, so only Items in this select are renamed to names specified in the with clause.
It would be reasonable to only restore these names to those after parsing, but doing so will lead to another problem.

2) Second execution issue. Consider this example

prepare st from "with cte (c1,c2) as

        (select a as col1, sum(b) as col2 from t1 where a > 0 group by col1

          union select a as col3, sum(b) as col4 from t2 where b > 2 group by col3)

        select * from cte where c1=1";

and what happens to Items in the second select in the union. Parsing sets the name of the first Item to col3, which is then used in resolving the group by clause.
During first execution, the name is left as col3, which is again used to resolve the group by clause.

The optimizer will then potentially push the clause "c1=1" into BOTH selects in the union and in doing so will rename all Items in all selects in the union in that position to c1 to make the clause valid.

During the next execution, the group by clause will no longer be resolvable.
ERROR 1054 (42S22): Unknown column 'col3' in 'group statement'

Examining the JSON output from an explain command on the above query produces

            "query_specifications": [

                "query_block": {

                  "select_id": 2,

                  "table": {

                    "table_name": "t1",

                    "access_type": "ALL",

                    "rows": 5,

                    "filtered": 100,

                    "attached_condition": "t1.a = 1"

},

                "query_block": {

                  "select_id": 3,

                  "operation": "UNION",

                  "table": {

                    "table_name": "t2",

                    "access_type": "ALL",

                    "rows": 5,

                    "filtered": 100,

                    "attached_condition": "t2.a = 1 and t2.b > 2"

but with set @@optimizer_switch="condition_pushdown_for_derived=off";

            "query_specifications": [

                "query_block": {

                  "select_id": 2,

                  "filesort": {

                    "sort_key": "t1.a",

                    "temporary_table": {

                      "table": {

                        "table_name": "t1",

                        "access_type": "ALL",

                        "rows": 5,

                        "filtered": 100,

                        "attached_condition": "t1.a > 0"

},

                "query_block": {

                  "select_id": 3,

                  "operation": "UNION",

                  "filesort": {

                    "sort_key": "t2.a",

                    "temporary_table": {

                      "table": {

                        "table_name": "t2",

                        "access_type": "ALL",

                        "rows": 5,

                        "filtered": 100,

                        "attached_condition": "t2.b > 2"

we can see this condition (t1.a=1) pushdown in action.

3) Memory Leak. During parsing, Item names are allocated in statement memory. During context analysis before each execution, Item renaming allocated more memory on statement memory. For statements that have a high number of executions, this would be a noticable memory leak.

A recently introduced compile time flag, activated with the cmake option -DWITH_PROTECT_STATEMENT_MEMROOT:BOOL=ON triggers a failing assert on second execution

prepare stmt from "

with cte(c) as (select ? ) select r.c, s.c+10  from cte as r, cte as s;

";

set @a=2;

execute stmt using @a;

c	s.c+10

2	12

set @a=5;

execute stmt using @a;

#6  0x0000557e017dcb5e in alloc_root (mem_root=0x7fceec20a538, length=4) at /home/rex/src/mariadb/server.10.4-MDEV-24961/mysys/my_alloc.c:225

#7  0x0000557e00a22163 in Query_arena::alloc (this=0x7fceec000db8, size=4) at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/sql_class.h:1121

#8  0x0000557e00ee4652 in make_name (thd=0x7fceec000da0, str=0x7fceec1f80c8 "c", length=1, cs=0x557e024f5300 <my_charset_utf8_general_ci>, max_octet_length=255)

    at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/item.cc:1153

#9  0x0000557e00ee4989 in Item::set_name (this=0x7fceec1f85d0, thd=0x7fceec000da0, str=0x7fceec1f80c8 "c", length=1, cs=0x557e024f5300 <my_charset_utf8_general_ci>)

    at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/item.cc:1217

#10 0x0000557e00dd132f in With_element::rename_columns_of_derived_unit (this=0x7fceec1f8fe0, thd=0x7fceec000da0, unit=0x7fceec1f87b0)

Rex Johnston added a comment - 2023-09-01 03:31 - edited The attached commit addresses column naming issues with CTEs in the use of prepared statements and procedures. Usage of either prepared statements or procedures with Common Table Expressions and column renaming may be affected. There are three related but different issues addressed here. 1) First execution issue. Consider the following with cte (col1, col2) as ( select a as c1, b as c2 from t order by c1) select col1, col2 from cte; During parsing, the Item representing the first column in the derived table (cte) created, then has it's name set to c1 . When elements in the order by column are resolved during context analysis, this Item is found from it's name. Later in context analysis st_select_lex_unit::prepare() , this Item is renamed col1 , for use in a more outer statement (in the above example select col1, col2 from cte ). When run as a prepared statement, context analysis is run again before statement execution, so name resolution on elements in order by will no longer find an Item with the name of c1 and will produce an error ERROR 1054 (42S22): Unknown column 'c1' in 'order clause' Columns of the first select in the unit representing any potential union determine the names of the resultant set, so only Items in this select are renamed to names specified in the with clause. It would be reasonable to only restore these names to those after parsing, but doing so will lead to another problem. 2) Second execution issue. Consider this example prepare st from "with cte (c1,c2) as (select a as col1, sum(b) as col2 from t1 where a > 0 group by col1 union select a as col3, sum(b) as col4 from t2 where b > 2 group by col3) select * from cte where c1=1" ; and what happens to Items in the second select in the union. Parsing sets the name of the first Item to col3 , which is then used in resolving the group by clause. During first execution, the name is left as col3 , which is again used to resolve the group by clause. The optimizer will then potentially push the clause "c1=1" into BOTH selects in the union and in doing so will rename all Items in all selects in the union in that position to c1 to make the clause valid. During the next execution, the group by clause will no longer be resolvable. ERROR 1054 (42S22): Unknown column 'col3' in 'group statement' Examining the JSON output from an explain command on the above query produces "query_specifications": [ { "query_block": { "select_id": 2, "table": { "table_name": "t1", "access_type": "ALL", "rows": 5, "filtered": 100, "attached_condition": "t1.a = 1" } } }, { "query_block": { "select_id": 3, "operation": "UNION", "table": { "table_name": "t2", "access_type": "ALL", "rows": 5, "filtered": 100, "attached_condition": "t2.a = 1 and t2.b > 2" but with set @@optimizer_switch="condition_pushdown_for_derived=off"; "query_specifications": [ { "query_block": { "select_id": 2, "filesort": { "sort_key": "t1.a", "temporary_table": { "table": { "table_name": "t1", "access_type": "ALL", "rows": 5, "filtered": 100, "attached_condition": "t1.a > 0" } } } } }, { "query_block": { "select_id": 3, "operation": "UNION", "filesort": { "sort_key": "t2.a", "temporary_table": { "table": { "table_name": "t2", "access_type": "ALL", "rows": 5, "filtered": 100, "attached_condition": "t2.b > 2" we can see this condition (t1.a=1) pushdown in action. 3) Memory Leak. During parsing, Item names are allocated in statement memory. During context analysis before each execution, Item renaming allocated more memory on statement memory. For statements that have a high number of executions, this would be a noticable memory leak. A recently introduced compile time flag, activated with the cmake option -DWITH_PROTECT_STATEMENT_MEMROOT:BOOL=ON triggers a failing assert on second execution prepare stmt from " with cte(c) as (select ? ) select r.c, s.c+10 from cte as r, cte as s; " ; set @a=2; execute stmt using @a; c s.c+10 2 12 set @a=5; execute stmt using @a; #6 0x0000557e017dcb5e in alloc_root (mem_root=0x7fceec20a538, length=4) at /home/rex/src/mariadb/server.10.4-MDEV-24961/mysys/my_alloc.c:225 #7 0x0000557e00a22163 in Query_arena::alloc (this=0x7fceec000db8, size=4) at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/sql_class.h:1121 #8 0x0000557e00ee4652 in make_name (thd=0x7fceec000da0, str=0x7fceec1f80c8 "c", length=1, cs=0x557e024f5300 <my_charset_utf8_general_ci>, max_octet_length=255) at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/item.cc:1153 #9 0x0000557e00ee4989 in Item::set_name (this=0x7fceec1f85d0, thd=0x7fceec000da0, str=0x7fceec1f80c8 "c", length=1, cs=0x557e024f5300 <my_charset_utf8_general_ci>) at /home/rex/src/mariadb/server.10.4-MDEV-24961/sql/item.cc:1217 #10 0x0000557e00dd132f in With_element::rename_columns_of_derived_unit (this=0x7fceec1f8fe0, thd=0x7fceec000da0, unit=0x7fceec1f87b0)

Rex Johnston added a comment - 2023-10-30 06:21

Thanks Igor.

Rex Johnston added a comment - 2023-10-30 06:21 Thanks Igor.

Igor Babaev (Inactive) added a comment - 2023-10-30 06:55

ok to push into 10.4

Igor Babaev (Inactive) added a comment - 2023-10-30 06:55 ok to push into 10.4

Marko Mäkelä added a comment - 2023-11-17 11:50

The 10.5 version of this was pushed as a normal commit, not as a merge of the 10.4 commit. I just merged this to 10.6.

Marko Mäkelä added a comment - 2023-11-17 11:50 The 10.5 version of this was pushed as a normal commit , not as a merge of the 10.4 commit . I just merged this to 10.6 .

Sergei Petrunia added a comment - 2024-01-08 11:07

Suggested line for the changelog:
"Fixed several issues that could be observed when using Prepared Statements that use CTEs and rename the CTE's columns. The issues manifested themselves as bugs error message and/or memory leak".

Sergei Petrunia added a comment - 2024-01-08 11:07 Suggested line for the changelog: "Fixed several issues that could be observed when using Prepared Statements that use CTEs and rename the CTE's columns. The issues manifested themselves as bugs error message and/or memory leak".

People

Assignee:: Rex Johnston

Reporter:: Rex Johnston

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2023-08-23 22:09

Updated:: 2024-04-04 11:45

Resolved:: 2023-11-07 04:00

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server