[MDEV-22534] Trivial correlation detection/removal for IN subqueries - Jira

Yuchen Pei added a comment - 2023-05-04 03:09

So we want to turn some kind of trivially correlated subquery to an uncorrelated subquery, so that the optimizer can apply materialization.

Extrapolating from the example, and the description in EXISTS-TO-IN[1], what happens is that we transform

in (select inner_col' from inner_table where inner_col = outer_col)

to

, outer_col in (select inner_col', inner_col from inner_table)

How much do we want to generalize this? Judging from EXISTS-TO-IN, we can add an inner_where:

in (select inner_col' from inner_table where inner_col = outer_col and inner_where) => , outer_col in (select inner_col', inner_col from inner_table where inner_where)

But how about expressions?

EXISTS-TO-IN seems to generalise to outer_expr=inner_col (see find_inner_outer_equalities()), so maybe we can also have

in (select inner_col' from inner_table where inner_col = outer_expr and inner_where) => , outer_expr in (select inner_col', inner_col from inner_table where inner_where)

But why not go all the way to

in (select inner_expr' from inner_table where inner_expr = outer_expr and inner_where) => , outer_expr in (select inner_expr', inner_expr from inner_table where inner_where)
?

[1] https://mariadb.com/kb/en/exists-to-in-optimization/

In any case I'll try to hack out a PoC that covers the example first.

Yuchen Pei added a comment - 2023-05-04 03:09 So we want to turn some kind of trivially correlated subquery to an uncorrelated subquery, so that the optimizer can apply materialization. Extrapolating from the example, and the description in EXISTS-TO-IN [1] , what happens is that we transform in (select inner_col' from inner_table where inner_col = outer_col) to , outer_col in (select inner_col', inner_col from inner_table) How much do we want to generalize this? Judging from EXISTS-TO-IN, we can add an inner_where: in (select inner_col' from inner_table where inner_col = outer_col and inner_where) => , outer_col in (select inner_col', inner_col from inner_table where inner_where) But how about expressions? EXISTS-TO-IN seems to generalise to outer_expr=inner_col (see find_inner_outer_equalities() ), so maybe we can also have in (select inner_col' from inner_table where inner_col = outer_expr and inner_where) => , outer_expr in (select inner_col', inner_col from inner_table where inner_where) But why not go all the way to in (select inner_expr' from inner_table where inner_expr = outer_expr and inner_where) => , outer_expr in (select inner_expr', inner_expr from inner_table where inner_where) ? [1] https://mariadb.com/kb/en/exists-to-in-optimization/ In any case I'll try to hack out a PoC that covers the example first.

Yuchen Pei added a comment - 2023-05-09 02:40

Here's a patch that works for the example test case:

https://github.com/MariaDB/server/commit/b3752311d95

Working on refining it.

Yuchen Pei added a comment - 2023-05-09 02:40 Here's a patch that works for the example test case: https://github.com/MariaDB/server/commit/b3752311d95 Working on refining it.

Sergei Petrunia added a comment - 2023-05-09 11:05 - edited

ycp,

But why not go all the way to

in (select inner_expr'

    from inner_table

    where inner_expr = outer_expr and inner_where)

=> ,

outer_expr in (select inner_expr', inner_expr

               from inner_table

               where inner_where)

This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that?

Looking at how current EXISTS-to-IN works:

create table t1 (a int, b int);

insert into t1 select seq, seq from seq_1_to_10000;

create table t2 as select * from t1;

EXISTS-to-IN is done for outer_expr=inner_col

explain

select * from t1

where exists (select 1 from t2 where t1.a+1=t2.a);

EXISTS-to-IN is NOT done for outer_col=inner_expr:

explain

select * from t1

where exists (select 1 from t2 where t1.a=t2.a+1);

(ycp, please check with sanja for reasons why... )

Sergei Petrunia added a comment - 2023-05-09 11:05 - edited ycp , But why not go all the way to in (select inner_expr' from inner_table where inner_expr = outer_expr and inner_where) => , outer_expr in (select inner_expr', inner_expr from inner_table where inner_where) This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that? Looking at how current EXISTS-to-IN works: create table t1 (a int, b int); insert into t1 select seq, seq from seq_1_to_10000; create table t2 as select * from t1; EXISTS-to-IN is done for outer_expr=inner_col explain select * from t1 where exists (select 1 from t2 where t1.a+1=t2.a); EXISTS-to-IN is NOT done for outer_col=inner_expr : explain select * from t1 where exists (select 1 from t2 where t1.a=t2.a+1); ( ycp , please check with sanja for reasons why... )

Sergei Petrunia added a comment - 2023-05-09 11:15 - edited

Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable:

alter table t2 add c varchar(32);

set optimizer_trace=1;

explain select * from t1 where exists (select 1 from t2 where t2.c=t1.a);

select left(trace,1900) from information_schema.optimizer_trace\G

+------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                                           |

+------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+

|    1 | PRIMARY     | t1    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |                                                                 |

|    1 | PRIMARY     | t2    | ALL  | NULL          | NULL | NULL    | NULL | 1     | Using where; FirstMatch(t1); Using join buffer (flat, BNL join) |

+------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+

            "transformation": {

              "select_id": 2,

              "from": "EXISTS (SELECT)",

              "to": "IN (SELECT)",

              "upper_not": false

},

            "transformation": {

              "select_id": 2,

              "from": "IN (SELECT)",

              "to": "semijoin",

              "chosen": true

},

            "transformation": {

              "select_id": 2,

              "from": "IN (SELECT)",

              "to": "materialization",

              "possible": false,

              "cause": "types mismatch"

},

what should the code for this MDEV do in such cases?

Sergei Petrunia added a comment - 2023-05-09 11:15 - edited Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable: alter table t2 add c varchar (32); set optimizer_trace=1; explain select * from t1 where exists ( select 1 from t2 where t2.c=t1.a); select left (trace,1900) from information_schema.optimizer_trace\G +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | | | 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 1 | Using where; FirstMatch(t1); Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+ { "transformation" : { "select_id" : 2, "from" : "EXISTS (SELECT)" , "to" : "IN (SELECT)" , "upper_not" : false } }, { "transformation" : { "select_id" : 2, "from" : "IN (SELECT)" , "to" : "semijoin" , "chosen" : true } }, { "transformation" : { "select_id" : 2, "from" : "IN (SELECT)" , "to" : "materialization" , "possible" : false , "cause" : "types mismatch" } }, what should the code for this MDEV do in such cases?

Sergei Petrunia added a comment - 2023-05-09 11:40

ycp, the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well.

Sergei Petrunia added a comment - 2023-05-09 11:40 ycp , the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well.

Yuchen Pei added a comment - 2023-05-10 00:31 - edited

Hi psergei, thanks for the comments.

> This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that?

I don't see any problem, will add testcases covering this scenario. What is the concern? Are there existing testcases in the codebase addressing this kind of concerns?

Also, what do you mean by fulltext functions?

> Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable:

> what should the code for this MDEV do in such cases?

I feel that in general decorrelation is a gain, even if afterwards there's still correlation left.

But in the example you provided, assuming it does the exists2in transformation to select * from t1 where a in (select c from t2); and proceeds to executing the subquery select c from t2 with a full scan followed by a futile where a in ...., it does look like a scenario we want to avoid if possible. I will need to look more into that.

> the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well.

Will do, thanks.

Yuchen Pei added a comment - 2023-05-10 00:31 - edited Hi psergei , thanks for the comments. > This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that? I don't see any problem, will add testcases covering this scenario. What is the concern? Are there existing testcases in the codebase addressing this kind of concerns? Also, what do you mean by fulltext functions? > Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable: > what should the code for this MDEV do in such cases? I feel that in general decorrelation is a gain, even if afterwards there's still correlation left. But in the example you provided, assuming it does the exists2in transformation to select * from t1 where a in (select c from t2); and proceeds to executing the subquery select c from t2 with a full scan followed by a futile where a in .... , it does look like a scenario we want to avoid if possible. I will need to look more into that. > the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well. Will do, thanks.

Yuchen Pei added a comment - 2023-05-10 07:48

As discussed in a call with sanja, I created MDEV-31229 regarding the generalization to inner_expr=outer_expr

Yuchen Pei added a comment - 2023-05-10 07:48 As discussed in a call with sanja , I created MDEV-31229 regarding the generalization to inner_expr=outer_expr

Yuchen Pei added a comment - 2023-05-12 09:42 - edited

I added more tests, and started fixing regressions.

Most of the test failures concern mismatch in result files, while two ends in crashes: main.subselect_sj2_mat and main.subselect4. Both are caused by the use of freed items during the prepare stage in a second execution of a prepared statements. I fixed main.subselect_sj2_mat by calling thd->activate_stmt_arena_if_needed() and thd->restore_active_arena() in the processor. However, main.subselect4 is trickier, because the offending item was created at the prepare stage (presumably during the first execution) rather than inside the processor. This also seems to uncover a crash bug in the exists-to-in transformation:

set optimizer_switch=default;

CREATE TABLE t1 (a INT);

CREATE TABLE t2 (b INT);

PREPARE stmt FROM "

SELECT * FROM t2

HAVING 0 IN (

  SELECT a FROM t1

  WHERE EXISTS (

    SELECT a FROM t1

    WHERE b = a

";

EXECUTE stmt;

EXECUTE stmt;

I have not looked into it but suspect it is caused by the same issue. Filed it to MDEV-31269.

Yuchen Pei added a comment - 2023-05-12 09:42 - edited I added more tests, and started fixing regressions. Most of the test failures concern mismatch in result files, while two ends in crashes: main.subselect_sj2_mat and main.subselect4. Both are caused by the use of freed items during the prepare stage in a second execution of a prepared statements. I fixed main.subselect_sj2_mat by calling thd->activate_stmt_arena_if_needed() and thd->restore_active_arena() in the processor. However, main.subselect4 is trickier, because the offending item was created at the prepare stage (presumably during the first execution) rather than inside the processor. This also seems to uncover a crash bug in the exists-to-in transformation: set optimizer_switch= default ; CREATE TABLE t1 (a INT ); CREATE TABLE t2 (b INT ); PREPARE stmt FROM " SELECT * FROM t2 HAVING 0 IN ( SELECT a FROM t1 WHERE EXISTS ( SELECT a FROM t1 WHERE b = a ) ) " ; EXECUTE stmt; EXECUTE stmt; I have not looked into it but suspect it is caused by the same issue. Filed it to MDEV-31269 .

Yuchen Pei added a comment - 2023-05-16 06:03 - edited

To be more clear, the problem with main.subselect4 is if we run the following test with the draft patch at[1], we get sigsegv similar to that of MDEV-31269:

set optimizer_switch=default;

CREATE TABLE t1 (a INT);

CREATE TABLE t2 (b INT);

PREPARE stmt FROM "

SELECT * FROM t2

HAVING 0 IN (

  SELECT a FROM t1

  WHERE a IN (

    SELECT a FROM t1

    WHERE b = a

";

EXECUTE stmt;

EXECUTE stmt;

[1] https://github.com/MariaDB/server/commit/88ed254c3e6

I did more debugging about this, and found out that the crash is caused by the following events

1. During the JOIN::prepare() of the first execution, the Item_field referring to t2.b is replaced by an Item_ref
2. The item_ref is moved around in the decorrelation transformation and subsequent optimizations
3. At the end of the 1st ps execution, the Item_ref is freed due to its creation in the ps execution, as part of the cleanup
4. During the JOIN::prepare() of the second execution, it tries to restore the optimized version of the statement, in doing so it does a sanity check and causes a sigsegv when trying to access the freed Item_ref.

Why the problem does not occur with the transformation disabled:

during step 1 above, the replacement also adds an item to the change_list
during step 2 above, the optimizations cause the equality b = a to be moved around as a whole, thereby moving its args which is an Item ** instead of individual item *'s
during step 3 above, the change is rolled back, so the Item ** that is the args of the Item_func_eq goes from b (Item_ref) = a to b (Item_field) = a
during step 4 above, there's no use-after-free in the restored optimized statement because it automatically contains the b (Item_field) = a.

Why we can't do the same with the transformation enabled:

during step 2, we break the b = a into the outer_expr (b) and local_field (a), add the outer_expr to left_expr and local_field to the inner select item list. we cannot move the equality as a whole during this transformation.

Maybe we can fix this by checking during the decorrelation transformation whether local_field or any item in the tree of outer_expr is in the free_list that will be freed at Step 3. If so, then do not proceed with the transformation. What do you think psergei?

I also debugged the exists2in crash (MDEV-31269), and it seems to indeed have the same cause. So I wonder whether it makes sense to try to fix that one first.

Yuchen Pei added a comment - 2023-05-16 06:03 - edited To be more clear, the problem with main.subselect4 is if we run the following test with the draft patch at [1] , we get sigsegv similar to that of MDEV-31269 : set optimizer_switch= default ; CREATE TABLE t1 (a INT ); CREATE TABLE t2 (b INT ); PREPARE stmt FROM " SELECT * FROM t2 HAVING 0 IN ( SELECT a FROM t1 WHERE a IN ( SELECT a FROM t1 WHERE b = a ) ) " ; EXECUTE stmt; EXECUTE stmt; [1] https://github.com/MariaDB/server/commit/88ed254c3e6 I did more debugging about this, and found out that the crash is caused by the following events 1. During the JOIN::prepare() of the first execution, the Item_field referring to t2.b is replaced by an Item_ref 2. The item_ref is moved around in the decorrelation transformation and subsequent optimizations 3. At the end of the 1st ps execution, the Item_ref is freed due to its creation in the ps execution, as part of the cleanup 4. During the JOIN::prepare() of the second execution, it tries to restore the optimized version of the statement, in doing so it does a sanity check and causes a sigsegv when trying to access the freed Item_ref. Why the problem does not occur with the transformation disabled: during step 1 above, the replacement also adds an item to the change_list during step 2 above, the optimizations cause the equality b = a to be moved around as a whole, thereby moving its args which is an Item ** instead of individual item * 's during step 3 above, the change is rolled back, so the Item ** that is the args of the Item_func_eq goes from b (Item_ref) = a to b (Item_field) = a during step 4 above, there's no use-after-free in the restored optimized statement because it automatically contains the b (Item_field) = a . Why we can't do the same with the transformation enabled: during step 2, we break the b = a into the outer_expr (b) and local_field (a), add the outer_expr to left_expr and local_field to the inner select item list. we cannot move the equality as a whole during this transformation. Maybe we can fix this by checking during the decorrelation transformation whether local_field or any item in the tree of outer_expr is in the free_list that will be freed at Step 3. If so, then do not proceed with the transformation. What do you think psergei ? I also debugged the exists2in crash ( MDEV-31269 ), and it seems to indeed have the same cause. So I wonder whether it makes sense to try to fix that one first.

Yuchen Pei added a comment - 2023-05-19 10:16

I have a more polished patch now. Can you take a look psergei? Thank you:

https://github.com/MariaDB/server/commit/ba9cf8efeb0

The special cases where left_expr are subqueries or fulltext functions are included in the test.

The type mismatch causing no materialization down the road is not obvious to fix. I thought about checking whether the type handlers of the two sides of the equalities, but then there are "compatible" ones like longlong vs long. I could look more into it (e.g. how to determine which mismatches are compatible and which are not) but I feel it has already taken quite long, and given this is an existing issue in the existing exists-to-in transformation, maybe we can do with creating a ticket for it.

Finally, my fix of 2nd ps execution segfault requires HAVE_PSI_STATEMENT_INTERFACE because it checks thd->m_statement_state.m_parent_prepared_stmt to determine whether we are inside a ps execution. Is there a better way to determine this?

Yuchen Pei added a comment - 2023-05-19 10:16 I have a more polished patch now. Can you take a look psergei ? Thank you: https://github.com/MariaDB/server/commit/ba9cf8efeb0 The special cases where left_expr are subqueries or fulltext functions are included in the test. The type mismatch causing no materialization down the road is not obvious to fix. I thought about checking whether the type handlers of the two sides of the equalities, but then there are "compatible" ones like longlong vs long. I could look more into it (e.g. how to determine which mismatches are compatible and which are not) but I feel it has already taken quite long, and given this is an existing issue in the existing exists-to-in transformation, maybe we can do with creating a ticket for it. Finally, my fix of 2nd ps execution segfault requires HAVE_PSI_STATEMENT_INTERFACE because it checks thd->m_statement_state.m_parent_prepared_stmt to determine whether we are inside a ps execution. Is there a better way to determine this?

Yuchen Pei added a comment - 2023-05-23 07:52 - edited

I think the scope of this ticket should be the transformation. Prevention of the 2nd ps execution bug is part of MDEV-31269.

Because of that, I'm marking MDEV-31269 as blocking this ticket, because without fixing that bug and preventing it the same way for this transformation we can't push the change in this ticket, unless the transformation can be implemented in a significantly different way from exists2in such that the bug no longer manifests.

But I think the patch still can be reviewed for the transformation itself. Hope that makes sense.

Yuchen Pei added a comment - 2023-05-23 07:52 - edited I think the scope of this ticket should be the transformation. Prevention of the 2nd ps execution bug is part of MDEV-31269 . Because of that, I'm marking MDEV-31269 as blocking this ticket, because without fixing that bug and preventing it the same way for this transformation we can't push the change in this ticket, unless the transformation can be implemented in a significantly different way from exists2in such that the bug no longer manifests. But I think the patch still can be reviewed for the transformation itself. Hope that makes sense.

Sergei Petrunia added a comment - 2023-05-25 13:54

Some observations from review:

create table ten(a int);

insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);

insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table one_m(a int primary key);

insert into one_m select A.a + B.a* 1000 from one_k A, one_k B;

create table t1 (a int, b int, c int );

insert into t1 select a, a, a from one_k;

create table t2 (a int, b int, c int );

insert into t2  select a,a,a from one_m;

MySQL-8:

mysql> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333;

+----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

| id | select_type        | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |

+----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

|  1 | PRIMARY            | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   1000 |   100.00 | Using where |

|  2 | DEPENDENT SUBQUERY | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 997999 |     1.00 | Using where |

+----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

Odd, I would expect it to work.
But this matches their optimizer trace:

      "transformation": {

        "select#": 2,

        "from": "IN (SELECT)",

        "to": "materialization",

        "possible": false,

        "cause": "correlated"

},

In MariaDB with this patch:

MariaDB [j1]> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333;

+------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+

| id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |

+------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+

|    1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000   | Using where |

|    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 997920 | Using where |

+------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+

But the trace shows materialization should be possible:

            "join_preparation": {

              "select_id": 2,

              "steps": [

                  "transformation": {

                    "select_id": 2,

                    "from": "IN (SELECT)",

                    "to": "materialization",

                    "sjm_scan_allowed": false,

                    "possible": true

},

            "transformation": {

              "select_id": 2,

              "from": "IN (SELECT)",

              "to": "decorrelation"

Sergei Petrunia added a comment - 2023-05-25 13:54 Some observations from review: create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table one_m(a int primary key); insert into one_m select A.a + B.a* 1000 from one_k A, one_k B; create table t1 (a int, b int, c int ); insert into t1 select a, a, a from one_k; create table t2 (a int, b int, c int ); insert into t2 select a,a,a from one_m; MySQL-8: mysql> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333; +----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 997999 | 1.00 | Using where | +----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ Odd, I would expect it to work. But this matches their optimizer trace: { "transformation": { "select#": 2, "from": "IN (SELECT)", "to": "materialization", "possible": false, "cause": "correlated" } }, In MariaDB with this patch: MariaDB [j1]> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333; +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 997920 | Using where | +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+ But the trace shows materialization should be possible: "join_preparation": { "select_id": 2, "steps": [ { "transformation": { "select_id": 2, "from": "IN (SELECT)", "to": "materialization", "sjm_scan_allowed": false, "possible": true } }, "transformation": { "select_id": 2, "from": "IN (SELECT)", "to": "decorrelation" }

Sergei Petrunia added a comment - 2023-05-25 19:21

First portion of review input provided by email

Sergei Petrunia added a comment - 2023-05-25 19:21 First portion of review input provided by email

Yuchen Pei added a comment - 2023-05-31 08:07 - edited

Replying to comment about the non-semijoin materialization[1]:

The trace was printed in subquery_types_allow_materialization() during the prepare stage, which was called by is_materialization_applicable() with some further checks, including correlation. So in this case the original IN subquery passes the "allow" check (whatever that means), but because it is correlated, it does not pass the "applicable" check which would result in adding the materialization strategy.

If it were a semijoin and gets decorrelated by the new transformation, it would get a second chance in convert_join_subqueries_to_semijoins() called in optimize_inner().

If it were an EXIST subquery and had an exists-to-in transformation, it would create a brand new IN subquery in the transformation and call check_and_do_in_subquery_rewrites() that adds the materialization strategy. In our case, the IN subquery is not brand new, and it has already in-to-exists strategy, so doing the same would not help, because there's a check !in_subs->has_strategy() before trying to add the materialization strategy.

So to fix the example you provided, I updated the transformation by adding a (re)consideration for materialization in case of a non-semijoin at the end of the transformation[2].

This causes a few more "possible" materializations in the trace not resulting in actual materialization mainly in the case of having an upper NOT, but given the relationship between subquery_types_allow_materialization() and is_materialization_applicable() as discussed above, as long as materialization is not expected*, I suspect it is OK. What do you think psergei?

[1] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=259705&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-259705
[2] https://github.com/MariaDB/server/commit/69af0632fe5

* I checked equivalent EXISTS versions, and they do not result in materialization, unlike your non-semijoin example.

Yuchen Pei added a comment - 2023-05-31 08:07 - edited Replying to comment about the non-semijoin materialization [1] : The trace was printed in subquery_types_allow_materialization() during the prepare stage, which was called by is_materialization_applicable() with some further checks, including correlation. So in this case the original IN subquery passes the "allow" check (whatever that means), but because it is correlated, it does not pass the "applicable" check which would result in adding the materialization strategy. If it were a semijoin and gets decorrelated by the new transformation, it would get a second chance in convert_join_subqueries_to_semijoins() called in optimize_inner() . If it were an EXIST subquery and had an exists-to-in transformation, it would create a brand new IN subquery in the transformation and call check_and_do_in_subquery_rewrites() that adds the materialization strategy. In our case, the IN subquery is not brand new, and it has already in-to-exists strategy, so doing the same would not help, because there's a check !in_subs->has_strategy() before trying to add the materialization strategy. So to fix the example you provided, I updated the transformation by adding a (re)consideration for materialization in case of a non-semijoin at the end of the transformation [2] . This causes a few more "possible" materializations in the trace not resulting in actual materialization mainly in the case of having an upper NOT, but given the relationship between subquery_types_allow_materialization() and is_materialization_applicable() as discussed above, as long as materialization is not expected*, I suspect it is OK. What do you think psergei ? [1] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=259705&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-259705 [2] https://github.com/MariaDB/server/commit/69af0632fe5 * I checked equivalent EXISTS versions, and they do not result in materialization, unlike your non-semijoin example.

Yuchen Pei added a comment - 2023-06-01 10:52 - edited

New commit that addresses the review input
https://github.com/MariaDB/server/commit/e80a40e1354

Can you take a look psergei? Thanks.

Yuchen Pei added a comment - 2023-06-01 10:52 - edited New commit that addresses the review input https://github.com/MariaDB/server/commit/e80a40e1354 Can you take a look psergei ? Thanks.

Sergei Petrunia added a comment - 2023-06-13 11:13 - edited

Looking at the next patch. Good to see the input was addressed.

Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql

The trivial correlation is

where t1.b=t2.y

the code puts t1.b IS NOT NULL at the top level (outside the subquery):

  "query_block": {

    "select_id": 1,

          "attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))"

But then inside the subquery we have:

              "full-scan-on-null_key": {

                "table": {

                  "table_name": "t2",

                  "access_type": "ref_or_null",

                  "possible_keys": ["y"],

                  "key": "y",

                  "key_length": "5",

                  "used_key_parts": ["y"],

the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery.

Sergei Petrunia added a comment - 2023-06-13 11:13 - edited Looking at the next patch. Good to see the input was addressed. Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql The trivial correlation is where t1.b=t2.y the code puts t1.b IS NOT NULL at the top level (outside the subquery): "query_block": { "select_id": 1, "attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))" But then inside the subquery we have: { "full-scan-on-null_key": { "table": { "table_name": "t2", "access_type": "ref_or_null", "possible_keys": ["y"], "key": "y", "key_length": "5", "used_key_parts": ["y"], the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery.

Sergei Petrunia added a comment - 2023-06-13 12:23

..., as long as materialization is not expected*, I suspect it is OK. What do you think Sergei Petrunia?

Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy.

Sergei Petrunia added a comment - 2023-06-13 12:23 ..., as long as materialization is not expected*, I suspect it is OK. What do you think Sergei Petrunia? Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy.

Sergei Petrunia added a comment - 2023-06-13 12:27

Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor().

Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks:

    (!optimizer ||

...

        !first_select->leaf_tables.elements||

        with_recursive_reference)

Do we need to add these too? (I wonder what "!optimizer" check is for...)

Sergei Petrunia added a comment - 2023-06-13 12:27 Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor(). Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks: (!optimizer || ... !first_select->leaf_tables.elements|| with_recursive_reference) Do we need to add these too? (I wonder what "!optimizer" check is for...)

Sergei Petrunia added a comment - 2023-06-13 12:31

I can also see both exists2in_processor() functions have:

        !(is_top_level_item() ||

          (upper_not && upper_not->is_top_level_item())) ||    /* (2) */

although the motivation behind this limitation is not clear.

is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs.

But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN, we don't care between TRUE/NULL results.
This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result).
But if we do differentiate, why can't we handle the subquery in an arbitrary context?

Sergei Petrunia added a comment - 2023-06-13 12:31 I can also see both exists2in_processor() functions have: !(is_top_level_item() || (upper_not && upper_not->is_top_level_item())) || /* (2) */ although the motivation behind this limitation is not clear. is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs. But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN , we don't care between TRUE/NULL results. This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result). But if we do differentiate, why can't we handle the subquery in an arbitrary context?

Yuchen Pei added a comment - 2023-06-14 01:35 - edited

Hi psergei, thanks for the 3rd round comments. I started working through them before noticing something in your attachment, which I'd like to get cleared before continuing. First, what is the table one below?

explain format=json select * from t1 where t1.a not in (select t2.x from one,t2 ignore index(x) where t1.b=t2.y and t2.z=33)

Also I guess you meant an insert into t2 values preceding the following?

(   12 ,   12 ,   12 ),

(   22 , NULL ,   22 ),

( NULL ,   21 ,   21 ),

(   23 , NULL ,   23 ),

( NULL ,   23 ,   23 );

I assume seq_1_to_10 is from --source include/have_sequence.inc.

Yuchen Pei added a comment - 2023-06-14 01:35 - edited Hi psergei , thanks for the 3rd round comments. I started working through them before noticing something in your attachment, which I'd like to get cleared before continuing. First, what is the table one below? explain format=json select * from t1 where t1.a not in ( select t2.x from one,t2 ignore index (x) where t1.b=t2.y and t2.z=33) Also I guess you meant an insert into t2 values preceding the following? ( 12 , 12 , 12 ), ( 22 , NULL , 22 ), ( NULL , 21 , 21 ), ( 23 , NULL , 23 ), ( NULL , 23 , 23 ); I assume seq_1_to_10 is from --source include/have_sequence.inc .

Sergei Petrunia added a comment - 2023-06-14 08:39

ycp, sorry for errors in the attachment. The table one is

create table one (w int);

insert into one values (1);

I added it so that we get a regular join plan for the subquery, instead of special-case plans of one-table subqueries.

Also I guess you meant an insert into t2 values preceding the following?

Yes.

Sergei Petrunia added a comment - 2023-06-14 08:39 ycp , sorry for errors in the attachment. The table one is create table one (w int); insert into one values (1); I added it so that we get a regular join plan for the subquery, instead of special-case plans of one-table subqueries. Also I guess you meant an insert into t2 values preceding the following? Yes.

Sergei Petrunia added a comment - 2023-06-14 09:38

Take-aways from yesterday's optimizer call

Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks:
(!optimizer ||

The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer".

I can also see both exists2in_processor() functions have:

        !(is_top_level_item() ||

          (upper_not && upper_not->is_top_level_item())) ||    /* (2) */

No idea about this one.

Sergei Petrunia added a comment - 2023-06-14 09:38 Take-aways from yesterday's optimizer call Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks: (!optimizer || The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer". I can also see both exists2in_processor() functions have: !(is_top_level_item() || (upper_not && upper_not->is_top_level_item())) || /* (2) */ No idea about this one.

Sergei Petrunia added a comment - 2023-06-14 10:05 - edited

A patch with trivial coding style cleanups: https://github.com/MariaDB/server/commit/62251786f8bb2084fe9169befdc569c24f5078f2

Sergei Petrunia added a comment - 2023-06-14 10:05 - edited A patch with trivial coding style cleanups: https://github.com/MariaDB/server/commit/62251786f8bb2084fe9169befdc569c24f5078f2

Yuchen Pei added a comment - 2023-06-15 06:49 - edited

Looking at the next patch. Good to see the input was addressed.

Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql

The trivial correlation is
where t1.b=t2.y 
the code puts t1.b IS NOT NULL at the top level (outside the subquery):
  "query_block": {
    "select_id": 1,
          "attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))"
But then inside the subquery we have:
            {
              "full-scan-on-null_key": {
                "table": {
                  "table_name": "t2",
                  "access_type": "ref_or_null",
                  "possible_keys": ["y"],
                  "key": "y",
                  "key_length": "5",
                  "used_key_parts": ["y"],
the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery.

I did some debugging, and it seems to be caused by the IN subselect creating Item_func_trig_cond when it is not top level during stage 2 optimization[1]. It is avoided by the exists2in transformation because it marks the newly created IN subselect as toplevel[2], even if there's an upper_not:

  in_subs->top_level_item();

[1] (backtrace): Item_in_subselect::create_row_in_to_exists_cond > Item_in_subselect::create_in_to_exists_cond > JOIN::choose_subquery_plan > make_join_statistics > JOIN::optimize_inner > JOIN::optimize > st_select_lex::optimize_unflattened_subqueries > JOIN::optimize_unflattened_subqueries > JOIN::optimize_stage2 > JOIN::optimize_inner > JOIN::optimize

[2] Item_exists_subselect::exists2in_processor

I tried to do the same thing with my decorrelate-in transformation[3], but it fails the following test:

CREATE TABLE t1 (pk INT NOT NULL, i INT NOT NULL);

INSERT INTO t1 VALUES (0,10), (1,20), (2,30), (3,40);

CREATE TABLE t2a (pk INT NOT NULL, i INT NOT NULL, PRIMARY KEY(i,pk));

INSERT INTO t2a VALUES (0,0), (1,1), (2,2), (3,3);

SELECT * FROM t1 WHERE (NULL, 1) NOT IN (SELECT t2a.i, t2a.pk FROM t2a WHERE t2a.pk = t1.pk); # outputs nothing, but should be (0, 10), (2, 30), (3, 40)

[3] https://github.com/MariaDB/server/commit/373a97067bd

I could not come up with a similar test for exists2in to check whether the forced toplevel would cause similar problems. In any case, this needs to be further investigated.

Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy.

Added a field in the trace whether the outcome is still correlated. AFAICT materialization, if applicable after the transformation, does show up in the trace. An example is the non-semijoin materialization example in the test[3], where if you remove the ".to" in json_extract path, you will see materialization after the decorrelation has the possible field equal true.

[3] https://github.com/MariaDB/server/blob/62251786f8bb2084fe9169befdc569c24f5078f2/mysql-test/main/subselect_decorrelate_in.test#L437

Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor().

Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks:
    (!optimizer ||
    ...
        !first_select->leaf_tables.elements||
        with_recursive_reference)
Do we need to add these too? (I wonder what "!optimizer" check is for...)

Looking at my notes, I don't know what these checks are for and did not find them necessary for my decorrelate-in transformation. The example snippet in the tests I found that covers !first_select->leaf_tables.elements is

SELECT * FROM t2 AS alias1, t2 AS alias2

WHERE EXISTS ( SELECT 1 ) AND (alias2.pk = alias1.b )

ORDER BY alias1.b;

I wrote a similar test for IN, and it got optimized

# no need to check leaf tables

CREATE TABLE t2 (pk INT PRIMARY KEY, b INT) ENGINE=MyISAM;

INSERT INTO t2 VALUES (1,1),(2,7);

SELECT * FROM t2 AS alias1, t2 AS alias2

WHERE alias1.pk in ( SELECT 1 ) AND (alias2.pk = alias1.b )

ORDER BY alias1.b;

drop table t2;

For with_recursive_reference, the relevant snippet is from cte_recursive.test:

 with recursive

reached_objects as

select v, 'init' as m from objects where v in ('v3','v7','v9')

union

select module_results.v, module_results.m from module_results, applied_modules

where module_results.m = applied_modules.m

),

applied_modules as

select * from modules where 1=0

union

select modules.m

from

modules

where

not exists (select * from module_arguments

where module_arguments.m = modules.m and

module_arguments.v not in

(select v from reached_objects))

select * from reached_objects

There's IN in this test, and without the check in the decorrelate-in transformation, the test passes. I also tried the following test snippet and there's no problem without the check in place:

 with recursive

ancestor_ids (id, generation)

as

select father, 1 from folks where name = 'Me' and  father is not null

union all

select mother, 1 from folks where name = 'Me' and  mother is not null

union all

select father, fa.generation+1 from folks, ancestor_ids fa

where folks.id = fa.id and father is not null and

(father not in (select id from ancestor_ids))

union all

select mother, ma.generation+1 from folks, ancestor_ids ma

where folks.id = ma.id and mother is not null and

(mother not in (select id from ancestor_ids))

select generation, name from ancestor_ids a, folks

where a.id = folks.id

I can also see both exists2in_processor() functions have:
        !(is_top_level_item() ||
          (upper_not && upper_not->is_top_level_item())) ||    /* (2) */
although the motivation behind this limitation is not clear.

is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs.

But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN, we don't care between TRUE/NULL results.
This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result).
But if we do differentiate, why can't we handle the subquery in an arbitrary context?

Right, we don't have to worry about null when it is top level in, and with the added is not null in both the inner and outer where condition, the value of the outer where condition is exactly the same before and after the transformation. I wrote two possibilities[4] why we don't consider arbitrary context, which I quote here:

With this modification, theoretically we don't have to care about
whether the IN subselect is top level, however, there are two issues:

1. When the subquery is indeed top level, do the extra terms lower
performance?
2. When the subquery is not top level, how do we substitute `this`
which is an Item_in_subselect with the Item_func_and?

I suspect this is why the exists2in transformation only does the
modification when there is an upper_not, and the transformation is
skipped when the subquery is not toplevel EXISTS nor toplevel NOT
EXISTS.

I think we can fix Item 1 by handling toplevel separately from non-toplevel, that is, by not adding the is not nulls when it is top level. But do you know how fix Item 2 psergei?

[4] https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/CEPJCC7GCZ4H44L276AFRIUB7LYIJOPI/

The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer".

This is not always true. Item_in_subselect is not always wrapped in an optimizer and it is ok (we don't need a similar check). For example you will get segfaults when you remove the if (optimizer) from the following[5] and test against the most basic case select * from t1 where a in (select a from t2 where t1.b=t2.b);

// inside bool Item_in_subselect::exists2in_processor(void *opt_arg)

  /* Update any Item_in_optimizer wrapper if exists */

  if (optimizer)

    optimizer->reset_cache();

    if (optimizer->fix_left(thd))

      res= TRUE;

      goto out;

[5] https://github.com/MariaDB/server/blob/62251786f8bb2084fe9169befdc569c24f5078f2/sql/item_subselect.cc#L3441

A patch with trivial coding style cleanups: https://github.com/MariaDB/server/commit/62251786f8bb2084fe9169befdc569c24f5078f2

Thank you, I have incorporated the change, as well as the is_correlated field:

https://github.com/MariaDB/server/commit/6175a4541ff

Yuchen Pei added a comment - 2023-06-15 06:49 - edited Looking at the next patch. Good to see the input was addressed. Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql The trivial correlation is where t1.b=t2.y the code puts t1.b IS NOT NULL at the top level (outside the subquery): "query_block": { "select_id": 1, "attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))" But then inside the subquery we have: { "full-scan-on-null_key": { "table": { "table_name": "t2", "access_type": "ref_or_null", "possible_keys": ["y"], "key": "y", "key_length": "5", "used_key_parts": ["y"], the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery. I did some debugging, and it seems to be caused by the IN subselect creating Item_func_trig_cond when it is not top level during stage 2 optimization [1] . It is avoided by the exists2in transformation because it marks the newly created IN subselect as toplevel [2] , even if there's an upper_not: in_subs->top_level_item(); [1] (backtrace): Item_in_subselect::create_row_in_to_exists_cond > Item_in_subselect::create_in_to_exists_cond > JOIN::choose_subquery_plan > make_join_statistics > JOIN::optimize_inner > JOIN::optimize > st_select_lex::optimize_unflattened_subqueries > JOIN::optimize_unflattened_subqueries > JOIN::optimize_stage2 > JOIN::optimize_inner > JOIN::optimize [2] Item_exists_subselect::exists2in_processor I tried to do the same thing with my decorrelate-in transformation [3] , but it fails the following test: CREATE TABLE t1 (pk INT NOT NULL , i INT NOT NULL ); INSERT INTO t1 VALUES (0,10), (1,20), (2,30), (3,40); CREATE TABLE t2a (pk INT NOT NULL , i INT NOT NULL , PRIMARY KEY (i,pk)); INSERT INTO t2a VALUES (0,0), (1,1), (2,2), (3,3); SELECT * FROM t1 WHERE ( NULL , 1) NOT IN ( SELECT t2a.i, t2a.pk FROM t2a WHERE t2a.pk = t1.pk); # outputs nothing, but should be (0, 10), (2, 30), (3, 40) [3] https://github.com/MariaDB/server/commit/373a97067bd I could not come up with a similar test for exists2in to check whether the forced toplevel would cause similar problems. In any case, this needs to be further investigated. Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy. Added a field in the trace whether the outcome is still correlated. AFAICT materialization, if applicable after the transformation, does show up in the trace. An example is the non-semijoin materialization example in the test [3] , where if you remove the ".to" in json_extract path, you will see materialization after the decorrelation has the possible field equal true. [3] https://github.com/MariaDB/server/blob/62251786f8bb2084fe9169befdc569c24f5078f2/mysql-test/main/subselect_decorrelate_in.test#L437 Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor(). Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks: (!optimizer || ... !first_select->leaf_tables.elements|| with_recursive_reference) Do we need to add these too? (I wonder what "!optimizer" check is for...) Looking at my notes, I don't know what these checks are for and did not find them necessary for my decorrelate-in transformation. The example snippet in the tests I found that covers !first_select->leaf_tables.elements is SELECT * FROM t2 AS alias1, t2 AS alias2 WHERE EXISTS ( SELECT 1 ) AND (alias2.pk = alias1.b ) ORDER BY alias1.b; I wrote a similar test for IN, and it got optimized # no need to check leaf tables CREATE TABLE t2 (pk INT PRIMARY KEY , b INT ) ENGINE=MyISAM; INSERT INTO t2 VALUES (1,1),(2,7); SELECT * FROM t2 AS alias1, t2 AS alias2 WHERE alias1.pk in ( SELECT 1 ) AND (alias2.pk = alias1.b ) ORDER BY alias1.b; drop table t2; For with_recursive_reference , the relevant snippet is from cte_recursive.test : with recursive reached_objects as ( select v, 'init' as m from objects where v in ( 'v3' , 'v7' , 'v9' ) union select module_results.v, module_results.m from module_results, applied_modules where module_results.m = applied_modules.m ), applied_modules as ( select * from modules where 1=0 union select modules.m from modules where not exists ( select * from module_arguments where module_arguments.m = modules.m and module_arguments.v not in ( select v from reached_objects)) ) select * from reached_objects There's IN in this test, and without the check in the decorrelate-in transformation, the test passes. I also tried the following test snippet and there's no problem without the check in place: with recursive ancestor_ids (id, generation) as ( select father, 1 from folks where name = 'Me' and father is not null union all select mother, 1 from folks where name = 'Me' and mother is not null union all select father, fa.generation+1 from folks, ancestor_ids fa where folks.id = fa.id and father is not null and (father not in ( select id from ancestor_ids)) union all select mother, ma.generation+1 from folks, ancestor_ids ma where folks.id = ma.id and mother is not null and (mother not in ( select id from ancestor_ids)) ) select generation, name from ancestor_ids a, folks where a.id = folks.id I can also see both exists2in_processor() functions have: !(is_top_level_item() || (upper_not && upper_not->is_top_level_item())) || /* (2) */ although the motivation behind this limitation is not clear. is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs. But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN , we don't care between TRUE/NULL results. This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result). But if we do differentiate, why can't we handle the subquery in an arbitrary context? Right, we don't have to worry about null when it is top level in, and with the added is not null in both the inner and outer where condition, the value of the outer where condition is exactly the same before and after the transformation. I wrote two possibilities [4] why we don't consider arbitrary context, which I quote here: With this modification, theoretically we don't have to care about whether the IN subselect is top level, however, there are two issues: 1. When the subquery is indeed top level, do the extra terms lower performance? 2. When the subquery is not top level, how do we substitute `this` which is an Item_in_subselect with the Item_func_and? I suspect this is why the exists2in transformation only does the modification when there is an upper_not, and the transformation is skipped when the subquery is not toplevel EXISTS nor toplevel NOT EXISTS. I think we can fix Item 1 by handling toplevel separately from non-toplevel, that is, by not adding the is not nulls when it is top level. But do you know how fix Item 2 psergei ? [4] https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/CEPJCC7GCZ4H44L276AFRIUB7LYIJOPI/ The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer". This is not always true. Item_in_subselect is not always wrapped in an optimizer and it is ok (we don't need a similar check). For example you will get segfaults when you remove the if (optimizer) from the following [5] and test against the most basic case select * from t1 where a in (select a from t2 where t1.b=t2.b); // inside bool Item_in_subselect::exists2in_processor(void *opt_arg) /* Update any Item_in_optimizer wrapper if exists */ if (optimizer) { optimizer->reset_cache(); if (optimizer->fix_left(thd)) { res= TRUE; goto out; } } [5] https://github.com/MariaDB/server/blob/62251786f8bb2084fe9169befdc569c24f5078f2/sql/item_subselect.cc#L3441 A patch with trivial coding style cleanups: https://github.com/MariaDB/server/commit/62251786f8bb2084fe9169befdc569c24f5078f2 Thank you, I have incorporated the change, as well as the is_correlated field: https://github.com/MariaDB/server/commit/6175a4541ff

Sergei Petrunia added a comment - 2023-06-15 13:07

Thanks, I continue to look at the updated patch.

Sergei Petrunia added a comment - 2023-06-15 13:07 Thanks, I continue to look at the updated patch.

Sergei Petrunia added a comment - 2023-06-15 13:09

One more observation for the old patch: looking at Item_exists_subselect::exists2in_and_is_not_nulls() and its "original" code I see a discrepancy:

Old code

      for (size_t i= 0; i < eqs.elements(); i++)

        if (optimizer->arguments()[0]->maybe_null())

...

New code:

      for (uint i= offset; i < left_exp->cols(); i++)

        if (left_exp->element_index(i)->maybe_null())

I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i]

Sergei Petrunia added a comment - 2023-06-15 13:09 One more observation for the old patch: looking at Item_exists_subselect::exists2in_and_is_not_nulls() and its "original" code I see a discrepancy: Old code for ( size_t i= 0; i < eqs.elements(); i++) { if (optimizer->arguments()[0]->maybe_null()) ... New code: for (uint i= offset; i < left_exp->cols(); i++) { if (left_exp->element_index(i)->maybe_null()) I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i]

Yuchen Pei added a comment - 2023-06-16 00:52

> I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i]

Yes I think so.

Yuchen Pei added a comment - 2023-06-16 00:52 > I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i] Yes I think so.

Sergei Petrunia added a comment - 2023-06-19 09:44

Ok, I was trying to see if I could get everything finished to catch the release train and pushed these:

commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2)

Author: Sergei Petrunia <sergey@mariadb.com>

Date:   Mon Jun 19 11:27:51 2023 +0300

    Make the new tests --view-protocol friendly. Still don't pass.

commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e

Author: Sergei Petrunia <sergey@mariadb.com>

Date:   Sun Jun 18 23:38:09 2023 +0300

    Remove NULL handling of left expressions which cannot be NULL.

However this still didn't resolve all the issues and now the release train has left the station.

Sergei Petrunia added a comment - 2023-06-19 09:44 Ok, I was trying to see if I could get everything finished to catch the release train and pushed these: commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2) Author: Sergei Petrunia <sergey@mariadb.com> Date: Mon Jun 19 11:27:51 2023 +0300 Make the new tests --view-protocol friendly. Still don't pass. commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e Author: Sergei Petrunia <sergey@mariadb.com> Date: Sun Jun 18 23:38:09 2023 +0300 Remove NULL handling of left expressions which cannot be NULL. However this still didn't resolve all the issues and now the release train has left the station.

Sergei Petrunia added a comment - 2023-06-19 10:01

I've also tried to fix this part, without success:

/*

      If we do not single out this case, and merge it with the >1

      case, we would get an infinite loop in resolving

      Item_direct_view_ref::real_item() like in MDEV-3881 when

      left_exp has scalar type

*/

    else if (and_list->elements == 1)

      exp= new (thd->mem_root) Item_cond_and(thd, and_list->head(), exp);

I confirm that regular EXISTS-to-IN code hit this by accident. It handles the case of one equality in a special way, then the code path serving multiple equalities will also add multiple NOT NULL conditions even when they are redundant. If it added one condition, it would hit the same crash.

The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself.

Sergei Petrunia added a comment - 2023-06-19 10:01 I've also tried to fix this part, without success: /* If we do not single out this case, and merge it with the >1 case, we would get an infinite loop in resolving Item_direct_view_ref::real_item() like in MDEV-3881 when left_exp has scalar type */ else if (and_list->elements == 1) exp = new (thd->mem_root) Item_cond_and(thd, and_list->head(), exp ); I confirm that regular EXISTS-to-IN code hit this by accident. It handles the case of one equality in a special way, then the code path serving multiple equalities will also add multiple NOT NULL conditions even when they are redundant. If it added one condition, it would hit the same crash. The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself.

Sergei Petrunia added a comment - 2023-06-19 10:04

Ok let's return to this in the development cycle.

Sergei Petrunia added a comment - 2023-06-19 10:04 Ok let's return to this in the development cycle.

Yuchen Pei added a comment - 2023-06-20 02:22 - edited

Thanks for the updates.

commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2)
Author: Sergei Petrunia <sergey@mariadb.com>
Date: Mon Jun 19 11:27:51 2023 +0300

Make the new tests --view-protocol friendly. Still don't pass.

Ok, I didn't know the test does not work with --view-protocol, because
the new bb does not seem to have any builders running with this flag.

I wonder what caused this, because it is not every non-empty
json_extract output fails. The ones that do not fail include first PS
executions, and in the cases where outer_exp has subquery or fulltext
function, and the non-semijoin materialization.

I looked around and found the "umbrella ticket" ~~MDEV-27691~~ - do you
know which of the included tickets covers this case?

commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e
Author: Sergei Petrunia <sergey@mariadb.com>
Date: Sun Jun 18 23:38:09 2023 +0300

Remove NULL handling of left expressions which cannot be NULL.

Thanks, so that's how you fix the full-scan-on-null_key issue.

The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself.

I guess you mean substitute_for_best_equal_field(). Is there an
MDEV for this?

Ok let's return to this in the development cycle.

Ok

Yuchen Pei added a comment - 2023-06-20 02:22 - edited Thanks for the updates. commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2) Author: Sergei Petrunia <sergey@mariadb.com> Date: Mon Jun 19 11:27:51 2023 +0300 Make the new tests --view-protocol friendly. Still don't pass. Ok, I didn't know the test does not work with --view-protocol, because the new bb does not seem to have any builders running with this flag. I wonder what caused this, because it is not every non-empty json_extract output fails. The ones that do not fail include first PS executions, and in the cases where outer_exp has subquery or fulltext function, and the non-semijoin materialization. I looked around and found the "umbrella ticket" MDEV-27691 - do you know which of the included tickets covers this case? commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e Author: Sergei Petrunia <sergey@mariadb.com> Date: Sun Jun 18 23:38:09 2023 +0300 Remove NULL handling of left expressions which cannot be NULL. Thanks, so that's how you fix the full-scan-on-null_key issue. The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself. I guess you mean substitute_for_best_equal_field() . Is there an MDEV for this? Ok let's return to this in the development cycle. Ok

Yuchen Pei added a comment - 2023-06-21 00:48

Follow up on the meeting from yesterday:

view-protocol: we wait for ~~MDEV-31408~~ to be fixed before addressing
this issue
arbitrary subquery context (not just toplevel in and toplevel not
in): we will do this separately in MDEV-31510
self-referencing Item_direct_view_ref: this is the only issue
blocking the present task at this moment. I will debug it when I get
back to this task.

Yuchen Pei added a comment - 2023-06-21 00:48 Follow up on the meeting from yesterday: view-protocol: we wait for MDEV-31408 to be fixed before addressing this issue arbitrary subquery context (not just toplevel in and toplevel not in): we will do this separately in MDEV-31510 self-referencing Item_direct_view_ref : this is the only issue blocking the present task at this moment. I will debug it when I get back to this task.

Yuchen Pei added a comment - 2023-07-28 08:02 - edited

As I'm getting back to this task, it just took me a while to reprod
the Item_direct_view_ref issue, so I'll note it here: the triggered
failure is on the test for ~~MDEV-3881~~ in main.subselect_exists2in:

--echo #

--echo # MDEV-3881: Endless loop and crash in Item_ref::real_item with

--echo # exists_to_in=on, NOT EXISTS subquery, merge view or from subquery,

--echo # constant table

--echo #

SET optimizer_switch = 'exists_to_in=on';

CREATE TABLE t1 (a INT) ENGINE=MyISAM;

INSERT INTO t1 VALUES (1),(7);

CREATE TABLE t2 (b INT) ENGINE=MyISAM;

INSERT INTO t2 VALUES (8);

CREATE ALGORITHM=MERGE VIEW v1 AS SELECT * FROM t2;

CREATE TABLE t3 (c INT) ENGINE=MyISAM;

INSERT INTO t3 VALUES (4),(6);

SELECT * FROM t1, v1 WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b;

drop view v1;

drop table t1, t2, t3;

CREATE TABLE t1 (a INT) ENGINE=MyISAM;

INSERT INTO t1 VALUES (1),(7);

CREATE TABLE t2 (b INT) ENGINE=MyISAM;

INSERT INTO t2 VALUES (8);

CREATE TABLE t3 (c INT) ENGINE=MyISAM;

INSERT INTO t3 VALUES (4),(6);

SELECT * FROM t1, ( SELECT * FROM t2 ) alias WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b;

drop table t1, t2, t3;

Yuchen Pei added a comment - 2023-07-28 08:02 - edited As I'm getting back to this task, it just took me a while to reprod the Item_direct_view_ref issue, so I'll note it here: the triggered failure is on the test for MDEV-3881 in main.subselect_exists2in: --echo # --echo # MDEV-3881: Endless loop and crash in Item_ref::real_item with --echo # exists_to_in=on, NOT EXISTS subquery, merge view or from subquery, --echo # constant table --echo # SET optimizer_switch = 'exists_to_in=on' ; CREATE TABLE t1 (a INT ) ENGINE=MyISAM; INSERT INTO t1 VALUES (1),(7); CREATE TABLE t2 (b INT ) ENGINE=MyISAM; INSERT INTO t2 VALUES (8); CREATE ALGORITHM=MERGE VIEW v1 AS SELECT * FROM t2; CREATE TABLE t3 (c INT ) ENGINE=MyISAM; INSERT INTO t3 VALUES (4),(6); SELECT * FROM t1, v1 WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b; drop view v1; drop table t1, t2, t3; CREATE TABLE t1 (a INT ) ENGINE=MyISAM; INSERT INTO t1 VALUES (1),(7); CREATE TABLE t2 (b INT ) ENGINE=MyISAM; INSERT INTO t2 VALUES (8); CREATE TABLE t3 (c INT ) ENGINE=MyISAM; INSERT INTO t3 VALUES (4),(6); SELECT * FROM t1, ( SELECT * FROM t2 ) alias WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b; drop table t1, t2, t3;

Yuchen Pei added a comment - 2023-08-01 06:26

So the infinite recursion is caused by self-referencing, caused by the
transformer replace_equal_field. The goal of the transformer
Item::replace_equal_field() is to replace fields with equivalent
ones, as an optimisation. Currently two Item subclasses implement this
transformer nontrivially, Item_field and Item_direct_view_ref.
The Item_direct_view_ref implementation recurses to the
Item_field implementation, but not vice versa, which is good
because otherwise the infinite self-referencing would occur at this
stage.

In the infinite recursion scenario, we have the following chain of
walking down the item tree during the transformation, where each
vertex is a pari of item type and dbug_print output, and each edge /
arrow is annotated by the operation that goes from vertext to the
next.

(Item_direct_view_ref*) 8 -.ref-> (Item**) -deref-> (Item_field*) 8

-.item_equal-> (Item_equal*) multiple_equal(8, t1.a) -.get_const()->

(Item_direct_view_ref*) 8 -.ref-> (Item**) same as above.

The first and the last {{Item_direct_view_ref*}}s have the same value
in their ref field. By replacing the Item_field* with the
second Item_direct_view_ref* we get the self-referencing that
causes the infinite real_item() call.

There's no guarantee that if such a self-reference exists it would
have length 1 like above. So to be safe we may need to try to detect
self-reference of arbitrary length, which is what I did in [1], but
I am not sure how much this extra check affects performance.
Alternatively we can resort to a length-1 check only, and update the
fix if that proves insufficient. What do you think psergei?

For all changes squashed into one commit, see [2].

[1] https://github.com/MariaDB/server/commit/94172e01d63
[2] https://github.com/MariaDB/server/commit/0bf54aef789

Yuchen Pei added a comment - 2023-08-01 06:26 So the infinite recursion is caused by self-referencing, caused by the transformer replace_equal_field . The goal of the transformer Item::replace_equal_field() is to replace fields with equivalent ones, as an optimisation. Currently two Item subclasses implement this transformer nontrivially, Item_field and Item_direct_view_ref . The Item_direct_view_ref implementation recurses to the Item_field implementation, but not vice versa, which is good because otherwise the infinite self-referencing would occur at this stage. In the infinite recursion scenario, we have the following chain of walking down the item tree during the transformation, where each vertex is a pari of item type and dbug_print output, and each edge / arrow is annotated by the operation that goes from vertext to the next. (Item_direct_view_ref*) 8 -.ref-> (Item**) -deref-> (Item_field*) 8 -.item_equal-> (Item_equal*) multiple_equal(8, t1.a) -.get_const()-> (Item_direct_view_ref*) 8 -.ref-> (Item**) same as above. The first and the last {{Item_direct_view_ref*}}s have the same value in their ref field. By replacing the Item_field* with the second Item_direct_view_ref* we get the self-referencing that causes the infinite real_item() call. There's no guarantee that if such a self-reference exists it would have length 1 like above. So to be safe we may need to try to detect self-reference of arbitrary length, which is what I did in [1] , but I am not sure how much this extra check affects performance. Alternatively we can resort to a length-1 check only, and update the fix if that proves insufficient. What do you think psergei ? For all changes squashed into one commit, see [2] . [1] https://github.com/MariaDB/server/commit/94172e01d63 [2] https://github.com/MariaDB/server/commit/0bf54aef789

Yuchen Pei added a comment - 2023-08-02 01:31

Note from the meeting yesterday:

We will simplify the MDEV-31269 workaround condition to simply
skipping on !thd->stmt_arena->is_conventional()
I will investigate further the cause of the self-referencing
Item_direct_view_ref

Yuchen Pei added a comment - 2023-08-02 01:31 Note from the meeting yesterday: We will simplify the MDEV-31269 workaround condition to simply skipping on !thd->stmt_arena->is_conventional() I will investigate further the cause of the self-referencing Item_direct_view_ref

Yuchen Pei added a comment - 2023-08-03 05:46 - edited

OK, here's another fix (h/t Johnston for discussions):

https://github.com/MariaDB/server/commit/c40d4706010

(squashed version: https://github.com/MariaDB/server/commit/cabd6b79922
which also includes the simplified workaround for MDEV-31269
mentioned in the previous comment.)

At this moment a major issue in fixing the infinite recusion issue is
the meaning of the Item_field::equal_field and its twin
Item_direct_view_ref::equal_field. A natural guess of the meaning
of Item_field::equal_field may be "a pointer to the multiple
equality the field belongs to", but this guess breaks apart at the
following function, where the Item_direct_view_ref steals the
equal_field from the Item_field it points to:

Item *Item_direct_view_ref::propagate_equal_fields(THD *thd,

                                                   const Context &ctx,

                                                   COND_EQUAL *cond)

  Item *field_item= real_item();

  if (field_item->type() != FIELD_ITEM)

    return this;

  Item *item= field_item->propagate_equal_fields(thd, ctx, cond);

  set_item_equal(field_item->get_item_equal());

  field_item->set_item_equal(NULL);

  if (item != field_item)

    return item;

  return this;

So the assumption can be adjusted to "let x be the multiple equality
an Item_field belongs to. If the Item_field is not referenced by an
Item_direct_view_ref, then its equal_field points to x, otherwise its
equal_field is NULL and the equal_field of Item_direct_view_ref points
to x". And this second guess seems to make sense in light of the
following function, where the Item_direct_view_ref temporarily
lends its own item_equal to the real_item(), before restoring it to
NULL.

Item *Item_direct_view_ref::replace_equal_field(THD *thd, uchar *arg)

  Item *field_item= real_item();

  if (field_item->type() != FIELD_ITEM)

    return this;

  field_item->set_item_equal(item_equal);

  Item *item= field_item->replace_equal_field(thd, arg);

  field_item->set_item_equal(0);

  return item != field_item ? item : this;

So the above fix patches Item_ref::propagate_equal_fields() so
that it delegates to the propagate_equal_fields() of the item it
references, so that when it references to an Item_direct_view_ref,
it will let the latter set the correct equal_field.

psergei can you comment on this assumption and/or the new fix?

Yuchen Pei added a comment - 2023-08-03 05:46 - edited OK, here's another fix (h/t Johnston for discussions): https://github.com/MariaDB/server/commit/c40d4706010 (squashed version: https://github.com/MariaDB/server/commit/cabd6b79922 which also includes the simplified workaround for MDEV-31269 mentioned in the previous comment.) At this moment a major issue in fixing the infinite recusion issue is the meaning of the Item_field::equal_field and its twin Item_direct_view_ref::equal_field . A natural guess of the meaning of Item_field::equal_field may be "a pointer to the multiple equality the field belongs to", but this guess breaks apart at the following function, where the Item_direct_view_ref steals the equal_field from the Item_field it points to: Item *Item_direct_view_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { Item *field_item= real_item(); if (field_item->type() != FIELD_ITEM) return this ; Item *item= field_item->propagate_equal_fields(thd, ctx, cond); set_item_equal(field_item->get_item_equal()); field_item->set_item_equal(NULL); if (item != field_item) return item; return this ; } So the assumption can be adjusted to "let x be the multiple equality an Item_field belongs to. If the Item_field is not referenced by an Item_direct_view_ref, then its equal_field points to x, otherwise its equal_field is NULL and the equal_field of Item_direct_view_ref points to x". And this second guess seems to make sense in light of the following function, where the Item_direct_view_ref temporarily lends its own item_equal to the real_item(), before restoring it to NULL. Item *Item_direct_view_ref::replace_equal_field(THD *thd, uchar *arg) { Item *field_item= real_item(); if (field_item->type() != FIELD_ITEM) return this ; field_item->set_item_equal(item_equal); Item *item= field_item->replace_equal_field(thd, arg); field_item->set_item_equal(0); return item != field_item ? item : this ; } So the above fix patches Item_ref::propagate_equal_fields() so that it delegates to the propagate_equal_fields() of the item it references, so that when it references to an Item_direct_view_ref , it will let the latter set the correct equal_field . psergei can you comment on this assumption and/or the new fix?

Sergei Petrunia added a comment - 2023-08-16 19:00

So the assumption can be adjusted to "let x be the multiple equality
an Item_field belongs to. If the Item_field is not referenced by an
Item_direct_view_ref, then its equal_field points to x, otherwise its
equal_field is NULL and the equal_field of Item_direct_view_ref points
to x".

Having studied the topic a little, this suggestion makes sense to me, too.
Let me continue to study the patch.

Sergei Petrunia added a comment - 2023-08-16 19:00 So the assumption can be adjusted to "let x be the multiple equality an Item_field belongs to. If the Item_field is not referenced by an Item_direct_view_ref, then its equal_field points to x, otherwise its equal_field is NULL and the equal_field of Item_direct_view_ref points to x". Having studied the topic a little, this suggestion makes sense to me, too. Let me continue to study the patch.

Sergei Petrunia added a comment - 2023-08-17 11:11 - edited

ycp, the patch changes one piece of cryptic code in Item_ref::propagate_equal_fields for another.
It's not clear what the new code does.

Looking at the new code:

Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx,

                                       COND_EQUAL *cond)

  Item *derefed = *ref;

  if (derefed->type() != REF_ITEM)

    derefed= real_item();

Suppose *ref points something that's not an Item_ref or its descendant.
Then, derefed->type() != REF_ITEM, if branch is taken.

But when *ref is not an Item_ref or its descendant, real_item() will return *ref.
What is the point of this line then:

    derefed= real_item();

?
(I'll continue looking at this)

Sergei Petrunia added a comment - 2023-08-17 11:11 - edited ycp , the patch changes one piece of cryptic code in Item_ref::propagate_equal_fields for another. It's not clear what the new code does. Looking at the new code: Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { Item *derefed = * ref ; if (derefed->type() != REF_ITEM) { derefed= real_item(); Suppose *ref points something that's not an Item_ref or its descendant. Then, derefed->type() != REF_ITEM , if branch is taken. But when *ref is not an Item_ref or its descendant, real_item() will return *ref . What is the point of this line then: derefed= real_item(); ? (I'll continue looking at this)

Sergei Petrunia added a comment - 2023-08-17 19:25 - edited

Experimenting, I've made this change (to the latest code with ycp's changes) and it didn't have any effect on mtr :

diff --git a/sql/item.cc b/sql/item.cc

index fa05a90c011..d7615521fc6 100644

--- a/sql/item.cc

+++ b/sql/item.cc

@@ -9416,12 +9416,14 @@ Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx,

                                        COND_EQUAL *cond)

   Item *derefed = *ref;

+#if 0

   if (derefed->type() != REF_ITEM)

     derefed= real_item();

     if (derefed->type() != FIELD_ITEM)

       return this;

+#endif

   Item *item= derefed->propagate_equal_fields(thd, ctx, cond);

   if (item != derefed)

     return item;

I don't yet understand the purpose of the code that's #ifdef-ed away...

Sergei Petrunia added a comment - 2023-08-17 19:25 - edited Experimenting, I've made this change (to the latest code with ycp 's changes) and it didn't have any effect on mtr : diff --git a/sql/item.cc b/sql/item.cc index fa05a90c011..d7615521fc6 100644 --- a/sql/item.cc +++ b/sql/item.cc @@ -9416,12 +9416,14 @@ Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { Item *derefed = *ref; +#if 0 if (derefed->type() != REF_ITEM) { derefed= real_item(); if (derefed->type() != FIELD_ITEM) return this; } +#endif Item *item= derefed->propagate_equal_fields(thd, ctx, cond); if (item != derefed) return item; I don't yet understand the purpose of the code that's #ifdef-ed away...

Sergei Petrunia added a comment - 2023-08-18 10:26

Looking at the code in Item_ref::propagate_equal_fields():

Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx,

                                       COND_EQUAL *cond)

...

  Item *item= field_item->propagate_equal_fields(thd, ctx, cond);

  if (item != field_item)

    return item;  // (POINT1)

  return this;

Suppose the object this is called for is an Item_ref (not Item_direct_ref).

Item_ref objects are used to "wrap" items when they are used in the post-grouping context and one needs to use result calls to get the value like it is done for example in Item_ref::val_int:

longlong Item_ref::val_int()

...

  longlong tmp=(*ref)->val_int_result();

  return tmp;

But if Item_ref::propagate_equal_fields() substitutes itself for the item it has got from (*ref), as it does on he line marked with (POINT1)? Conversion from val() calls to val*result() calls won't be done anymore...

Sergei Petrunia added a comment - 2023-08-18 10:26 Looking at the code in Item_ref::propagate_equal_fields(): Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { ... Item *item= field_item->propagate_equal_fields(thd, ctx, cond); if (item != field_item) return item; // (POINT1) return this ; } Suppose the object this is called for is an Item_ref (not Item_direct_ref). Item_ref objects are used to "wrap" items when they are used in the post-grouping context and one needs to use result calls to get the value like it is done for example in Item_ref::val_int: longlong Item_ref::val_int() { ... longlong tmp=(* ref )->val_int_result(); return tmp; } But if Item_ref::propagate_equal_fields() substitutes itself for the item it has got from (*ref), as it does on he line marked with (POINT1)? Conversion from val() calls to val*result() calls won't be done anymore...

Sergei Petrunia added a comment - 2023-08-21 09:48

.. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution

  if (item != field_item)

    return item;  // (POINT1)

is only taken when the item is a constant item (for constants, val() and val_result() are the same).

Sergei Petrunia added a comment - 2023-08-21 09:48 .. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution if (item != field_item) return item; // (POINT1) is only taken when the item is a constant item (for constants, val() and val_result() are the same).

Yuchen Pei added a comment - 2023-08-22 01:52 - edited

> But when *ref is not an Item_ref or its descendant, real_item() will return *ref.

I didn't know this - is it documented somewhere?

In any case, with this assertion, my changed version is equivalent to

Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx,

                                       COND_EQUAL *cond)

  Item *derefed = *ref;

  if (derefed->type() != REF_ITEM && derefed->type() != FIELD_ITEM)

    return this;

  Item *item= derefed->propagate_equal_fields(thd, ctx, cond);

  if (item != derefed)

    return item;

  return this;

which I include in an updated commit at
https://github.com/MariaDB/server/commit/3eeab3c4508.

psergei I wonder whether this version is still cryptic? My
explanation is that it adds inside
Item_ref::propagate_equal_fields() a handling of the case where
*ref points to another Item_ref, by delegating to the
propagate_equal_fields() of the referenced Item_ref, so that
the result is consistent for the latter. By consistency I mean for
example these two cases:

1. Item_ref *ref> Item_direct_view_ref *ref> Item_field
2. Item_func *args> Item_direct_view_ref *ref> Item_field

Both should result in the same item_equal for the
Item_direct_view_ref.

> Experimenting, I've made this change (to the latest code with ycp's changes) and it didn't have any effect on mtr :

Given the above, does this mean that the ref of an Item_ref
can only point to either a REF_ITEM or a FIELD_ITEM? Perhaps
this is another assertion I'm not aware of?

> .. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution ... is only taken when the item is a constant item (for constants, val() and val_result() are the same).

This sounds right, as Item_field::propagate_equal_fields() only
returns something new if the new thing is the result of
field->get_equal_const_item(thd, ctx, item). Just to validate
this assumption, I added assertions in
Item_ref::propagate_equal_fields() and
Item_direct_view_ref::propagate_equal_fields() on top of my
commit[1]:

if (item == derefed)

  item= this;

DBUG_ASSERT(item->type() == REF_ITEM || item->const_item());

return item;

[1] https://github.com/MariaDB/server/commit/ca390e57642

All main.subselect* tests still pass. Shall we add this assertion?

The squashed version of all these changes, including the assertion
addition is at https://github.com/MariaDB/server/commit/4830c70cdb8

Yuchen Pei added a comment - 2023-08-22 01:52 - edited > But when *ref is not an Item_ref or its descendant, real_item() will return *ref . I didn't know this - is it documented somewhere? In any case, with this assertion, my changed version is equivalent to Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { Item *derefed = * ref ; if (derefed->type() != REF_ITEM && derefed->type() != FIELD_ITEM) return this ; Item *item= derefed->propagate_equal_fields(thd, ctx, cond); if (item != derefed) return item; return this ; } which I include in an updated commit at https://github.com/MariaDB/server/commit/3eeab3c4508 . psergei I wonder whether this version is still cryptic? My explanation is that it adds inside Item_ref::propagate_equal_fields() a handling of the case where *ref points to another Item_ref , by delegating to the propagate_equal_fields() of the referenced Item_ref , so that the result is consistent for the latter. By consistency I mean for example these two cases: 1. Item_ref *ref > Item_direct_view_ref *ref > Item_field 2. Item_func *args > Item_direct_view_ref *ref > Item_field Both should result in the same item_equal for the Item_direct_view_ref . > Experimenting, I've made this change (to the latest code with ycp 's changes) and it didn't have any effect on mtr : Given the above, does this mean that the ref of an Item_ref can only point to either a REF_ITEM or a FIELD_ITEM ? Perhaps this is another assertion I'm not aware of? > .. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution ... is only taken when the item is a constant item (for constants, val() and val_result() are the same). This sounds right, as Item_field::propagate_equal_fields() only returns something new if the new thing is the result of field->get_equal_const_item(thd, ctx, item) . Just to validate this assumption, I added assertions in Item_ref::propagate_equal_fields() and Item_direct_view_ref::propagate_equal_fields() on top of my commit [1] : if (item == derefed) item= this ; DBUG_ASSERT(item->type() == REF_ITEM || item->const_item()); return item; [1] https://github.com/MariaDB/server/commit/ca390e57642 All main.subselect* tests still pass. Shall we add this assertion? The squashed version of all these changes, including the assertion addition is at https://github.com/MariaDB/server/commit/4830c70cdb8

Yuchen Pei added a comment - 2023-08-22 03:36 - edited

BTW, the simplification of the MDEV-31269 workaround[1] also causes
test failures with --ps-protocol[2]:

[amd64-ubuntu-2204-debug-ps | failed test (failure)]

json.json_table

main.explain_json

main.group_min_max

main.index_merge_myisam

main.innodb_ext_key

main.innodb_ext_key

main.innodb_ext_key

main.innodb_ext_key

main.multi_update

main.opt_trace

main.subselect

main.subselect4

main.subselect_decorrelate_in

main.subselect_exists2in

main.subselect_exists2in_costmat

main.subselect_extra_no_semijoin

main.subselect_mat

main.subselect_mat_cost_bugs

main.subselect_sj2_mat

[1] https://github.com/MariaDB/server/commit/ffc61fb039e
[2] https://buildbot.mariadb.org/#/builders/534/builds/8770

When I revert the change [1] above all these tests pass.

psergei, shall we update these tests with --disable-ps-protocol and
--enable-ps-protocol, or shall we revert to the more expensive
workaround?

Yuchen Pei added a comment - 2023-08-22 03:36 - edited BTW, the simplification of the MDEV-31269 workaround [1] also causes test failures with --ps-protocol [2] : [amd64-ubuntu-2204-debug-ps | failed test (failure)] json.json_table main.explain_json main.group_min_max main.index_merge_myisam main.innodb_ext_key main.innodb_ext_key main.innodb_ext_key main.innodb_ext_key main.multi_update main.opt_trace main.subselect main.subselect4 main.subselect_decorrelate_in main.subselect_exists2in main.subselect_exists2in_costmat main.subselect_extra_no_semijoin main.subselect_mat main.subselect_mat_cost_bugs main.subselect_sj2_mat [1] https://github.com/MariaDB/server/commit/ffc61fb039e [2] https://buildbot.mariadb.org/#/builders/534/builds/8770 When I revert the change [1] above all these tests pass. psergei , shall we update these tests with --disable-ps-protocol and --enable-ps-protocol , or shall we revert to the more expensive workaround?

Lena Startseva added a comment - 2023-08-30 10:31

ycp, possible problem is in the 2nd execution of prepare statement (now ps-protocol execute prepare statement twice and returns only the results of the second execution ). You can disable the second execution in testcases with "--disable-ps2-protocol"

Lena Startseva added a comment - 2023-08-30 10:31 ycp , possible problem is in the 2nd execution of prepare statement (now ps-protocol execute prepare statement twice and returns only the results of the second execution ). You can disable the second execution in testcases with "--disable-ps2-protocol"

Yuchen Pei added a comment - 2023-08-31 00:10 - edited

Thanks lstartseva, are the problem and workaround you mentioned
concerning the failures with testing with --ps-protocol as mentioned
in my previous comment? If so, following up on the optimizer
meeting, psergei suggested that we make the skip condition a bit
more strict, by also checking for ref items, to avoid the failures
with --ps-protocol.

And here's the updated commit. It checks if either side of any of
the chosen equalities is a ref item, and skip the transformation if
so: https://github.com/MariaDB/server/commit/b2b0d16b68d

As a result, I had to do a --disable_ps_protocol and
--enable_ps_protocol in two explain extended cases in
main.subselect_exists2in.

Yuchen Pei added a comment - 2023-08-31 00:10 - edited Thanks lstartseva , are the problem and workaround you mentioned concerning the failures with testing with --ps-protocol as mentioned in my previous comment? If so, following up on the optimizer meeting, psergei suggested that we make the skip condition a bit more strict, by also checking for ref items, to avoid the failures with --ps-protocol. And here's the updated commit. It checks if either side of any of the chosen equalities is a ref item, and skip the transformation if so: https://github.com/MariaDB/server/commit/b2b0d16b68d As a result, I had to do a --disable_ps_protocol and --enable_ps_protocol in two explain extended cases in main.subselect_exists2in .

Sergei Petrunia added a comment - 2023-09-04 09:55

A self-contained explanation of how we get a loop of Item_direct_view_ref objects:

Testcase

CREATE TABLE t1 ( a INT );

INSERT INTO t1 VALUES (1),(5);

CREATE TABLE t2 ( b INT ) ENGINE=MyISAM;

INSERT INTO t2 VALUES (1);

CREATE TABLE t3 ( c INT );

INSERT INTO t3 VALUES (4),(5);

SELECT *

FROM

t1,

   ( SELECT * FROM t2 ) alias

WHERE

   NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b;

Here, NOT EXISTS is converted into NOT IN. This rewrite also adds left_expresssion IS NOT NULL so the NOT EXISTS becomes:

  "!(<in_optimizer>(t2.b,<exists>(subquery#3)) and t2.b is not null)"

The "t2.b is not null" has this structure:

   Item_func_is_not_null(

     Item_direct_ref(    // denote this as $ITEM_DIRECT_REF

       Item_direct_view_ref(

         Item_field("t2.b") //  // denote this $IT_FLD

The top-level WHERE clause is used to construct Item_equal which includes

Item_field(t1.a)
Item_direct_view_ref(Item_field(alias.b)) // denote ITEM_DIRECT_REF_2

Then, Item_ref::propagate_equal_fields() is called for $ITEM_DIRECT_REF object

The code does this (comments are mine):

  // This will assign field_item = $IT_FLD

  Item *field_item= real_item();

  // This will make $IT_FLD->item_equal point to multiple equality.

  // This is already wrong as there can be multiple Item_direct_view_ref

  //  objects in different contexts which refer to the same Item_field object.

  //  See commit comment for 8d9dd21d for details.

  Item *item= field_item->propagate_equal_fields(thd, ctx, cond);

Then, Item_equal::update_const() sets another the $ITEM_DIRECT_REF_2 Item_direct_view_ref object to be the "the const item".

Then, substitute_for_best_equal_field() calls

Item_ref::transform(..., &Item::replace_equal_field)

for $ITEM_DIRECT_REF object.

It eventually calls:

  Item *new_item= (*ref)->transform(thd, transformer, arg);

Here, *ref points to $IT_FLD Item_field.
new_item is now $ITEM_DIRECT_REF_2, the constant Item_direct_view_ref.
Then, we do this:

  if (*ref != new_item)

    thd->change_item_tree(ref, new_item);

And this creates a loop because "ref" is the VIEW's field translation entry. Which now points to an Item_direct_view_ref which has ref pointing to the VIEW's field translation...

Sergei Petrunia added a comment - 2023-09-04 09:55 A self-contained explanation of how we get a loop of Item_direct_view_ref objects: Testcase CREATE TABLE t1 ( a INT ); INSERT INTO t1 VALUES (1),(5); CREATE TABLE t2 ( b INT ) ENGINE=MyISAM; INSERT INTO t2 VALUES (1); CREATE TABLE t3 ( c INT ); INSERT INTO t3 VALUES (4),(5); SELECT * FROM t1, ( SELECT * FROM t2 ) alias WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b; Here, NOT EXISTS is converted into NOT IN. This rewrite also adds left_expresssion IS NOT NULL so the NOT EXISTS becomes: "!(<in_optimizer>(t2.b,<exists>(subquery#3)) and t2.b is not null)" The "t2.b is not null" has this structure: Item_func_is_not_null( Item_direct_ref( // denote this as $ITEM_DIRECT_REF Item_direct_view_ref( Item_field("t2.b") // // denote this $IT_FLD ) ) The top-level WHERE clause is used to construct Item_equal which includes Item_field(t1.a) Item_direct_view_ref(Item_field(alias.b)) // denote ITEM_DIRECT_REF_2 Then, Item_ref::propagate_equal_fields() is called for $ITEM_DIRECT_REF object The code does this (comments are mine): // This will assign field_item = $IT_FLD Item *field_item= real_item(); // This will make $IT_FLD->item_equal point to multiple equality. // This is already wrong as there can be multiple Item_direct_view_ref // objects in different contexts which refer to the same Item_field object. // See commit comment for 8d9dd21d for details. Item *item= field_item->propagate_equal_fields(thd, ctx, cond); Then, Item_equal::update_const() sets another the $ITEM_DIRECT_REF_2 Item_direct_view_ref object to be the "the const item". Then, substitute_for_best_equal_field() calls Item_ref::transform(..., &Item::replace_equal_field) for $ITEM_DIRECT_REF object. It eventually calls: Item *new_item= (*ref)->transform(thd, transformer, arg); Here, *ref points to $IT_FLD Item_field. new_item is now $ITEM_DIRECT_REF_2, the constant Item_direct_view_ref. Then, we do this: if (*ref != new_item) thd->change_item_tree(ref, new_item); And this creates a loop because "ref" is the VIEW's field translation entry. Which now points to an Item_direct_view_ref which has ref pointing to the VIEW's field translation...

Yuchen Pei added a comment - 2023-09-08 05:35 - edited

Some comments on 2nd ps segfault workarounds.

trade-offs:

the more cases it skips the transformation, the more false
positives there are, and the more ps-protocols it will fail
the less cases it skips the transformation, the more false
negatives there are, and the more likely the 2nd ps segfault will
occur

tried workarounds:
1. skip if not conventional: too many false positives,
https://github.com/MariaDB/server/commit/ffc61fb039e
2. skip if not conventional and either side is ref: too many false
positives, https://github.com/MariaDB/server/commit/b2b0d16b68d
3. skip if not conventional and any item in the trees is ref
(mentioned and tried by psergei): too many false positives
4. skip if not conventional and either side intersects free list:
seems to work, less work than "skip if any item in the trees
intersects free list", but more work,
https://github.com/MariaDB/server/commit/28de41f9b9b
5. skip if not conventional and any item in the trees intersects free
list: the original check, seems to be the most accurate, but also
requiring the most work, with the apparent downside of performance,
see e.g. https://github.com/MariaDB/server/commit/a87f22770b for a
implementation

In terms of time required to check: 1 < 2 < 3,4 < 5.

In terms of accuracy, only 4 and 5 do not cause massive ps-protocol
failures.

Yuchen Pei added a comment - 2023-09-08 05:35 - edited Some comments on 2nd ps segfault workarounds. trade-offs: the more cases it skips the transformation, the more false positives there are, and the more ps-protocols it will fail the less cases it skips the transformation, the more false negatives there are, and the more likely the 2nd ps segfault will occur tried workarounds: 1. skip if not conventional: too many false positives, https://github.com/MariaDB/server/commit/ffc61fb039e 2. skip if not conventional and either side is ref: too many false positives, https://github.com/MariaDB/server/commit/b2b0d16b68d 3. skip if not conventional and any item in the trees is ref (mentioned and tried by psergei ): too many false positives 4. skip if not conventional and either side intersects free list: seems to work, less work than "skip if any item in the trees intersects free list", but more work, https://github.com/MariaDB/server/commit/28de41f9b9b 5. skip if not conventional and any item in the trees intersects free list: the original check, seems to be the most accurate, but also requiring the most work, with the apparent downside of performance, see e.g. https://github.com/MariaDB/server/commit/a87f22770b for a implementation In terms of time required to check: 1 < 2 < 3,4 < 5. In terms of accuracy, only 4 and 5 do not cause massive ps-protocol failures.

Sergei Petrunia added a comment - 2023-09-08 07:52 - edited

Status update from the calls this week:

Igor seems to be fine with "A self-contained explanation of how we get a loop of Item_direct_view_ref" above.
- The bug is apparently in the old (pre-MDEV-22534) code. Can one construct a testcase for it?
- SergeiP: I had no success in constructing such testcase so far, perhaps it's because Item_direct_ref objects are used in a limited number of scenarios in the server.

Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the same bug? what happens if there are nested views? SergeiP: No it doesn't have the same bug. I think the code works correctly there. Can't explain off the top of the head.

Igor: the workarounds for MDEV-31269 are bad, I have a patch for it here: https://github.com/MariaDB/server/commit/34083cf34bc26a5cd95e56feffd01466f9f4917f . It's targeted at 10.4, has no description yet and is large.
Igor suggested that ~~MDEV-22434~~ code is rebased on top of that commit (but didn't clarify what for?)
- Discussed this with Serg, no outcome so far.

Sergei Petrunia added a comment - 2023-09-08 07:52 - edited Status update from the calls this week: Igor seems to be fine with "A self-contained explanation of how we get a loop of Item_direct_view_ref" above. The bug is apparently in the old (pre- MDEV-22534 ) code. Can one construct a testcase for it? SergeiP: I had no success in constructing such testcase so far, perhaps it's because Item_direct_ref objects are used in a limited number of scenarios in the server. Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the same bug? what happens if there are nested views? SergeiP: No it doesn't have the same bug. I think the code works correctly there. Can't explain off the top of the head. Igor: the workarounds for MDEV-31269 are bad, I have a patch for it here: https://github.com/MariaDB/server/commit/34083cf34bc26a5cd95e56feffd01466f9f4917f . It's targeted at 10.4, has no description yet and is large. Igor suggested that MDEV-22434 code is rebased on top of that commit (but didn't clarify what for?) Discussed this with Serg, no outcome so far.

Yuchen Pei added a comment - 2023-09-11 01:14 - edited

This is a reply to a message from igor on slack about the
relation between this task and ~~MDEV-3881~~. Messages in commits to
MDEV-22534 reference ~~MDEV-3881~~ because the issues seen in ~~MDEV-3881~~
are encountered in MDEV-22534. They have the same symptom, but
nobody can say for sure whether it is the same issue caused by the
same problem because ~~MDEV-3881~~ is so ancient. Now let's assume it is
the same.

Looking at the fix for ~~MDEV-3881~~, I cannot verify it myself whether
the commit did fix it. First, it is not even clear which commit was
meant as a fix for ~~MDEV-3881~~. I suspect it is
e3ac306157ab9ade137c9afc9fff270a2f50d7ec[0] because that is where
the ~~MDEV-3881~~ test case was added. But as you can see, it was also
the commit that introduced the exists-to-in transformation. The
bug relied on exists-to-in to manifest, so in this case, it is not
even possible to determine which version of the feature that caused
the bug, let alone studying the fix. Second, it was a commit from 10
years ago, and even the author could not recall what was the fix,
see the thread[1].

On the face value, my suspicion is that the bug was not fixed, and
we have just been relying on a workaround (see below) to avoid it.
Or maybe it was fixed then, but some new changes over the last 10
years caused a regression (e.g. the Gallina commit mentioned by
psergei). In any case, I don't think archaeology helps fix this
bug. In the current codebase, the exists-to-in code relies on a
(perhaps unintentional) workaround to avoid the bug, which is, when
the subselect is a toplevel NOT, we need to AND the new IN subquery
with IS NOT NULLs because 3-value logic is funny[2]. When doing so,
the exists2in processor constructs the AND item in different orders
depending on the number of picked-out equalities. If there is only
one, then the AND item has the IS NOT NULL item as the first
argument. If there is more than one, then the AND item has the IN
subquery as the first argument. There's no reason the two cases
should be treated differently, and the infinite loop happens when we
try to unify the two cases and always have the IN subquery as the
first argument, see for example [1]. Note that this was already the
case in the initial exists-to-in commit[0]. I also made a minimal
change that has nothing to do with the main task in MDEV-22534 to
reproduce the issue[3], just run the test
main.subselect_exists2in that contains the ~~MDEV-3881~~ test. We
can keep the workaround in the new transformation introduced in this
ticket, which is the case in earlier patches, but I was told by
psergei and monty that it is generally not a good idea have a
workaround like this unless there is a compelling reason.

By the way, the commit [3] (and [5] for a 10.4 version) below also
answers the following question in the previous comment. The testcase
is main.subselect_exists2in, more specifically the ~~MDEV-3881~~
part in that test, also quoted in comment[4] above.

> ** The bug is apparently in the old (pre-MDEV-22534) code. Can one
> construct a testcase for it?

As to the question why swapping the orders matter, here's an
explanation, using the self-contained example given by psergei
above.

Exists2in transforms NOT EXISTS ( SELECT * FROM t3 WHERE c = b )
to

!(Item_in_optimizer and Item_func_is_not_null) # patched version

or

!(Item_func_is_not_null and Item_in_optimizer) # original version

where we have the following structures for the two items in the
conjunction

   Item_func_is_not_null(

     Item_direct_ref(    // denote this r

       Item_direct_view_ref( // denote this vr

         Item_field("t2.b") // denote this fld

and

   Item_in_optimizer(

     Item_direct_view_ref( // same vr

       Item_field("t2.b") // same fld

),

...

In Item_cond::propagate_equal_fields() for the conjunction, it
iterates over the two items and call propagate_equal_fields() on
them one by one, and whatever side effects happen during these
calls, the last call "wins". So in the patched version,
r->propagate_equal_fields() is called after and we end up with fld
rather than vr participating in the multiple equality which is
wrong. In the original version, vr->propagate_equal_fields() is
called after and we end up with vr participating which is correct.
If we can make it always correct as we do in patches in this ticket,
then we don't have to worry about the order.

[0] https://github.com/MariaDB/server/commit/e3ac306157ab9ade137c9afc9fff270a2f50d7ec
[1] https://mariadb.slack.com/archives/C021E77G7K2/p1690952975445929
[2]
https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/CEPJCC7GCZ4H44L276AFRIUB7LYIJOPI/
[3] https://github.com/MariaDB/server/commit/02da5af0aa1
[4]
https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=265762&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-265762
[5] https://github.com/MariaDB/server/commit/8c0835459aa

Yuchen Pei added a comment - 2023-09-11 01:14 - edited This is a reply to a message from igor on slack about the relation between this task and MDEV-3881 . Messages in commits to MDEV-22534 reference MDEV-3881 because the issues seen in MDEV-3881 are encountered in MDEV-22534 . They have the same symptom, but nobody can say for sure whether it is the same issue caused by the same problem because MDEV-3881 is so ancient. Now let's assume it is the same. Looking at the fix for MDEV-3881 , I cannot verify it myself whether the commit did fix it. First, it is not even clear which commit was meant as a fix for MDEV-3881 . I suspect it is e3ac306157ab9ade137c9afc9fff270a2f50d7ec [0] because that is where the MDEV-3881 test case was added. But as you can see, it was also the commit that introduced the exists-to-in transformation. The bug relied on exists-to-in to manifest, so in this case, it is not even possible to determine which version of the feature that caused the bug, let alone studying the fix. Second, it was a commit from 10 years ago, and even the author could not recall what was the fix, see the thread [1] . On the face value, my suspicion is that the bug was not fixed, and we have just been relying on a workaround (see below) to avoid it. Or maybe it was fixed then, but some new changes over the last 10 years caused a regression (e.g. the Gallina commit mentioned by psergei ). In any case, I don't think archaeology helps fix this bug. In the current codebase, the exists-to-in code relies on a (perhaps unintentional) workaround to avoid the bug, which is, when the subselect is a toplevel NOT, we need to AND the new IN subquery with IS NOT NULLs because 3-value logic is funny [2] . When doing so, the exists2in processor constructs the AND item in different orders depending on the number of picked-out equalities. If there is only one, then the AND item has the IS NOT NULL item as the first argument. If there is more than one, then the AND item has the IN subquery as the first argument. There's no reason the two cases should be treated differently, and the infinite loop happens when we try to unify the two cases and always have the IN subquery as the first argument, see for example [1] . Note that this was already the case in the initial exists-to-in commit [0] . I also made a minimal change that has nothing to do with the main task in MDEV-22534 to reproduce the issue [3] , just run the test main.subselect_exists2in that contains the MDEV-3881 test. We can keep the workaround in the new transformation introduced in this ticket, which is the case in earlier patches, but I was told by psergei and monty that it is generally not a good idea have a workaround like this unless there is a compelling reason. By the way, the commit [3] (and [5] for a 10.4 version) below also answers the following question in the previous comment. The testcase is main.subselect_exists2in , more specifically the MDEV-3881 part in that test, also quoted in comment [4] above. > ** The bug is apparently in the old (pre- MDEV-22534 ) code. Can one > construct a testcase for it? As to the question why swapping the orders matter, here's an explanation, using the self-contained example given by psergei above. Exists2in transforms NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) to !(Item_in_optimizer and Item_func_is_not_null) # patched version or !(Item_func_is_not_null and Item_in_optimizer) # original version where we have the following structures for the two items in the conjunction Item_func_is_not_null( Item_direct_ref( // denote this r Item_direct_view_ref( // denote this vr Item_field("t2.b") // denote this fld ) ) and Item_in_optimizer( Item_direct_view_ref( // same vr Item_field("t2.b") // same fld ), ... ) In Item_cond::propagate_equal_fields() for the conjunction, it iterates over the two items and call propagate_equal_fields() on them one by one, and whatever side effects happen during these calls, the last call "wins". So in the patched version, r->propagate_equal_fields() is called after and we end up with fld rather than vr participating in the multiple equality which is wrong. In the original version, vr->propagate_equal_fields() is called after and we end up with vr participating which is correct. If we can make it always correct as we do in patches in this ticket, then we don't have to worry about the order. [0] https://github.com/MariaDB/server/commit/e3ac306157ab9ade137c9afc9fff270a2f50d7ec [1] https://mariadb.slack.com/archives/C021E77G7K2/p1690952975445929 [2] https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/CEPJCC7GCZ4H44L276AFRIUB7LYIJOPI/ [3] https://github.com/MariaDB/server/commit/02da5af0aa1 [4] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=265762&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-265762 [5] https://github.com/MariaDB/server/commit/8c0835459aa

Yuchen Pei added a comment - 2023-09-11 06:22 - edited

After digesting psergei's comment above, and discussions with him,
here is an explanation why not allowing Item_fields to participate in
multiple equalities when they are referenced by Item_direct_view_ref
fixes the self-reference bug (provided some assumption mentioned at the
end of this comment holds).

The job of Item::propagate_equal_fields() is to assign a multiple
equality to the item, or make it so field-like items participate in
the correct multiple equalities. Here we say an item i participates in
multiple equality e if i->item_equal == e, and we say i does not
participate in any multiple equality if i->item_equal == NULL.

We say an Item_direct_view_ref vr points to a Item_field
field if vr->real_item() == field. Multiple
Item_direct_view_ref's in different contexts may point to the same
Item_field. To avoid joining disjoint equalities, when an
Item_field is pointed to by an Item_direct_view_ref, the
former should no longer participate in a multiple equality, but the
latter should. For example, consider {{alias.a = t.col1 OR alias.a =
t.col2}}. If the field underlying alias.a participates in both
multiple equalities, the two multiple equalities will join into one
with three elements all equal to each other, which is a logical error.
(It would be nice to have a test case for this, though.) However, this
is not the case in the codebase, the violation of this rule causes the
bug, and the rectification fixes the bug provided certain assumptions
hold.

The job of Item::replace_equal_field() is to replace a field-like item
by an equal field-like item. When the Item is an Item_field, it
replaces itself by the const or first item of the multiple equality it
participates, if any and if it agrees with cond_equal. When the Item i
is an Item_ref, including Item_direct_view_ref, in
Item_ref::transform(.., replace_equal_field, ..), it calls the
transformer on the item it derefs to (i.e. *i->ref).

  Item *new_item= (*ref)->transform(thd, transformer, arg); /* 1 */

And if the transformer returns a new item that is different from
*i->ref, we make i deref to this new item.

  if (*ref != new_item)

    thd->change_item_tree(ref, new_item); /* 2 */

After that we replace call replace_equal_field on i itself.

  return (this->*transformer)(thd, arg); /* 3 */

So when we have Item_direct_view_ref vr1 that derefs to an Item_field
field, i.e. *vr1->ref == field, and field participates in a
multiple equality whose const item is another Item_direct_view_ref vr2
that not only also derefs to field, but also has the same ref field as
vr1 (vr1->ref == vr2->ref), step 1 above returns vr2, and step 2
above makes vr1 deref to vr2, causing a self-reference:
*vr2->ref == *vr1->ref == vr2.

Now, if we do not allow Item_field to participate in a multiple
equality (i.e. when vr1->ref == field, we do {{vr1->item_equal=
item_equal}} and field->item_equal = NULL), then the
self-reference would not occur in this case. In step 1 above, the
right hand side will evaluate to Item_field, because its cond_equal is
NULL.

Item *Item_field::replace_equal_field(THD *thd, uchar *arg)

  REPLACE_EQUAL_FIELD_ARG* param= (REPLACE_EQUAL_FIELD_ARG*)arg;

  if (item_equal && item_equal == param->item_equal)

...

  return this;

It will therefore skip step 2, and in step 3 above, vr1 temporarily
lets field participate in the multiple equality, and we get vr2 as a
result, which is then used to replace vr1 directly, rather than
*vr1->ref.

We can also answer the following question in a previous comment:

> Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the
> same bug? what happens if there are nested views?

If there are nested views, say *vr1->ref == vr2,
*vr2->ref == field, vr1->item_equal->get_const() == vr3,
vr1->item_equal == cond_equal->current_level->item_equal,
and {{vr2->item_equal == field->item_equal == NULL}
then the following happens, and vr3 replaces vr1, and no
self-reference occurs.

  vr1->transform: new_item= vr2->transform(...)

    vr2->transform: new_item= field->transform(...)

      field->replace_equal_field: returns itself because its item_equal is NULL

      vr2->replace_equal_field: returns itself because its item_equal is NULL

    vr1->replace_equal_field: temporarily lets field participate and returns vr3

Because an Item_direct_view_ref does not allow anything downstream
to participate in a multiple equality, anything returned by the
replace_equal_field call on an Item_field would replace the
top-level Item_direct_view_ref. As such, no self-reference occurs.

Yuchen Pei added a comment - 2023-09-11 06:22 - edited After digesting psergei 's comment above, and discussions with him, here is an explanation why not allowing Item_fields to participate in multiple equalities when they are referenced by Item_direct_view_ref fixes the self-reference bug (provided some assumption mentioned at the end of this comment holds). The job of Item::propagate_equal_fields() is to assign a multiple equality to the item, or make it so field-like items participate in the correct multiple equalities. Here we say an item i participates in multiple equality e if i->item_equal == e , and we say i does not participate in any multiple equality if i->item_equal == NULL . We say an Item_direct_view_ref vr points to a Item_field field if vr->real_item() == field . Multiple Item_direct_view_ref 's in different contexts may point to the same Item_field . To avoid joining disjoint equalities, when an Item_field is pointed to by an Item_direct_view_ref , the former should no longer participate in a multiple equality, but the latter should. For example, consider {{alias.a = t.col1 OR alias.a = t.col2}}. If the field underlying alias.a participates in both multiple equalities, the two multiple equalities will join into one with three elements all equal to each other, which is a logical error. (It would be nice to have a test case for this, though.) However, this is not the case in the codebase, the violation of this rule causes the bug, and the rectification fixes the bug provided certain assumptions hold. The job of Item::replace_equal_field() is to replace a field-like item by an equal field-like item. When the Item is an Item_field, it replaces itself by the const or first item of the multiple equality it participates, if any and if it agrees with cond_equal. When the Item i is an Item_ref, including Item_direct_view_ref, in Item_ref::transform(.., replace_equal_field, ..), it calls the transformer on the item it derefs to (i.e. *i->ref ). Item *new_item= (* ref )->transform(thd, transformer, arg); /* 1 */ And if the transformer returns a new item that is different from *i->ref , we make i deref to this new item. if (* ref != new_item) thd->change_item_tree( ref , new_item); /* 2 */ After that we replace call replace_equal_field on i itself. return ( this ->*transformer)(thd, arg); /* 3 */ So when we have Item_direct_view_ref vr1 that derefs to an Item_field field, i.e. *vr1->ref == field , and field participates in a multiple equality whose const item is another Item_direct_view_ref vr2 that not only also derefs to field, but also has the same ref field as vr1 ( vr1->ref == vr2->ref ), step 1 above returns vr2, and step 2 above makes vr1 deref to vr2, causing a self-reference: *vr2->ref == *vr1->ref == vr2 . Now, if we do not allow Item_field to participate in a multiple equality (i.e. when vr1->ref == field , we do {{vr1->item_equal= item_equal}} and field->item_equal = NULL ), then the self-reference would not occur in this case. In step 1 above, the right hand side will evaluate to Item_field, because its cond_equal is NULL. Item *Item_field::replace_equal_field(THD *thd, uchar *arg) { REPLACE_EQUAL_FIELD_ARG* param= (REPLACE_EQUAL_FIELD_ARG*)arg; if (item_equal && item_equal == param->item_equal) { ... } return this ; } It will therefore skip step 2, and in step 3 above, vr1 temporarily lets field participate in the multiple equality, and we get vr2 as a result, which is then used to replace vr1 directly, rather than *vr1->ref . We can also answer the following question in a previous comment: > Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the > same bug? what happens if there are nested views? If there are nested views, say *vr1->ref == vr2 , *vr2->ref == field , vr1->item_equal->get_const() == vr3 , vr1->item_equal == cond_equal->current_level->item_equal , and {{vr2->item_equal == field->item_equal == NULL} then the following happens, and vr3 replaces vr1, and no self-reference occurs. vr1->transform: new_item= vr2->transform(...) vr2->transform: new_item= field->transform(...) field->replace_equal_field: returns itself because its item_equal is NULL vr2->replace_equal_field: returns itself because its item_equal is NULL vr1->replace_equal_field: temporarily lets field participate and returns vr3 Because an Item_direct_view_ref does not allow anything downstream to participate in a multiple equality, anything returned by the replace_equal_field call on an Item_field would replace the top-level Item_direct_view_ref. As such, no self-reference occurs.

Yuchen Pei added a comment - 2023-09-12 02:54 - edited

Anyway here are the latest commits. I wonder how important it is to
add a test covering nested Item_direct_view_ref and one that shows
logical errors may occur if an Item_field participates in a multiple
equality when there is more than one Item_direct_view_ref pointing
to it. What do you think psergei?

1084af9e91a MDEV-22534 Fix self-referencing Item_direct_view_ref

4d72160ae4e MDEV-22534 Simplify the workaround for MDEV-31269

f393ecea3af MDEV-22534 Decorrelate IN subquery

I think the plan should be:

1. Have the self-reference fix as the first commit and push it to
10.4, since 10.4 is affected
2. wait for there to be an 11.3 version of the MDEV-30073 patch so
that we can test our main patch on top
3. wait for the MDEV-30073 patch to have made it to 11.3, get rid of
our MDEV-31269 workaround and push our main patch to 11.3

OTOH our patch does not make the problem in MDEV-31269 worse, so
maybe step 2 and 3 are redundant, but then again none of the
workarounds has gained consensus support...

Yuchen Pei added a comment - 2023-09-12 02:54 - edited Anyway here are the latest commits. I wonder how important it is to add a test covering nested Item_direct_view_ref and one that shows logical errors may occur if an Item_field participates in a multiple equality when there is more than one Item_direct_view_ref pointing to it. What do you think psergei ? 1084af9e91a MDEV-22534 Fix self-referencing Item_direct_view_ref 4d72160ae4e MDEV-22534 Simplify the workaround for MDEV-31269 f393ecea3af MDEV-22534 Decorrelate IN subquery I think the plan should be: 1. Have the self-reference fix as the first commit and push it to 10.4, since 10.4 is affected 2. wait for there to be an 11.3 version of the MDEV-30073 patch so that we can test our main patch on top 3. wait for the MDEV-30073 patch to have made it to 11.3, get rid of our MDEV-31269 workaround and push our main patch to 11.3 OTOH our patch does not make the problem in MDEV-31269 worse, so maybe step 2 and 3 are redundant, but then again none of the workarounds has gained consensus support...

Yuchen Pei added a comment - 2023-11-02 04:01 - edited

On why Item_direct_ref's are used in the creation of the extra "IS
NOT NULL" items.

If we replace them with Item_ref's, then ~~MDEV-3906~~ resurfaces,
i.e. the testcase in main.subselect_exists2in corresponding to
~~MDEV-3906~~ fails with a crash.

Like ~~MDEV-3881~~, the fix of ~~MDEV-3906~~ is "included" in the original
commit adding the exists2in transformation.

e3ac306157ab9ade137c9afc9fff270a2f50d7ec

Commit:     unknown <sanja@montyprogram.com>

CommitDate: Tue Feb 26 01:20:17 2013 +0200

[NOT] EXISTS to IN transformation.

sanja's last comment in ~~MDEV-3906~~[1] seems to suggest that
Item_direct_ref was used to fix this issue.

[1] https://jira.mariadb.org/browse/MDEV-3906?focusedCommentId=28801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-28801

Interestingly, if we apply the more crude workaround for MDEV-31269,
then the ~~MDEV-3906~~ regression is gone:

80a094f1355 * upstream/bb-11.3-mdev-22534-item-ref-no-direct MDEV-22534 [demo] Use Item_ref instead of Item_direct_ref

fc1027ef992 * MDEV-22534 Simplify the workaround for MDEV-31269

f393ecea3af * MDEV-22534 Decorrelate IN subquery

Given this, and the fact that ~~MDEV-3906~~ is also a 2nd PS crash issue,
I also made the changes of using Item_ref on top of a commit for
MDEV-30073, and the regression also does not appear:

8dffff917bc upstream/bb-10.4-mdev-31269-3906-fixed-by-mdev-30073 MDEV-22534 [demo] MDEV-30073 fixes MDEV-31269 and MDEV-3906

34083cf34bc upstream/bb-10.4-mdev-30073 MDEV-30073 Wrong result on 2nd execution of PS for query with NOT EXISTS

So it seems:

MDEV-30073 fixes both 2nd PS issues, and thus allows the use of
Item_ref in the creations of extra IS NOT NULL's in
exists2in/decorrelation transformation
Without the fix of MDEV-30073, the only way to pass all tests,
including the ps-protocol ones, is to check for free list
intersections during ps execution, as well as keep the
Item_direct_ref with the infinite cycle fix
The infinite cycle fix is probably still needed because the
IS_NOT_NULL item creations are not the only place using
Item_direct_ref. Though if we rely on the MDEV-30073 fix, this could
be separated out as an independent bug, if a testcase can be
created.

There are two ways moving forward:
1. Block this ticket with MDEV-30073, and use Item_ref and no
workaround in our patches once MDEV-30073 is fixed
2. Use the Item_direct_ref, the infinite cycle fix, and either the
fourth or the fifth workaround options in the previous comment[2].

[2] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=269076&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-269076

Which of these two shall we go with? psergei serg igor

Yuchen Pei added a comment - 2023-11-02 04:01 - edited On why Item_direct_ref 's are used in the creation of the extra "IS NOT NULL" items. If we replace them with Item_ref 's, then MDEV-3906 resurfaces, i.e. the testcase in main.subselect_exists2in corresponding to MDEV-3906 fails with a crash. Like MDEV-3881 , the fix of MDEV-3906 is "included" in the original commit adding the exists2in transformation. e3ac306157ab9ade137c9afc9fff270a2f50d7ec Commit: unknown <sanja@montyprogram.com> CommitDate: Tue Feb 26 01:20:17 2013 +0200 [NOT] EXISTS to IN transformation. sanja 's last comment in MDEV-3906 [1] seems to suggest that Item_direct_ref was used to fix this issue. [1] https://jira.mariadb.org/browse/MDEV-3906?focusedCommentId=28801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-28801 Interestingly, if we apply the more crude workaround for MDEV-31269 , then the MDEV-3906 regression is gone: 80a094f1355 * upstream/bb-11.3-mdev-22534-item-ref-no-direct MDEV-22534 [demo] Use Item_ref instead of Item_direct_ref fc1027ef992 * MDEV-22534 Simplify the workaround for MDEV-31269 f393ecea3af * MDEV-22534 Decorrelate IN subquery Given this, and the fact that MDEV-3906 is also a 2nd PS crash issue, I also made the changes of using Item_ref on top of a commit for MDEV-30073 , and the regression also does not appear: 8dffff917bc upstream/bb-10.4-mdev-31269-3906-fixed-by-mdev-30073 MDEV-22534 [demo] MDEV-30073 fixes MDEV-31269 and MDEV-3906 34083cf34bc upstream/bb-10.4-mdev-30073 MDEV-30073 Wrong result on 2nd execution of PS for query with NOT EXISTS So it seems: MDEV-30073 fixes both 2nd PS issues, and thus allows the use of Item_ref in the creations of extra IS NOT NULL 's in exists2in/decorrelation transformation Without the fix of MDEV-30073 , the only way to pass all tests, including the ps-protocol ones, is to check for free list intersections during ps execution, as well as keep the Item_direct_ref with the infinite cycle fix The infinite cycle fix is probably still needed because the IS_NOT_NULL item creations are not the only place using Item_direct_ref. Though if we rely on the MDEV-30073 fix, this could be separated out as an independent bug, if a testcase can be created. There are two ways moving forward: 1. Block this ticket with MDEV-30073 , and use Item_ref and no workaround in our patches once MDEV-30073 is fixed 2. Use the Item_direct_ref, the infinite cycle fix, and either the fourth or the fifth workaround options in the previous comment [2] . [2] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=269076&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-269076 Which of these two shall we go with? psergei serg igor

MariaDB Server

Trivial correlation detection/removal for IN subqueries

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration