Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22534

Trivial correlation detection/removal for IN subqueries

Details

    Description

      EXISTS-to-IN optimization performs trivial correlation detection and removal.

      MySQL 8 has got it too, but in addition to EXISTS subqueries, they perform de-correlation also for IN subqueries. This allows them to use Materialization strategy for trivially-correlated IN- subqueries:

      create table ten(a int primary key);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table t1 (a int, b int, c int);
      insert into t1 select a,a,a from ten;
      create table t2 select * from t1;
      explain select * from t1 where a in (select a from t2 where t1.b=t2.b);
      

      id     select_type     table   partitions      type    possible_keys   key     key_len ref     rows    filtered        Extra
      1      SIMPLE  t1      NULL    ALL     NULL    NULL    NULL    NULL    10      100.00  Using where
      1      SIMPLE  <subquery2>     NULL    eq_ref  <auto_distinct_key>     <auto_distinct_key>     10      test.t1.a,test.t1.b     1       100.00  NULL
      2      MATERIALIZED    t2      NULL    ALL     NULL    NULL    NULL    NULL    10      100.00  NULL
      

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment -

            So we want to turn some kind of trivially correlated subquery to an uncorrelated subquery, so that the optimizer can apply materialization.

            Extrapolating from the example, and the description in EXISTS-TO-IN[1], what happens is that we transform

            in (select inner_col' from inner_table where inner_col = outer_col)

            to

            , outer_col in (select inner_col', inner_col from inner_table)

            How much do we want to generalize this? Judging from EXISTS-TO-IN, we can add an inner_where:

            in (select inner_col' from inner_table where inner_col = outer_col and inner_where) => , outer_col in (select inner_col', inner_col from inner_table where inner_where)

            But how about expressions?

            EXISTS-TO-IN seems to generalise to outer_expr=inner_col (see find_inner_outer_equalities()), so maybe we can also have

            in (select inner_col' from inner_table where inner_col = outer_expr and inner_where) => , outer_expr in (select inner_col', inner_col from inner_table where inner_where)

            But why not go all the way to

            in (select inner_expr' from inner_table where inner_expr = outer_expr and inner_where) => , outer_expr in (select inner_expr', inner_expr from inner_table where inner_where)
            ?

            [1] https://mariadb.com/kb/en/exists-to-in-optimization/

            In any case I'll try to hack out a PoC that covers the example first.

            ycp Yuchen Pei added a comment - So we want to turn some kind of trivially correlated subquery to an uncorrelated subquery, so that the optimizer can apply materialization. Extrapolating from the example, and the description in EXISTS-TO-IN [1] , what happens is that we transform in (select inner_col' from inner_table where inner_col = outer_col) to , outer_col in (select inner_col', inner_col from inner_table) How much do we want to generalize this? Judging from EXISTS-TO-IN, we can add an inner_where: in (select inner_col' from inner_table where inner_col = outer_col and inner_where) => , outer_col in (select inner_col', inner_col from inner_table where inner_where) But how about expressions? EXISTS-TO-IN seems to generalise to outer_expr=inner_col (see find_inner_outer_equalities() ), so maybe we can also have in (select inner_col' from inner_table where inner_col = outer_expr and inner_where) => , outer_expr in (select inner_col', inner_col from inner_table where inner_where) But why not go all the way to in (select inner_expr' from inner_table where inner_expr = outer_expr and inner_where) => , outer_expr in (select inner_expr', inner_expr from inner_table where inner_where) ? [1] https://mariadb.com/kb/en/exists-to-in-optimization/ In any case I'll try to hack out a PoC that covers the example first.
            ycp Yuchen Pei added a comment -

            Here's a patch that works for the example test case:

            https://github.com/MariaDB/server/commit/b3752311d95

            Working on refining it.

            ycp Yuchen Pei added a comment - Here's a patch that works for the example test case: https://github.com/MariaDB/server/commit/b3752311d95 Working on refining it.
            psergei Sergei Petrunia added a comment - - edited

            ycp,

            But why not go all the way to

            in (select inner_expr' 
                from inner_table 
                where inner_expr = outer_expr and inner_where) 
            

            => ,

            outer_expr in (select inner_expr', inner_expr 
                           from inner_table 
                           where inner_where)
            

            This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that?

            Looking at how current EXISTS-to-IN works:

            create table t1 (a int, b int);
            insert into t1 select seq, seq from seq_1_to_10000;
            create table t2 as select * from t1;
            

            EXISTS-to-IN is done for outer_expr=inner_col

            explain 
            select * from t1 
            where exists (select 1 from t2 where t1.a+1=t2.a);
            

            EXISTS-to-IN is NOT done for outer_col=inner_expr:

            explain 
            select * from t1 
            where exists (select 1 from t2 where t1.a=t2.a+1);
            

            (ycp, please check with sanja for reasons why... )

            psergei Sergei Petrunia added a comment - - edited ycp , But why not go all the way to in (select inner_expr' from inner_table where inner_expr = outer_expr and inner_where) => , outer_expr in (select inner_expr', inner_expr from inner_table where inner_where) This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that? Looking at how current EXISTS-to-IN works: create table t1 (a int, b int); insert into t1 select seq, seq from seq_1_to_10000; create table t2 as select * from t1; EXISTS-to-IN is done for outer_expr=inner_col explain select * from t1 where exists (select 1 from t2 where t1.a+1=t2.a); EXISTS-to-IN is NOT done for outer_col=inner_expr : explain select * from t1 where exists (select 1 from t2 where t1.a=t2.a+1); ( ycp , please check with sanja for reasons why... )
            psergei Sergei Petrunia added a comment - - edited

            Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable:

            alter table t2 add c varchar(32);
            set optimizer_trace=1;
            explain select * from t1 where exists (select 1 from t2 where t2.c=t1.a);
            select left(trace,1900) from information_schema.optimizer_trace\G
            

            +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                                           |
            +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+
            |    1 | PRIMARY     | t1    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |                                                                 |
            |    1 | PRIMARY     | t2    | ALL  | NULL          | NULL | NULL    | NULL | 1     | Using where; FirstMatch(t1); Using join buffer (flat, BNL join) |
            +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+
            

                      {
                        "transformation": {
                          "select_id": 2,
                          "from": "EXISTS (SELECT)",
                          "to": "IN (SELECT)",
                          "upper_not": false
                        }
                      },
                      {
                        "transformation": {
                          "select_id": 2,
                          "from": "IN (SELECT)",
                          "to": "semijoin",
                          "chosen": true
                        }
                      },
                      {
                        "transformation": {
                          "select_id": 2,
                          "from": "IN (SELECT)",
                          "to": "materialization",
                          "possible": false,
                          "cause": "types mismatch"
                        }
                      },
            

            what should the code for this MDEV do in such cases?

            psergei Sergei Petrunia added a comment - - edited Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable: alter table t2 add c varchar (32); set optimizer_trace=1; explain select * from t1 where exists ( select 1 from t2 where t2.c=t1.a); select left (trace,1900) from information_schema.optimizer_trace\G +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | | | 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 1 | Using where; FirstMatch(t1); Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+ { "transformation" : { "select_id" : 2, "from" : "EXISTS (SELECT)" , "to" : "IN (SELECT)" , "upper_not" : false } }, { "transformation" : { "select_id" : 2, "from" : "IN (SELECT)" , "to" : "semijoin" , "chosen" : true } }, { "transformation" : { "select_id" : 2, "from" : "IN (SELECT)" , "to" : "materialization" , "possible" : false , "cause" : "types mismatch" } }, what should the code for this MDEV do in such cases?

            ycp, the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well.

            psergei Sergei Petrunia added a comment - ycp , the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well.
            ycp Yuchen Pei added a comment - - edited

            Hi psergei, thanks for the comments.

            > This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that?

            I don't see any problem, will add testcases covering this scenario. What is the concern? Are there existing testcases in the codebase addressing this kind of concerns?

            Also, what do you mean by fulltext functions?

            > Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable:

            > what should the code for this MDEV do in such cases?

            I feel that in general decorrelation is a gain, even if afterwards there's still correlation left.

            But in the example you provided, assuming it does the exists2in transformation to select * from t1 where a in (select c from t2); and proceeds to executing the subquery select c from t2 with a full scan followed by a futile where a in ...., it does look like a scenario we want to avoid if possible. I will need to look more into that.

            > the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well.

            Will do, thanks.

            ycp Yuchen Pei added a comment - - edited Hi psergei , thanks for the comments. > This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that? I don't see any problem, will add testcases covering this scenario. What is the concern? Are there existing testcases in the codebase addressing this kind of concerns? Also, what do you mean by fulltext functions? > Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable: > what should the code for this MDEV do in such cases? I feel that in general decorrelation is a gain, even if afterwards there's still correlation left. But in the example you provided, assuming it does the exists2in transformation to select * from t1 where a in (select c from t2); and proceeds to executing the subquery select c from t2 with a full scan followed by a futile where a in .... , it does look like a scenario we want to avoid if possible. I will need to look more into that. > the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well. Will do, thanks.
            ycp Yuchen Pei added a comment -

            As discussed in a call with sanja, I created MDEV-31229 regarding the generalization to inner_expr=outer_expr

            ycp Yuchen Pei added a comment - As discussed in a call with sanja , I created MDEV-31229 regarding the generalization to inner_expr=outer_expr
            ycp Yuchen Pei added a comment - - edited

            I added more tests, and started fixing regressions.

            Most of the test failures concern mismatch in result files, while two ends in crashes: main.subselect_sj2_mat and main.subselect4. Both are caused by the use of freed items during the prepare stage in a second execution of a prepared statements. I fixed main.subselect_sj2_mat by calling thd->activate_stmt_arena_if_needed() and thd->restore_active_arena() in the processor. However, main.subselect4 is trickier, because the offending item was created at the prepare stage (presumably during the first execution) rather than inside the processor. This also seems to uncover a crash bug in the exists-to-in transformation:

            set optimizer_switch=default;
            CREATE TABLE t1 (a INT);
            CREATE TABLE t2 (b INT);
            PREPARE stmt FROM "
            SELECT * FROM t2
            HAVING 0 IN (
              SELECT a FROM t1
              WHERE EXISTS ( 
                SELECT a FROM t1
                WHERE b = a
              )
            )
            ";
             
            EXECUTE stmt;
            EXECUTE stmt;
            

            I have not looked into it but suspect it is caused by the same issue. Filed it to MDEV-31269.

            ycp Yuchen Pei added a comment - - edited I added more tests, and started fixing regressions. Most of the test failures concern mismatch in result files, while two ends in crashes: main.subselect_sj2_mat and main.subselect4. Both are caused by the use of freed items during the prepare stage in a second execution of a prepared statements. I fixed main.subselect_sj2_mat by calling thd->activate_stmt_arena_if_needed() and thd->restore_active_arena() in the processor. However, main.subselect4 is trickier, because the offending item was created at the prepare stage (presumably during the first execution) rather than inside the processor. This also seems to uncover a crash bug in the exists-to-in transformation: set optimizer_switch= default ; CREATE TABLE t1 (a INT ); CREATE TABLE t2 (b INT ); PREPARE stmt FROM " SELECT * FROM t2 HAVING 0 IN ( SELECT a FROM t1 WHERE EXISTS ( SELECT a FROM t1 WHERE b = a ) ) " ;   EXECUTE stmt; EXECUTE stmt; I have not looked into it but suspect it is caused by the same issue. Filed it to MDEV-31269 .
            ycp Yuchen Pei added a comment - - edited

            To be more clear, the problem with main.subselect4 is if we run the following test with the draft patch at[1], we get sigsegv similar to that of MDEV-31269:

            set optimizer_switch=default;
            CREATE TABLE t1 (a INT);
            CREATE TABLE t2 (b INT);
            PREPARE stmt FROM "
            SELECT * FROM t2
            HAVING 0 IN (
              SELECT a FROM t1
              WHERE a IN ( 
                SELECT a FROM t1
                WHERE b = a
              )
            )
            ";
             
            EXECUTE stmt;
            EXECUTE stmt;
            

            [1] https://github.com/MariaDB/server/commit/88ed254c3e6

            I did more debugging about this, and found out that the crash is caused by the following events

            1. During the JOIN::prepare() of the first execution, the Item_field referring to t2.b is replaced by an Item_ref
            2. The item_ref is moved around in the decorrelation transformation and subsequent optimizations
            3. At the end of the 1st ps execution, the Item_ref is freed due to its creation in the ps execution, as part of the cleanup
            4. During the JOIN::prepare() of the second execution, it tries to restore the optimized version of the statement, in doing so it does a sanity check and causes a sigsegv when trying to access the freed Item_ref.

            Why the problem does not occur with the transformation disabled:

            • during step 1 above, the replacement also adds an item to the change_list
            • during step 2 above, the optimizations cause the equality b = a to be moved around as a whole, thereby moving its args which is an Item ** instead of individual item *'s
            • during step 3 above, the change is rolled back, so the Item ** that is the args of the Item_func_eq goes from b (Item_ref) = a to b (Item_field) = a
            • during step 4 above, there's no use-after-free in the restored optimized statement because it automatically contains the b (Item_field) = a.

            Why we can't do the same with the transformation enabled:

            • during step 2, we break the b = a into the outer_expr (b) and local_field (a), add the outer_expr to left_expr and local_field to the inner select item list. we cannot move the equality as a whole during this transformation.

            Maybe we can fix this by checking during the decorrelation transformation whether local_field or any item in the tree of outer_expr is in the free_list that will be freed at Step 3. If so, then do not proceed with the transformation. What do you think psergei?

            I also debugged the exists2in crash (MDEV-31269), and it seems to indeed have the same cause. So I wonder whether it makes sense to try to fix that one first.

            ycp Yuchen Pei added a comment - - edited To be more clear, the problem with main.subselect4 is if we run the following test with the draft patch at [1] , we get sigsegv similar to that of MDEV-31269 : set optimizer_switch= default ; CREATE TABLE t1 (a INT ); CREATE TABLE t2 (b INT ); PREPARE stmt FROM " SELECT * FROM t2 HAVING 0 IN ( SELECT a FROM t1 WHERE a IN ( SELECT a FROM t1 WHERE b = a ) ) " ;   EXECUTE stmt; EXECUTE stmt; [1] https://github.com/MariaDB/server/commit/88ed254c3e6 I did more debugging about this, and found out that the crash is caused by the following events 1. During the JOIN::prepare() of the first execution, the Item_field referring to t2.b is replaced by an Item_ref 2. The item_ref is moved around in the decorrelation transformation and subsequent optimizations 3. At the end of the 1st ps execution, the Item_ref is freed due to its creation in the ps execution, as part of the cleanup 4. During the JOIN::prepare() of the second execution, it tries to restore the optimized version of the statement, in doing so it does a sanity check and causes a sigsegv when trying to access the freed Item_ref. Why the problem does not occur with the transformation disabled: during step 1 above, the replacement also adds an item to the change_list during step 2 above, the optimizations cause the equality b = a to be moved around as a whole, thereby moving its args which is an Item ** instead of individual item * 's during step 3 above, the change is rolled back, so the Item ** that is the args of the Item_func_eq goes from b (Item_ref) = a to b (Item_field) = a during step 4 above, there's no use-after-free in the restored optimized statement because it automatically contains the b (Item_field) = a . Why we can't do the same with the transformation enabled: during step 2, we break the b = a into the outer_expr (b) and local_field (a), add the outer_expr to left_expr and local_field to the inner select item list. we cannot move the equality as a whole during this transformation. Maybe we can fix this by checking during the decorrelation transformation whether local_field or any item in the tree of outer_expr is in the free_list that will be freed at Step 3. If so, then do not proceed with the transformation. What do you think psergei ? I also debugged the exists2in crash ( MDEV-31269 ), and it seems to indeed have the same cause. So I wonder whether it makes sense to try to fix that one first.
            ycp Yuchen Pei added a comment -

            I have a more polished patch now. Can you take a look psergei? Thank you:

            https://github.com/MariaDB/server/commit/ba9cf8efeb0

            The special cases where left_expr are subqueries or fulltext functions are included in the test.

            The type mismatch causing no materialization down the road is not obvious to fix. I thought about checking whether the type handlers of the two sides of the equalities, but then there are "compatible" ones like longlong vs long. I could look more into it (e.g. how to determine which mismatches are compatible and which are not) but I feel it has already taken quite long, and given this is an existing issue in the existing exists-to-in transformation, maybe we can do with creating a ticket for it.

            Finally, my fix of 2nd ps execution segfault requires HAVE_PSI_STATEMENT_INTERFACE because it checks thd->m_statement_state.m_parent_prepared_stmt to determine whether we are inside a ps execution. Is there a better way to determine this?

            ycp Yuchen Pei added a comment - I have a more polished patch now. Can you take a look psergei ? Thank you: https://github.com/MariaDB/server/commit/ba9cf8efeb0 The special cases where left_expr are subqueries or fulltext functions are included in the test. The type mismatch causing no materialization down the road is not obvious to fix. I thought about checking whether the type handlers of the two sides of the equalities, but then there are "compatible" ones like longlong vs long. I could look more into it (e.g. how to determine which mismatches are compatible and which are not) but I feel it has already taken quite long, and given this is an existing issue in the existing exists-to-in transformation, maybe we can do with creating a ticket for it. Finally, my fix of 2nd ps execution segfault requires HAVE_PSI_STATEMENT_INTERFACE because it checks thd->m_statement_state.m_parent_prepared_stmt to determine whether we are inside a ps execution. Is there a better way to determine this?
            ycp Yuchen Pei added a comment - - edited

            I think the scope of this ticket should be the transformation. Prevention of the 2nd ps execution bug is part of MDEV-31269.

            Because of that, I'm marking MDEV-31269 as blocking this ticket, because without fixing that bug and preventing it the same way for this transformation we can't push the change in this ticket, unless the transformation can be implemented in a significantly different way from exists2in such that the bug no longer manifests.

            But I think the patch still can be reviewed for the transformation itself. Hope that makes sense.

            ycp Yuchen Pei added a comment - - edited I think the scope of this ticket should be the transformation. Prevention of the 2nd ps execution bug is part of MDEV-31269 . Because of that, I'm marking MDEV-31269 as blocking this ticket, because without fixing that bug and preventing it the same way for this transformation we can't push the change in this ticket, unless the transformation can be implemented in a significantly different way from exists2in such that the bug no longer manifests. But I think the patch still can be reviewed for the transformation itself. Hope that makes sense.

            Some observations from review:

            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
             
            create table one_k(a int);
            insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
             
            create table one_m(a int primary key);
            insert into one_m select A.a + B.a* 1000 from one_k A, one_k B;
             
            create table t1 (a int, b int, c int );
             
            insert into t1 select a, a, a from one_k;
             
            create table t2 (a int, b int, c int );
            insert into t2  select a,a,a from one_m;
            

            MySQL-8:

            mysql> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333;
            +----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
            | id | select_type        | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
            +----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
            |  1 | PRIMARY            | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   1000 |   100.00 | Using where |
            |  2 | DEPENDENT SUBQUERY | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 997999 |     1.00 | Using where |
            +----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
            

            Odd, I would expect it to work.
            But this matches their optimizer trace:

                {
                  "transformation": {
                    "select#": 2,
                    "from": "IN (SELECT)",
                    "to": "materialization",
                    "possible": false,
                    "cause": "correlated"
                  }
                },
            

            In MariaDB with this patch:

            MariaDB [j1]> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333;
            +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+
            | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
            +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+
            |    1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000   | Using where |
            |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 997920 | Using where |
            +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+
            

            But the trace shows materialization should be possible:

                        "join_preparation": {
                          "select_id": 2,
                          "steps": [
                            {
                              "transformation": {
                                "select_id": 2,
                                "from": "IN (SELECT)",
                                "to": "materialization",
                                "sjm_scan_allowed": false,
                                "possible": true
                              }
                            },
            

                        "transformation": {
                          "select_id": 2,
                          "from": "IN (SELECT)",
                          "to": "decorrelation"
                        }
            

            psergei Sergei Petrunia added a comment - Some observations from review: create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);   create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;   create table one_m(a int primary key); insert into one_m select A.a + B.a* 1000 from one_k A, one_k B;   create table t1 (a int, b int, c int );   insert into t1 select a, a, a from one_k;   create table t2 (a int, b int, c int ); insert into t2 select a,a,a from one_m; MySQL-8: mysql> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333; +----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 997999 | 1.00 | Using where | +----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ Odd, I would expect it to work. But this matches their optimizer trace: { "transformation": { "select#": 2, "from": "IN (SELECT)", "to": "materialization", "possible": false, "cause": "correlated" } }, In MariaDB with this patch: MariaDB [j1]> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333; +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 997920 | Using where | +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+ But the trace shows materialization should be possible: "join_preparation": { "select_id": 2, "steps": [ { "transformation": { "select_id": 2, "from": "IN (SELECT)", "to": "materialization", "sjm_scan_allowed": false, "possible": true } }, "transformation": { "select_id": 2, "from": "IN (SELECT)", "to": "decorrelation" }

            First portion of review input provided by email

            psergei Sergei Petrunia added a comment - First portion of review input provided by email
            ycp Yuchen Pei added a comment - - edited

            Replying to comment about the non-semijoin materialization[1]:

            The trace was printed in subquery_types_allow_materialization() during the prepare stage, which was called by is_materialization_applicable() with some further checks, including correlation. So in this case the original IN subquery passes the "allow" check (whatever that means), but because it is correlated, it does not pass the "applicable" check which would result in adding the materialization strategy.

            If it were a semijoin and gets decorrelated by the new transformation, it would get a second chance in convert_join_subqueries_to_semijoins() called in optimize_inner().

            If it were an EXIST subquery and had an exists-to-in transformation, it would create a brand new IN subquery in the transformation and call check_and_do_in_subquery_rewrites() that adds the materialization strategy. In our case, the IN subquery is not brand new, and it has already in-to-exists strategy, so doing the same would not help, because there's a check !in_subs->has_strategy() before trying to add the materialization strategy.

            So to fix the example you provided, I updated the transformation by adding a (re)consideration for materialization in case of a non-semijoin at the end of the transformation[2].

            This causes a few more "possible" materializations in the trace not resulting in actual materialization mainly in the case of having an upper NOT, but given the relationship between subquery_types_allow_materialization() and is_materialization_applicable() as discussed above, as long as materialization is not expected*, I suspect it is OK. What do you think psergei?

            [1] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=259705&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-259705
            [2] https://github.com/MariaDB/server/commit/69af0632fe5

            * I checked equivalent EXISTS versions, and they do not result in materialization, unlike your non-semijoin example.

            ycp Yuchen Pei added a comment - - edited Replying to comment about the non-semijoin materialization [1] : The trace was printed in subquery_types_allow_materialization() during the prepare stage, which was called by is_materialization_applicable() with some further checks, including correlation. So in this case the original IN subquery passes the "allow" check (whatever that means), but because it is correlated, it does not pass the "applicable" check which would result in adding the materialization strategy. If it were a semijoin and gets decorrelated by the new transformation, it would get a second chance in convert_join_subqueries_to_semijoins() called in optimize_inner() . If it were an EXIST subquery and had an exists-to-in transformation, it would create a brand new IN subquery in the transformation and call check_and_do_in_subquery_rewrites() that adds the materialization strategy. In our case, the IN subquery is not brand new, and it has already in-to-exists strategy, so doing the same would not help, because there's a check !in_subs->has_strategy() before trying to add the materialization strategy. So to fix the example you provided, I updated the transformation by adding a (re)consideration for materialization in case of a non-semijoin at the end of the transformation [2] . This causes a few more "possible" materializations in the trace not resulting in actual materialization mainly in the case of having an upper NOT, but given the relationship between subquery_types_allow_materialization() and is_materialization_applicable() as discussed above, as long as materialization is not expected*, I suspect it is OK. What do you think psergei ? [1] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=259705&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-259705 [2] https://github.com/MariaDB/server/commit/69af0632fe5 * I checked equivalent EXISTS versions, and they do not result in materialization, unlike your non-semijoin example.
            ycp Yuchen Pei added a comment - - edited

            New commit that addresses the review input
            https://github.com/MariaDB/server/commit/e80a40e1354

            Can you take a look psergei? Thanks.

            ycp Yuchen Pei added a comment - - edited New commit that addresses the review input https://github.com/MariaDB/server/commit/e80a40e1354 Can you take a look psergei ? Thanks.
            psergei Sergei Petrunia added a comment - - edited

            Looking at the next patch. Good to see the input was addressed.

            Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql

            The trivial correlation is

            where t1.b=t2.y 
            

            the code puts t1.b IS NOT NULL at the top level (outside the subquery):

              "query_block": {
                "select_id": 1,
                      "attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))"
            

            But then inside the subquery we have:

                        {
                          "full-scan-on-null_key": {
                            "table": {
                              "table_name": "t2",
                              "access_type": "ref_or_null",
                              "possible_keys": ["y"],
                              "key": "y",
                              "key_length": "5",
                              "used_key_parts": ["y"],
            

            the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery.

            psergei Sergei Petrunia added a comment - - edited Looking at the next patch. Good to see the input was addressed. Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql The trivial correlation is where t1.b=t2.y the code puts t1.b IS NOT NULL at the top level (outside the subquery): "query_block": { "select_id": 1, "attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))" But then inside the subquery we have: { "full-scan-on-null_key": { "table": { "table_name": "t2", "access_type": "ref_or_null", "possible_keys": ["y"], "key": "y", "key_length": "5", "used_key_parts": ["y"], the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery.

            ..., as long as materialization is not expected*, I suspect it is OK. What do you think Sergei Petrunia?

            Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy.

            psergei Sergei Petrunia added a comment - ..., as long as materialization is not expected*, I suspect it is OK. What do you think Sergei Petrunia? Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy.

            Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor().

            Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks:

                (!optimizer ||
                ...
                    !first_select->leaf_tables.elements||
                    with_recursive_reference)
            

            Do we need to add these too? (I wonder what "!optimizer" check is for...)

            psergei Sergei Petrunia added a comment - Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor(). Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks: (!optimizer || ... !first_select->leaf_tables.elements|| with_recursive_reference) Do we need to add these too? (I wonder what "!optimizer" check is for...)

            I can also see both exists2in_processor() functions have:

                    !(is_top_level_item() ||
                      (upper_not && upper_not->is_top_level_item())) ||    /* (2) */
            

            although the motivation behind this limitation is not clear.

            is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs.

            But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN, we don't care between TRUE/NULL results.
            This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result).
            But if we do differentiate, why can't we handle the subquery in an arbitrary context?

            psergei Sergei Petrunia added a comment - I can also see both exists2in_processor() functions have: !(is_top_level_item() || (upper_not && upper_not->is_top_level_item())) || /* (2) */ although the motivation behind this limitation is not clear. is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs. But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN , we don't care between TRUE/NULL results. This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result). But if we do differentiate, why can't we handle the subquery in an arbitrary context?
            ycp Yuchen Pei added a comment - - edited

            Hi psergei, thanks for the 3rd round comments. I started working through them before noticing something in your attachment, which I'd like to get cleared before continuing. First, what is the table one below?

            explain format=json select * from t1 where t1.a not in (select t2.x from one,t2 ignore index(x) where t1.b=t2.y and t2.z=33)

            Also I guess you meant an insert into t2 values preceding the following?

            (   12 ,   12 ,   12 ),
            (   22 , NULL ,   22 ),
            ( NULL ,   21 ,   21 ),
            (   23 , NULL ,   23 ),
            ( NULL ,   23 ,   23 );

            I assume seq_1_to_10 is from --source include/have_sequence.inc.

            ycp Yuchen Pei added a comment - - edited Hi psergei , thanks for the 3rd round comments. I started working through them before noticing something in your attachment, which I'd like to get cleared before continuing. First, what is the table one below? explain format=json select * from t1 where t1.a not in ( select t2.x from one,t2 ignore index (x) where t1.b=t2.y and t2.z=33) Also I guess you meant an insert into t2 values preceding the following? ( 12 , 12 , 12 ), ( 22 , NULL , 22 ), ( NULL , 21 , 21 ), ( 23 , NULL , 23 ), ( NULL , 23 , 23 ); I assume seq_1_to_10 is from --source include/have_sequence.inc .

            ycp, sorry for errors in the attachment. The table one is

            create table one (w int);
            insert into one values (1);
            

            I added it so that we get a regular join plan for the subquery, instead of special-case plans of one-table subqueries.

            Also I guess you meant an insert into t2 values preceding the following?

            Yes.

            psergei Sergei Petrunia added a comment - ycp , sorry for errors in the attachment. The table one is create table one (w int); insert into one values (1); I added it so that we get a regular join plan for the subquery, instead of special-case plans of one-table subqueries. Also I guess you meant an insert into t2 values preceding the following? Yes.

            Take-aways from yesterday's optimizer call

            Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks:
            (!optimizer ||

            The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer".

            I can also see both exists2in_processor() functions have:

                    !(is_top_level_item() ||
                      (upper_not && upper_not->is_top_level_item())) ||    /* (2) */
            

            No idea about this one.

            psergei Sergei Petrunia added a comment - Take-aways from yesterday's optimizer call Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks: (!optimizer || The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer". I can also see both exists2in_processor() functions have: !(is_top_level_item() || (upper_not && upper_not->is_top_level_item())) || /* (2) */ No idea about this one.
            psergei Sergei Petrunia added a comment - - edited A patch with trivial coding style cleanups: https://github.com/MariaDB/server/commit/62251786f8bb2084fe9169befdc569c24f5078f2
            ycp Yuchen Pei added a comment - - edited

            Looking at the next patch. Good to see the input was addressed.

            Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql

            The trivial correlation is

            where t1.b=t2.y 

            the code puts t1.b IS NOT NULL at the top level (outside the subquery):

              "query_block": {
                "select_id": 1,
                      "attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))"

            But then inside the subquery we have:

                        {
                          "full-scan-on-null_key": {
                            "table": {
                              "table_name": "t2",
                              "access_type": "ref_or_null",
                              "possible_keys": ["y"],
                              "key": "y",
                              "key_length": "5",
                              "used_key_parts": ["y"],

            the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery.

            I did some debugging, and it seems to be caused by the IN subselect creating Item_func_trig_cond when it is not top level during stage 2 optimization[1]. It is avoided by the exists2in transformation because it marks the newly created IN subselect as toplevel[2], even if there's an upper_not:

              in_subs->top_level_item();

            [1] (backtrace): Item_in_subselect::create_row_in_to_exists_cond > Item_in_subselect::create_in_to_exists_cond > JOIN::choose_subquery_plan > make_join_statistics > JOIN::optimize_inner > JOIN::optimize > st_select_lex::optimize_unflattened_subqueries > JOIN::optimize_unflattened_subqueries > JOIN::optimize_stage2 > JOIN::optimize_inner > JOIN::optimize

            [2] Item_exists_subselect::exists2in_processor

            I tried to do the same thing with my decorrelate-in transformation[3], but it fails the following test:

            CREATE TABLE t1 (pk INT NOT NULL, i INT NOT NULL);
            INSERT INTO t1 VALUES (0,10), (1,20), (2,30), (3,40);
            CREATE TABLE t2a (pk INT NOT NULL, i INT NOT NULL, PRIMARY KEY(i,pk));
            INSERT INTO t2a VALUES (0,0), (1,1), (2,2), (3,3);
            SELECT * FROM t1 WHERE (NULL, 1) NOT IN (SELECT t2a.i, t2a.pk FROM t2a WHERE t2a.pk = t1.pk); # outputs nothing, but should be (0, 10), (2, 30), (3, 40)

            [3] https://github.com/MariaDB/server/commit/373a97067bd

            I could not come up with a similar test for exists2in to check whether the forced toplevel would cause similar problems. In any case, this needs to be further investigated.

            Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy.

            Added a field in the trace whether the outcome is still correlated. AFAICT materialization, if applicable after the transformation, does show up in the trace. An example is the non-semijoin materialization example in the test[3], where if you remove the ".to" in json_extract path, you will see materialization after the decorrelation has the possible field equal true.

            [3] https://github.com/MariaDB/server/blob/62251786f8bb2084fe9169befdc569c24f5078f2/mysql-test/main/subselect_decorrelate_in.test#L437

            Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor().

            Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks:

                (!optimizer ||
                ...
                    !first_select->leaf_tables.elements||
                    with_recursive_reference)

            Do we need to add these too? (I wonder what "!optimizer" check is for...)

            Looking at my notes, I don't know what these checks are for and did not find them necessary for my decorrelate-in transformation. The example snippet in the tests I found that covers !first_select->leaf_tables.elements is

            SELECT * FROM t2 AS alias1, t2 AS alias2
            WHERE EXISTS ( SELECT 1 ) AND (alias2.pk = alias1.b )
            ORDER BY alias1.b;

            I wrote a similar test for IN, and it got optimized

            # no need to check leaf tables
            CREATE TABLE t2 (pk INT PRIMARY KEY, b INT) ENGINE=MyISAM;
            INSERT INTO t2 VALUES (1,1),(2,7);
            SELECT * FROM t2 AS alias1, t2 AS alias2
            WHERE alias1.pk in ( SELECT 1 ) AND (alias2.pk = alias1.b )
            ORDER BY alias1.b;
            drop table t2;

            For with_recursive_reference, the relevant snippet is from cte_recursive.test:

             with recursive
            reached_objects as 
            (
            select v, 'init' as m from objects where v in ('v3','v7','v9')
            union
            select module_results.v, module_results.m from module_results, applied_modules
            where module_results.m = applied_modules.m
            ),
            applied_modules as
            (
            select * from modules where 1=0
            union
            select modules.m
            from
            modules
            where
            not exists (select * from module_arguments
            where module_arguments.m = modules.m and
            module_arguments.v not in 
            (select v from reached_objects))
            )
            select * from reached_objects

            There's IN in this test, and without the check in the decorrelate-in transformation, the test passes. I also tried the following test snippet and there's no problem without the check in place:

             with recursive
            ancestor_ids (id, generation)
            as
            (
            select father, 1 from folks where name = 'Me' and  father is not null
            union all
            select mother, 1 from folks where name = 'Me' and  mother is not null
            union all
            select father, fa.generation+1 from folks, ancestor_ids fa
            where folks.id = fa.id and father is not null and
            (father not in (select id from ancestor_ids))
            union all
            select mother, ma.generation+1 from folks, ancestor_ids ma
            where folks.id = ma.id and mother is not null and
            (mother not in (select id from ancestor_ids))
            )
            select generation, name from ancestor_ids a, folks
            where a.id = folks.id

            I can also see both exists2in_processor() functions have:

                    !(is_top_level_item() ||
                      (upper_not && upper_not->is_top_level_item())) ||    /* (2) */

            although the motivation behind this limitation is not clear.

            is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs.

            But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN, we don't care between TRUE/NULL results.
            This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result).
            But if we do differentiate, why can't we handle the subquery in an arbitrary context?

            Right, we don't have to worry about null when it is top level in, and with the added is not null in both the inner and outer where condition, the value of the outer where condition is exactly the same before and after the transformation. I wrote two possibilities[4] why we don't consider arbitrary context, which I quote here:

            With this modification, theoretically we don't have to care about
            whether the IN subselect is top level, however, there are two issues:

            1. When the subquery is indeed top level, do the extra terms lower
            performance?
            2. When the subquery is not top level, how do we substitute `this`
            which is an Item_in_subselect with the Item_func_and?

            I suspect this is why the exists2in transformation only does the
            modification when there is an upper_not, and the transformation is
            skipped when the subquery is not toplevel EXISTS nor toplevel NOT
            EXISTS.

            I think we can fix Item 1 by handling toplevel separately from non-toplevel, that is, by not adding the is not nulls when it is top level. But do you know how fix Item 2 psergei?

            [4] https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/CEPJCC7GCZ4H44L276AFRIUB7LYIJOPI/

            The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer".

            This is not always true. Item_in_subselect is not always wrapped in an optimizer and it is ok (we don't need a similar check). For example you will get segfaults when you remove the if (optimizer) from the following[5] and test against the most basic case select * from t1 where a in (select a from t2 where t1.b=t2.b);

            // inside bool Item_in_subselect::exists2in_processor(void *opt_arg)
              /* Update any Item_in_optimizer wrapper if exists */
              if (optimizer)
              {
                optimizer->reset_cache();
                if (optimizer->fix_left(thd))
                {
                  res= TRUE;
                  goto out;
                }
              }

            [5] https://github.com/MariaDB/server/blob/62251786f8bb2084fe9169befdc569c24f5078f2/sql/item_subselect.cc#L3441

            A patch with trivial coding style cleanups: https://github.com/MariaDB/server/commit/62251786f8bb2084fe9169befdc569c24f5078f2

            Thank you, I have incorporated the change, as well as the is_correlated field:

            https://github.com/MariaDB/server/commit/6175a4541ff

            ycp Yuchen Pei added a comment - - edited Looking at the next patch. Good to see the input was addressed. Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql The trivial correlation is where t1.b=t2.y the code puts t1.b IS NOT NULL at the top level (outside the subquery): "query_block": { "select_id": 1, "attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))" But then inside the subquery we have: { "full-scan-on-null_key": { "table": { "table_name": "t2", "access_type": "ref_or_null", "possible_keys": ["y"], "key": "y", "key_length": "5", "used_key_parts": ["y"], the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery. I did some debugging, and it seems to be caused by the IN subselect creating Item_func_trig_cond when it is not top level during stage 2 optimization [1] . It is avoided by the exists2in transformation because it marks the newly created IN subselect as toplevel [2] , even if there's an upper_not: in_subs->top_level_item(); [1] (backtrace): Item_in_subselect::create_row_in_to_exists_cond > Item_in_subselect::create_in_to_exists_cond > JOIN::choose_subquery_plan > make_join_statistics > JOIN::optimize_inner > JOIN::optimize > st_select_lex::optimize_unflattened_subqueries > JOIN::optimize_unflattened_subqueries > JOIN::optimize_stage2 > JOIN::optimize_inner > JOIN::optimize [2] Item_exists_subselect::exists2in_processor I tried to do the same thing with my decorrelate-in transformation [3] , but it fails the following test: CREATE TABLE t1 (pk INT NOT NULL , i INT NOT NULL ); INSERT INTO t1 VALUES (0,10), (1,20), (2,30), (3,40); CREATE TABLE t2a (pk INT NOT NULL , i INT NOT NULL , PRIMARY KEY (i,pk)); INSERT INTO t2a VALUES (0,0), (1,1), (2,2), (3,3); SELECT * FROM t1 WHERE ( NULL , 1) NOT IN ( SELECT t2a.i, t2a.pk FROM t2a WHERE t2a.pk = t1.pk); # outputs nothing, but should be (0, 10), (2, 30), (3, 40) [3] https://github.com/MariaDB/server/commit/373a97067bd I could not come up with a similar test for exists2in to check whether the forced toplevel would cause similar problems. In any case, this needs to be further investigated. Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy. Added a field in the trace whether the outcome is still correlated. AFAICT materialization, if applicable after the transformation, does show up in the trace. An example is the non-semijoin materialization example in the test [3] , where if you remove the ".to" in json_extract path, you will see materialization after the decorrelation has the possible field equal true. [3] https://github.com/MariaDB/server/blob/62251786f8bb2084fe9169befdc569c24f5078f2/mysql-test/main/subselect_decorrelate_in.test#L437 Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor(). Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks: (!optimizer || ... !first_select->leaf_tables.elements|| with_recursive_reference) Do we need to add these too? (I wonder what "!optimizer" check is for...) Looking at my notes, I don't know what these checks are for and did not find them necessary for my decorrelate-in transformation. The example snippet in the tests I found that covers !first_select->leaf_tables.elements is SELECT * FROM t2 AS alias1, t2 AS alias2 WHERE EXISTS ( SELECT 1 ) AND (alias2.pk = alias1.b ) ORDER BY alias1.b; I wrote a similar test for IN, and it got optimized # no need to check leaf tables CREATE TABLE t2 (pk INT PRIMARY KEY , b INT ) ENGINE=MyISAM; INSERT INTO t2 VALUES (1,1),(2,7); SELECT * FROM t2 AS alias1, t2 AS alias2 WHERE alias1.pk in ( SELECT 1 ) AND (alias2.pk = alias1.b ) ORDER BY alias1.b; drop table t2; For with_recursive_reference , the relevant snippet is from cte_recursive.test : with recursive reached_objects as ( select v, 'init' as m from objects where v in ( 'v3' , 'v7' , 'v9' ) union select module_results.v, module_results.m from module_results, applied_modules where module_results.m = applied_modules.m ), applied_modules as ( select * from modules where 1=0 union select modules.m from modules where not exists ( select * from module_arguments where module_arguments.m = modules.m and module_arguments.v not in ( select v from reached_objects)) ) select * from reached_objects There's IN in this test, and without the check in the decorrelate-in transformation, the test passes. I also tried the following test snippet and there's no problem without the check in place: with recursive ancestor_ids (id, generation) as ( select father, 1 from folks where name = 'Me' and father is not null union all select mother, 1 from folks where name = 'Me' and mother is not null union all select father, fa.generation+1 from folks, ancestor_ids fa where folks.id = fa.id and father is not null and (father not in ( select id from ancestor_ids)) union all select mother, ma.generation+1 from folks, ancestor_ids ma where folks.id = ma.id and mother is not null and (mother not in ( select id from ancestor_ids)) ) select generation, name from ancestor_ids a, folks where a.id = folks.id I can also see both exists2in_processor() functions have: !(is_top_level_item() || (upper_not && upper_not->is_top_level_item())) || /* (2) */ although the motivation behind this limitation is not clear. is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs. But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN , we don't care between TRUE/NULL results. This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result). But if we do differentiate, why can't we handle the subquery in an arbitrary context? Right, we don't have to worry about null when it is top level in, and with the added is not null in both the inner and outer where condition, the value of the outer where condition is exactly the same before and after the transformation. I wrote two possibilities [4] why we don't consider arbitrary context, which I quote here: With this modification, theoretically we don't have to care about whether the IN subselect is top level, however, there are two issues: 1. When the subquery is indeed top level, do the extra terms lower performance? 2. When the subquery is not top level, how do we substitute `this` which is an Item_in_subselect with the Item_func_and? I suspect this is why the exists2in transformation only does the modification when there is an upper_not, and the transformation is skipped when the subquery is not toplevel EXISTS nor toplevel NOT EXISTS. I think we can fix Item 1 by handling toplevel separately from non-toplevel, that is, by not adding the is not nulls when it is top level. But do you know how fix Item 2 psergei ? [4] https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/CEPJCC7GCZ4H44L276AFRIUB7LYIJOPI/ The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer". This is not always true. Item_in_subselect is not always wrapped in an optimizer and it is ok (we don't need a similar check). For example you will get segfaults when you remove the if (optimizer) from the following [5] and test against the most basic case select * from t1 where a in (select a from t2 where t1.b=t2.b); // inside bool Item_in_subselect::exists2in_processor(void *opt_arg) /* Update any Item_in_optimizer wrapper if exists */ if (optimizer) { optimizer->reset_cache(); if (optimizer->fix_left(thd)) { res= TRUE; goto out; } } [5] https://github.com/MariaDB/server/blob/62251786f8bb2084fe9169befdc569c24f5078f2/sql/item_subselect.cc#L3441 A patch with trivial coding style cleanups: https://github.com/MariaDB/server/commit/62251786f8bb2084fe9169befdc569c24f5078f2 Thank you, I have incorporated the change, as well as the is_correlated field: https://github.com/MariaDB/server/commit/6175a4541ff

            Thanks, I continue to look at the updated patch.

            psergei Sergei Petrunia added a comment - Thanks, I continue to look at the updated patch.

            One more observation for the old patch: looking at Item_exists_subselect::exists2in_and_is_not_nulls() and its "original" code I see a discrepancy:

            Old code

                  for (size_t i= 0; i < eqs.elements(); i++)
                  {
                    if (optimizer->arguments()[0]->maybe_null())
                      ...
            

            New code:

                  for (uint i= offset; i < left_exp->cols(); i++)
                  {
                    if (left_exp->element_index(i)->maybe_null())
            

            I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i]

            psergei Sergei Petrunia added a comment - One more observation for the old patch: looking at Item_exists_subselect::exists2in_and_is_not_nulls() and its "original" code I see a discrepancy: Old code for ( size_t i= 0; i < eqs.elements(); i++) { if (optimizer->arguments()[0]->maybe_null()) ... New code: for (uint i= offset; i < left_exp->cols(); i++) { if (left_exp->element_index(i)->maybe_null()) I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i]
            ycp Yuchen Pei added a comment -

            > I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i]

            Yes I think so.

            ycp Yuchen Pei added a comment - > I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i] Yes I think so.

            Ok, I was trying to see if I could get everything finished to catch the release train and pushed these:

            commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Mon Jun 19 11:27:51 2023 +0300
             
                Make the new tests --view-protocol friendly. Still don't pass.
             
            commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Sun Jun 18 23:38:09 2023 +0300
             
                Remove NULL handling of left expressions which cannot be NULL.
            
            

            However this still didn't resolve all the issues and now the release train has left the station.

            psergei Sergei Petrunia added a comment - Ok, I was trying to see if I could get everything finished to catch the release train and pushed these: commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2) Author: Sergei Petrunia <sergey@mariadb.com> Date: Mon Jun 19 11:27:51 2023 +0300   Make the new tests --view-protocol friendly. Still don't pass.   commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e Author: Sergei Petrunia <sergey@mariadb.com> Date: Sun Jun 18 23:38:09 2023 +0300   Remove NULL handling of left expressions which cannot be NULL. However this still didn't resolve all the issues and now the release train has left the station.

            I've also tried to fix this part, without success:

                /*
                  If we do not single out this case, and merge it with the >1
                  case, we would get an infinite loop in resolving
                  Item_direct_view_ref::real_item() like in MDEV-3881 when
                  left_exp has scalar type
                */
                else if (and_list->elements == 1)
                  exp= new (thd->mem_root) Item_cond_and(thd, and_list->head(), exp);
            

            I confirm that regular EXISTS-to-IN code hit this by accident. It handles the case of one equality in a special way, then the code path serving multiple equalities will also add multiple NOT NULL conditions even when they are redundant. If it added one condition, it would hit the same crash.

            The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself.

            psergei Sergei Petrunia added a comment - I've also tried to fix this part, without success: /* If we do not single out this case, and merge it with the >1 case, we would get an infinite loop in resolving Item_direct_view_ref::real_item() like in MDEV-3881 when left_exp has scalar type */ else if (and_list->elements == 1) exp = new (thd->mem_root) Item_cond_and(thd, and_list->head(), exp ); I confirm that regular EXISTS-to-IN code hit this by accident. It handles the case of one equality in a special way, then the code path serving multiple equalities will also add multiple NOT NULL conditions even when they are redundant. If it added one condition, it would hit the same crash. The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself.

            Ok let's return to this in the development cycle.

            psergei Sergei Petrunia added a comment - Ok let's return to this in the development cycle.
            ycp Yuchen Pei added a comment - - edited

            Thanks for the updates.

            commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date: Mon Jun 19 11:27:51 2023 +0300

            Make the new tests --view-protocol friendly. Still don't pass.

            Ok, I didn't know the test does not work with --view-protocol, because
            the new bb does not seem to have any builders running with this flag.

            I wonder what caused this, because it is not every non-empty
            json_extract output fails. The ones that do not fail include first PS
            executions, and in the cases where outer_exp has subquery or fulltext
            function, and the non-semijoin materialization.

            I looked around and found the "umbrella ticket" MDEV-27691 - do you
            know which of the included tickets covers this case?

            commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date: Sun Jun 18 23:38:09 2023 +0300

            Remove NULL handling of left expressions which cannot be NULL.

            Thanks, so that's how you fix the full-scan-on-null_key issue.

            The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself.

            I guess you mean substitute_for_best_equal_field(). Is there an
            MDEV for this?

            Ok let's return to this in the development cycle.

            Ok

            ycp Yuchen Pei added a comment - - edited Thanks for the updates. commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2) Author: Sergei Petrunia <sergey@mariadb.com> Date: Mon Jun 19 11:27:51 2023 +0300 Make the new tests --view-protocol friendly. Still don't pass. Ok, I didn't know the test does not work with --view-protocol, because the new bb does not seem to have any builders running with this flag. I wonder what caused this, because it is not every non-empty json_extract output fails. The ones that do not fail include first PS executions, and in the cases where outer_exp has subquery or fulltext function, and the non-semijoin materialization. I looked around and found the "umbrella ticket" MDEV-27691 - do you know which of the included tickets covers this case? commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e Author: Sergei Petrunia <sergey@mariadb.com> Date: Sun Jun 18 23:38:09 2023 +0300 Remove NULL handling of left expressions which cannot be NULL. Thanks, so that's how you fix the full-scan-on-null_key issue. The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself. I guess you mean substitute_for_best_equal_field() . Is there an MDEV for this? Ok let's return to this in the development cycle. Ok
            ycp Yuchen Pei added a comment -

            Follow up on the meeting from yesterday:

            • view-protocol: we wait for MDEV-31408 to be fixed before addressing
              this issue
            • arbitrary subquery context (not just toplevel in and toplevel not
              in): we will do this separately in MDEV-31510
            • self-referencing Item_direct_view_ref: this is the only issue
              blocking the present task at this moment. I will debug it when I get
              back to this task.
            ycp Yuchen Pei added a comment - Follow up on the meeting from yesterday: view-protocol: we wait for MDEV-31408 to be fixed before addressing this issue arbitrary subquery context (not just toplevel in and toplevel not in): we will do this separately in MDEV-31510 self-referencing Item_direct_view_ref : this is the only issue blocking the present task at this moment. I will debug it when I get back to this task.
            ycp Yuchen Pei added a comment - - edited

            As I'm getting back to this task, it just took me a while to reprod
            the Item_direct_view_ref issue, so I'll note it here: the triggered
            failure is on the test for MDEV-3881 in main.subselect_exists2in:

            --echo #
            --echo # MDEV-3881: Endless loop and crash in Item_ref::real_item with
            --echo # exists_to_in=on, NOT EXISTS subquery, merge view or from subquery,
            --echo # constant table
            --echo #
            SET optimizer_switch = 'exists_to_in=on';
             
            CREATE TABLE t1 (a INT) ENGINE=MyISAM;
            INSERT INTO t1 VALUES (1),(7);
             
            CREATE TABLE t2 (b INT) ENGINE=MyISAM;
            INSERT INTO t2 VALUES (8);
            CREATE ALGORITHM=MERGE VIEW v1 AS SELECT * FROM t2;
             
            CREATE TABLE t3 (c INT) ENGINE=MyISAM;
            INSERT INTO t3 VALUES (4),(6);
             
            SELECT * FROM t1, v1 WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b;
             
            drop view v1;
            drop table t1, t2, t3;
             
            CREATE TABLE t1 (a INT) ENGINE=MyISAM;
            INSERT INTO t1 VALUES (1),(7);
             
            CREATE TABLE t2 (b INT) ENGINE=MyISAM;
            INSERT INTO t2 VALUES (8);
             
            CREATE TABLE t3 (c INT) ENGINE=MyISAM;
            INSERT INTO t3 VALUES (4),(6);
             
            SELECT * FROM t1, ( SELECT * FROM t2 ) alias WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b;
             
            drop table t1, t2, t3;

            ycp Yuchen Pei added a comment - - edited As I'm getting back to this task, it just took me a while to reprod the Item_direct_view_ref issue, so I'll note it here: the triggered failure is on the test for MDEV-3881 in main.subselect_exists2in: --echo # --echo # MDEV-3881: Endless loop and crash in Item_ref::real_item with --echo # exists_to_in=on, NOT EXISTS subquery, merge view or from subquery, --echo # constant table --echo # SET optimizer_switch = 'exists_to_in=on' ;   CREATE TABLE t1 (a INT ) ENGINE=MyISAM; INSERT INTO t1 VALUES (1),(7);   CREATE TABLE t2 (b INT ) ENGINE=MyISAM; INSERT INTO t2 VALUES (8); CREATE ALGORITHM=MERGE VIEW v1 AS SELECT * FROM t2;   CREATE TABLE t3 (c INT ) ENGINE=MyISAM; INSERT INTO t3 VALUES (4),(6);   SELECT * FROM t1, v1 WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b;   drop view v1; drop table t1, t2, t3;   CREATE TABLE t1 (a INT ) ENGINE=MyISAM; INSERT INTO t1 VALUES (1),(7);   CREATE TABLE t2 (b INT ) ENGINE=MyISAM; INSERT INTO t2 VALUES (8);   CREATE TABLE t3 (c INT ) ENGINE=MyISAM; INSERT INTO t3 VALUES (4),(6);   SELECT * FROM t1, ( SELECT * FROM t2 ) alias WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b;   drop table t1, t2, t3;
            ycp Yuchen Pei added a comment -

            So the infinite recursion is caused by self-referencing, caused by the
            transformer replace_equal_field. The goal of the transformer
            Item::replace_equal_field() is to replace fields with equivalent
            ones, as an optimisation. Currently two Item subclasses implement this
            transformer nontrivially, Item_field and Item_direct_view_ref.
            The Item_direct_view_ref implementation recurses to the
            Item_field implementation, but not vice versa, which is good
            because otherwise the infinite self-referencing would occur at this
            stage.

            In the infinite recursion scenario, we have the following chain of
            walking down the item tree during the transformation, where each
            vertex is a pari of item type and dbug_print output, and each edge /
            arrow is annotated by the operation that goes from vertext to the
            next.

            (Item_direct_view_ref*) 8 -.ref-> (Item**) -deref-> (Item_field*) 8
            -.item_equal-> (Item_equal*) multiple_equal(8, t1.a) -.get_const()->
            (Item_direct_view_ref*) 8 -.ref-> (Item**) same as above.

            The first and the last {{Item_direct_view_ref*}}s have the same value
            in their ref field. By replacing the Item_field* with the
            second Item_direct_view_ref* we get the self-referencing that
            causes the infinite real_item() call.

            There's no guarantee that if such a self-reference exists it would
            have length 1 like above. So to be safe we may need to try to detect
            self-reference of arbitrary length, which is what I did in [1], but
            I am not sure how much this extra check affects performance.
            Alternatively we can resort to a length-1 check only, and update the
            fix if that proves insufficient. What do you think psergei?

            For all changes squashed into one commit, see [2].

            [1] https://github.com/MariaDB/server/commit/94172e01d63
            [2] https://github.com/MariaDB/server/commit/0bf54aef789

            ycp Yuchen Pei added a comment - So the infinite recursion is caused by self-referencing, caused by the transformer replace_equal_field . The goal of the transformer Item::replace_equal_field() is to replace fields with equivalent ones, as an optimisation. Currently two Item subclasses implement this transformer nontrivially, Item_field and Item_direct_view_ref . The Item_direct_view_ref implementation recurses to the Item_field implementation, but not vice versa, which is good because otherwise the infinite self-referencing would occur at this stage. In the infinite recursion scenario, we have the following chain of walking down the item tree during the transformation, where each vertex is a pari of item type and dbug_print output, and each edge / arrow is annotated by the operation that goes from vertext to the next. (Item_direct_view_ref*) 8 -.ref-> (Item**) -deref-> (Item_field*) 8 -.item_equal-> (Item_equal*) multiple_equal(8, t1.a) -.get_const()-> (Item_direct_view_ref*) 8 -.ref-> (Item**) same as above. The first and the last {{Item_direct_view_ref*}}s have the same value in their ref field. By replacing the Item_field* with the second Item_direct_view_ref* we get the self-referencing that causes the infinite real_item() call. There's no guarantee that if such a self-reference exists it would have length 1 like above. So to be safe we may need to try to detect self-reference of arbitrary length, which is what I did in [1] , but I am not sure how much this extra check affects performance. Alternatively we can resort to a length-1 check only, and update the fix if that proves insufficient. What do you think psergei ? For all changes squashed into one commit, see [2] . [1] https://github.com/MariaDB/server/commit/94172e01d63 [2] https://github.com/MariaDB/server/commit/0bf54aef789
            ycp Yuchen Pei added a comment -

            Note from the meeting yesterday:

            • We will simplify the MDEV-31269 workaround condition to simply
              skipping on !thd->stmt_arena->is_conventional()
            • I will investigate further the cause of the self-referencing
              Item_direct_view_ref
            ycp Yuchen Pei added a comment - Note from the meeting yesterday: We will simplify the MDEV-31269 workaround condition to simply skipping on !thd->stmt_arena->is_conventional() I will investigate further the cause of the self-referencing Item_direct_view_ref
            ycp Yuchen Pei added a comment - - edited

            OK, here's another fix (h/t Johnston for discussions):

            https://github.com/MariaDB/server/commit/c40d4706010

            (squashed version: https://github.com/MariaDB/server/commit/cabd6b79922
            which also includes the simplified workaround for MDEV-31269
            mentioned in the previous comment.)

            At this moment a major issue in fixing the infinite recusion issue is
            the meaning of the Item_field::equal_field and its twin
            Item_direct_view_ref::equal_field. A natural guess of the meaning
            of Item_field::equal_field may be "a pointer to the multiple
            equality the field belongs to", but this guess breaks apart at the
            following function, where the Item_direct_view_ref steals the
            equal_field from the Item_field it points to:

            Item *Item_direct_view_ref::propagate_equal_fields(THD *thd,
                                                               const Context &ctx,
                                                               COND_EQUAL *cond)
            {
              Item *field_item= real_item();
              if (field_item->type() != FIELD_ITEM)
                return this;
              Item *item= field_item->propagate_equal_fields(thd, ctx, cond);
              set_item_equal(field_item->get_item_equal());
              field_item->set_item_equal(NULL);
              if (item != field_item)
                return item;
              return this;
            }
            

            So the assumption can be adjusted to "let x be the multiple equality
            an Item_field belongs to. If the Item_field is not referenced by an
            Item_direct_view_ref, then its equal_field points to x, otherwise its
            equal_field is NULL and the equal_field of Item_direct_view_ref points
            to x". And this second guess seems to make sense in light of the
            following function, where the Item_direct_view_ref temporarily
            lends its own item_equal to the real_item(), before restoring it to
            NULL.

            Item *Item_direct_view_ref::replace_equal_field(THD *thd, uchar *arg)
            {
              Item *field_item= real_item();
              if (field_item->type() != FIELD_ITEM)
                return this;
              field_item->set_item_equal(item_equal);
              Item *item= field_item->replace_equal_field(thd, arg);
              field_item->set_item_equal(0);
              return item != field_item ? item : this;
            }
            

            So the above fix patches Item_ref::propagate_equal_fields() so
            that it delegates to the propagate_equal_fields() of the item it
            references, so that when it references to an Item_direct_view_ref,
            it will let the latter set the correct equal_field.

            psergei can you comment on this assumption and/or the new fix?

            ycp Yuchen Pei added a comment - - edited OK, here's another fix (h/t Johnston for discussions): https://github.com/MariaDB/server/commit/c40d4706010 (squashed version: https://github.com/MariaDB/server/commit/cabd6b79922 which also includes the simplified workaround for MDEV-31269 mentioned in the previous comment.) At this moment a major issue in fixing the infinite recusion issue is the meaning of the Item_field::equal_field and its twin Item_direct_view_ref::equal_field . A natural guess of the meaning of Item_field::equal_field may be "a pointer to the multiple equality the field belongs to", but this guess breaks apart at the following function, where the Item_direct_view_ref steals the equal_field from the Item_field it points to: Item *Item_direct_view_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { Item *field_item= real_item(); if (field_item->type() != FIELD_ITEM) return this ; Item *item= field_item->propagate_equal_fields(thd, ctx, cond); set_item_equal(field_item->get_item_equal()); field_item->set_item_equal(NULL); if (item != field_item) return item; return this ; } So the assumption can be adjusted to "let x be the multiple equality an Item_field belongs to. If the Item_field is not referenced by an Item_direct_view_ref, then its equal_field points to x, otherwise its equal_field is NULL and the equal_field of Item_direct_view_ref points to x". And this second guess seems to make sense in light of the following function, where the Item_direct_view_ref temporarily lends its own item_equal to the real_item(), before restoring it to NULL. Item *Item_direct_view_ref::replace_equal_field(THD *thd, uchar *arg) { Item *field_item= real_item(); if (field_item->type() != FIELD_ITEM) return this ; field_item->set_item_equal(item_equal); Item *item= field_item->replace_equal_field(thd, arg); field_item->set_item_equal(0); return item != field_item ? item : this ; } So the above fix patches Item_ref::propagate_equal_fields() so that it delegates to the propagate_equal_fields() of the item it references, so that when it references to an Item_direct_view_ref , it will let the latter set the correct equal_field . psergei can you comment on this assumption and/or the new fix?

            So the assumption can be adjusted to "let x be the multiple equality
            an Item_field belongs to. If the Item_field is not referenced by an
            Item_direct_view_ref, then its equal_field points to x, otherwise its
            equal_field is NULL and the equal_field of Item_direct_view_ref points
            to x".

            Having studied the topic a little, this suggestion makes sense to me, too.
            Let me continue to study the patch.

            psergei Sergei Petrunia added a comment - So the assumption can be adjusted to "let x be the multiple equality an Item_field belongs to. If the Item_field is not referenced by an Item_direct_view_ref, then its equal_field points to x, otherwise its equal_field is NULL and the equal_field of Item_direct_view_ref points to x". Having studied the topic a little, this suggestion makes sense to me, too. Let me continue to study the patch.
            psergei Sergei Petrunia added a comment - - edited

            ycp, the patch changes one piece of cryptic code in Item_ref::propagate_equal_fields for another.
            It's not clear what the new code does.

            Looking at the new code:

            Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx,
                                                   COND_EQUAL *cond)
            {  
              Item *derefed = *ref;
              if (derefed->type() != REF_ITEM)
              {
                derefed= real_item();
            

            Suppose *ref points something that's not an Item_ref or its descendant.
            Then, derefed->type() != REF_ITEM, if branch is taken.

            But when *ref is not an Item_ref or its descendant, real_item() will return *ref.
            What is the point of this line then:

                derefed= real_item();
            

            ?
            (I'll continue looking at this)

            psergei Sergei Petrunia added a comment - - edited ycp , the patch changes one piece of cryptic code in Item_ref::propagate_equal_fields for another. It's not clear what the new code does. Looking at the new code: Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { Item *derefed = * ref ; if (derefed->type() != REF_ITEM) { derefed= real_item(); Suppose *ref points something that's not an Item_ref or its descendant. Then, derefed->type() != REF_ITEM , if branch is taken. But when *ref is not an Item_ref or its descendant, real_item() will return *ref . What is the point of this line then: derefed= real_item(); ? (I'll continue looking at this)
            psergei Sergei Petrunia added a comment - - edited

            Experimenting, I've made this change (to the latest code with ycp's changes) and it didn't have any effect on mtr :

            diff --git a/sql/item.cc b/sql/item.cc
            index fa05a90c011..d7615521fc6 100644
            --- a/sql/item.cc
            +++ b/sql/item.cc
            @@ -9416,12 +9416,14 @@ Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx,
                                                    COND_EQUAL *cond)
             {
               Item *derefed = *ref;
            +#if 0  
               if (derefed->type() != REF_ITEM)
               {
                 derefed= real_item();
                 if (derefed->type() != FIELD_ITEM)
                   return this;
               }
            +#endif  
               Item *item= derefed->propagate_equal_fields(thd, ctx, cond);
               if (item != derefed)
                 return item;
            

            I don't yet understand the purpose of the code that's #ifdef-ed away...

            psergei Sergei Petrunia added a comment - - edited Experimenting, I've made this change (to the latest code with ycp 's changes) and it didn't have any effect on mtr : diff --git a/sql/item.cc b/sql/item.cc index fa05a90c011..d7615521fc6 100644 --- a/sql/item.cc +++ b/sql/item.cc @@ -9416,12 +9416,14 @@ Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { Item *derefed = *ref; +#if 0 if (derefed->type() != REF_ITEM) { derefed= real_item(); if (derefed->type() != FIELD_ITEM) return this; } +#endif Item *item= derefed->propagate_equal_fields(thd, ctx, cond); if (item != derefed) return item; I don't yet understand the purpose of the code that's #ifdef-ed away...

            Looking at the code in Item_ref::propagate_equal_fields():

            Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx,
                                                   COND_EQUAL *cond)
            {
              ...  
              Item *item= field_item->propagate_equal_fields(thd, ctx, cond);
             
              if (item != field_item)
                return item;  // (POINT1)
              return this;
            }
            

            Suppose the object this is called for is an Item_ref (not Item_direct_ref).

            Item_ref objects are used to "wrap" items when they are used in the post-grouping context and one needs to use result calls to get the value like it is done for example in Item_ref::val_int:

            longlong Item_ref::val_int()
            {
              ...
              longlong tmp=(*ref)->val_int_result();
              return tmp;
            }
            

            But if Item_ref::propagate_equal_fields() substitutes itself for the item it has got from (*ref), as it does on he line marked with (POINT1)? Conversion from val() calls to val*result() calls won't be done anymore...

            psergei Sergei Petrunia added a comment - Looking at the code in Item_ref::propagate_equal_fields(): Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { ... Item *item= field_item->propagate_equal_fields(thd, ctx, cond);   if (item != field_item) return item; // (POINT1) return this ; } Suppose the object this is called for is an Item_ref (not Item_direct_ref). Item_ref objects are used to "wrap" items when they are used in the post-grouping context and one needs to use result calls to get the value like it is done for example in Item_ref::val_int: longlong Item_ref::val_int() { ... longlong tmp=(* ref )->val_int_result(); return tmp; } But if Item_ref::propagate_equal_fields() substitutes itself for the item it has got from (*ref), as it does on he line marked with (POINT1)? Conversion from val() calls to val*result() calls won't be done anymore...

            .. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution

              if (item != field_item)
                return item;  // (POINT1)
            

            is only taken when the item is a constant item (for constants, val() and val_result() are the same).

            psergei Sergei Petrunia added a comment - .. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution if (item != field_item) return item; // (POINT1) is only taken when the item is a constant item (for constants, val() and val_result() are the same).
            ycp Yuchen Pei added a comment - - edited

            > But when *ref is not an Item_ref or its descendant, real_item() will return *ref.

            I didn't know this - is it documented somewhere?

            In any case, with this assertion, my changed version is equivalent to

            Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx,
                                                   COND_EQUAL *cond)
            {
              Item *derefed = *ref;
              if (derefed->type() != REF_ITEM && derefed->type() != FIELD_ITEM)
                return this;
              Item *item= derefed->propagate_equal_fields(thd, ctx, cond);
              if (item != derefed)
                return item;
              return this;
            }

            which I include in an updated commit at
            https://github.com/MariaDB/server/commit/3eeab3c4508.

            psergei I wonder whether this version is still cryptic? My
            explanation is that it adds inside
            Item_ref::propagate_equal_fields() a handling of the case where
            *ref points to another Item_ref, by delegating to the
            propagate_equal_fields() of the referenced Item_ref, so that
            the result is consistent for the latter. By consistency I mean for
            example these two cases:

            1. Item_ref *ref> Item_direct_view_ref *ref> Item_field
            2. Item_func *args> Item_direct_view_ref *ref> Item_field

            Both should result in the same item_equal for the
            Item_direct_view_ref.

            > Experimenting, I've made this change (to the latest code with ycp's changes) and it didn't have any effect on mtr :

            Given the above, does this mean that the ref of an Item_ref
            can only point to either a REF_ITEM or a FIELD_ITEM? Perhaps
            this is another assertion I'm not aware of?

            > .. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution ... is only taken when the item is a constant item (for constants, val() and val_result() are the same).

            This sounds right, as Item_field::propagate_equal_fields() only
            returns something new if the new thing is the result of
            field->get_equal_const_item(thd, ctx, item). Just to validate
            this assumption, I added assertions in
            Item_ref::propagate_equal_fields() and
            Item_direct_view_ref::propagate_equal_fields() on top of my
            commit[1]:

            if (item == derefed)
              item= this;
            DBUG_ASSERT(item->type() == REF_ITEM || item->const_item());
            return item;

            [1] https://github.com/MariaDB/server/commit/ca390e57642

            All main.subselect* tests still pass. Shall we add this assertion?

            The squashed version of all these changes, including the assertion
            addition is at https://github.com/MariaDB/server/commit/4830c70cdb8

            ycp Yuchen Pei added a comment - - edited > But when *ref is not an Item_ref or its descendant, real_item() will return *ref . I didn't know this - is it documented somewhere? In any case, with this assertion, my changed version is equivalent to Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, COND_EQUAL *cond) { Item *derefed = * ref ; if (derefed->type() != REF_ITEM && derefed->type() != FIELD_ITEM) return this ; Item *item= derefed->propagate_equal_fields(thd, ctx, cond); if (item != derefed) return item; return this ; } which I include in an updated commit at https://github.com/MariaDB/server/commit/3eeab3c4508 . psergei I wonder whether this version is still cryptic? My explanation is that it adds inside Item_ref::propagate_equal_fields() a handling of the case where *ref points to another Item_ref , by delegating to the propagate_equal_fields() of the referenced Item_ref , so that the result is consistent for the latter. By consistency I mean for example these two cases: 1. Item_ref *ref > Item_direct_view_ref *ref > Item_field 2. Item_func *args > Item_direct_view_ref *ref > Item_field Both should result in the same item_equal for the Item_direct_view_ref . > Experimenting, I've made this change (to the latest code with ycp 's changes) and it didn't have any effect on mtr : Given the above, does this mean that the ref of an Item_ref can only point to either a REF_ITEM or a FIELD_ITEM ? Perhaps this is another assertion I'm not aware of? > .. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution ... is only taken when the item is a constant item (for constants, val() and val_result() are the same). This sounds right, as Item_field::propagate_equal_fields() only returns something new if the new thing is the result of field->get_equal_const_item(thd, ctx, item) . Just to validate this assumption, I added assertions in Item_ref::propagate_equal_fields() and Item_direct_view_ref::propagate_equal_fields() on top of my commit [1] : if (item == derefed) item= this ; DBUG_ASSERT(item->type() == REF_ITEM || item->const_item()); return item; [1] https://github.com/MariaDB/server/commit/ca390e57642 All main.subselect* tests still pass. Shall we add this assertion? The squashed version of all these changes, including the assertion addition is at https://github.com/MariaDB/server/commit/4830c70cdb8
            ycp Yuchen Pei added a comment - - edited

            BTW, the simplification of the MDEV-31269 workaround[1] also causes
            test failures with --ps-protocol[2]:

            [amd64-ubuntu-2204-debug-ps | failed test (failure)]
            json.json_table
            main.explain_json
            main.group_min_max
            main.index_merge_myisam
            main.innodb_ext_key
            main.innodb_ext_key
            main.innodb_ext_key
            main.innodb_ext_key
            main.multi_update
            main.opt_trace
            main.subselect
            main.subselect4
            main.subselect_decorrelate_in
            main.subselect_exists2in
            main.subselect_exists2in_costmat
            main.subselect_extra_no_semijoin
            main.subselect_mat
            main.subselect_mat_cost_bugs
            main.subselect_sj2_mat

            [1] https://github.com/MariaDB/server/commit/ffc61fb039e
            [2] https://buildbot.mariadb.org/#/builders/534/builds/8770

            When I revert the change [1] above all these tests pass.

            psergei, shall we update these tests with --disable-ps-protocol and
            --enable-ps-protocol, or shall we revert to the more expensive
            workaround?

            ycp Yuchen Pei added a comment - - edited BTW, the simplification of the MDEV-31269 workaround [1] also causes test failures with --ps-protocol [2] : [amd64-ubuntu-2204-debug-ps | failed test (failure)] json.json_table main.explain_json main.group_min_max main.index_merge_myisam main.innodb_ext_key main.innodb_ext_key main.innodb_ext_key main.innodb_ext_key main.multi_update main.opt_trace main.subselect main.subselect4 main.subselect_decorrelate_in main.subselect_exists2in main.subselect_exists2in_costmat main.subselect_extra_no_semijoin main.subselect_mat main.subselect_mat_cost_bugs main.subselect_sj2_mat [1] https://github.com/MariaDB/server/commit/ffc61fb039e [2] https://buildbot.mariadb.org/#/builders/534/builds/8770 When I revert the change [1] above all these tests pass. psergei , shall we update these tests with --disable-ps-protocol and --enable-ps-protocol , or shall we revert to the more expensive workaround?

            ycp, possible problem is in the 2nd execution of prepare statement (now ps-protocol execute prepare statement twice and returns only the results of the second execution ). You can disable the second execution in testcases with "--disable-ps2-protocol"

            lstartseva Lena Startseva added a comment - ycp , possible problem is in the 2nd execution of prepare statement (now ps-protocol execute prepare statement twice and returns only the results of the second execution ). You can disable the second execution in testcases with "--disable-ps2-protocol"
            ycp Yuchen Pei added a comment - - edited

            Thanks lstartseva, are the problem and workaround you mentioned
            concerning the failures with testing with --ps-protocol as mentioned
            in my previous comment? If so, following up on the optimizer
            meeting, psergei suggested that we make the skip condition a bit
            more strict, by also checking for ref items, to avoid the failures
            with --ps-protocol.

            And here's the updated commit. It checks if either side of any of
            the chosen equalities is a ref item, and skip the transformation if
            so: https://github.com/MariaDB/server/commit/b2b0d16b68d

            As a result, I had to do a --disable_ps_protocol and
            --enable_ps_protocol in two explain extended cases in
            main.subselect_exists2in.

            ycp Yuchen Pei added a comment - - edited Thanks lstartseva , are the problem and workaround you mentioned concerning the failures with testing with --ps-protocol as mentioned in my previous comment? If so, following up on the optimizer meeting, psergei suggested that we make the skip condition a bit more strict, by also checking for ref items, to avoid the failures with --ps-protocol. And here's the updated commit. It checks if either side of any of the chosen equalities is a ref item, and skip the transformation if so: https://github.com/MariaDB/server/commit/b2b0d16b68d As a result, I had to do a --disable_ps_protocol and --enable_ps_protocol in two explain extended cases in main.subselect_exists2in .

            A self-contained explanation of how we get a loop of Item_direct_view_ref objects:

            Testcase

            CREATE TABLE t1 ( a INT );
            INSERT INTO t1 VALUES (1),(5);
            CREATE TABLE t2 ( b INT ) ENGINE=MyISAM;
            INSERT INTO t2 VALUES (1);
             
            CREATE TABLE t3 ( c INT );
            INSERT INTO t3 VALUES (4),(5);
            SELECT *  
            FROM
               t1,    
               ( SELECT * FROM t2 ) alias
            WHERE
               NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b;
            

            Here, NOT EXISTS is converted into NOT IN. This rewrite also adds left_expresssion IS NOT NULL so the NOT EXISTS becomes:

              "!(<in_optimizer>(t2.b,<exists>(subquery#3)) and t2.b is not null)"
            

            The "t2.b is not null" has this structure:

               Item_func_is_not_null(
                 Item_direct_ref(    // denote this as $ITEM_DIRECT_REF
                   Item_direct_view_ref(
                     Item_field("t2.b") //  // denote this $IT_FLD
                   )
                 )
            

            The top-level WHERE clause is used to construct Item_equal which includes

            • Item_field(t1.a)
            • Item_direct_view_ref(Item_field(alias.b)) // denote ITEM_DIRECT_REF_2

            Then, Item_ref::propagate_equal_fields() is called for $ITEM_DIRECT_REF object

            The code does this (comments are mine):

              // This will assign field_item = $IT_FLD
              Item *field_item= real_item();
             
              // This will make $IT_FLD->item_equal point to multiple equality.
              // This is already wrong as there can be multiple Item_direct_view_ref
              //  objects in different contexts which refer to the same Item_field object.
              //  See commit comment for 8d9dd21d for details.
              Item *item= field_item->propagate_equal_fields(thd, ctx, cond);
            

            Then, Item_equal::update_const() sets another the $ITEM_DIRECT_REF_2 Item_direct_view_ref object to be the "the const item".

            Then, substitute_for_best_equal_field() calls

            Item_ref::transform(..., &Item::replace_equal_field) 
            

            for $ITEM_DIRECT_REF object.

            It eventually calls:

              Item *new_item= (*ref)->transform(thd, transformer, arg);
            

            Here, *ref points to $IT_FLD Item_field.
            new_item is now $ITEM_DIRECT_REF_2, the constant Item_direct_view_ref.
            Then, we do this:

              if (*ref != new_item)
                thd->change_item_tree(ref, new_item);
            

            And this creates a loop because "ref" is the VIEW's field translation entry. Which now points to an Item_direct_view_ref which has ref pointing to the VIEW's field translation...

            psergei Sergei Petrunia added a comment - A self-contained explanation of how we get a loop of Item_direct_view_ref objects: Testcase CREATE TABLE t1 ( a INT ); INSERT INTO t1 VALUES (1),(5); CREATE TABLE t2 ( b INT ) ENGINE=MyISAM; INSERT INTO t2 VALUES (1);   CREATE TABLE t3 ( c INT ); INSERT INTO t3 VALUES (4),(5); SELECT * FROM t1, ( SELECT * FROM t2 ) alias WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b; Here, NOT EXISTS is converted into NOT IN. This rewrite also adds left_expresssion IS NOT NULL so the NOT EXISTS becomes: "!(<in_optimizer>(t2.b,<exists>(subquery#3)) and t2.b is not null)" The "t2.b is not null" has this structure: Item_func_is_not_null( Item_direct_ref( // denote this as $ITEM_DIRECT_REF Item_direct_view_ref( Item_field("t2.b") // // denote this $IT_FLD ) ) The top-level WHERE clause is used to construct Item_equal which includes Item_field(t1.a) Item_direct_view_ref(Item_field(alias.b)) // denote ITEM_DIRECT_REF_2 Then, Item_ref::propagate_equal_fields() is called for $ITEM_DIRECT_REF object The code does this (comments are mine): // This will assign field_item = $IT_FLD Item *field_item= real_item();   // This will make $IT_FLD->item_equal point to multiple equality. // This is already wrong as there can be multiple Item_direct_view_ref // objects in different contexts which refer to the same Item_field object. // See commit comment for 8d9dd21d for details. Item *item= field_item->propagate_equal_fields(thd, ctx, cond); Then, Item_equal::update_const() sets another the $ITEM_DIRECT_REF_2 Item_direct_view_ref object to be the "the const item". Then, substitute_for_best_equal_field() calls Item_ref::transform(..., &Item::replace_equal_field) for $ITEM_DIRECT_REF object. It eventually calls: Item *new_item= (*ref)->transform(thd, transformer, arg); Here, *ref points to $IT_FLD Item_field. new_item is now $ITEM_DIRECT_REF_2, the constant Item_direct_view_ref. Then, we do this: if (*ref != new_item) thd->change_item_tree(ref, new_item); And this creates a loop because "ref" is the VIEW's field translation entry. Which now points to an Item_direct_view_ref which has ref pointing to the VIEW's field translation...
            ycp Yuchen Pei added a comment - - edited

            Some comments on 2nd ps segfault workarounds.

            trade-offs:

            • the more cases it skips the transformation, the more false
              positives there are, and the more ps-protocols it will fail
            • the less cases it skips the transformation, the more false
              negatives there are, and the more likely the 2nd ps segfault will
              occur

            tried workarounds:
            1. skip if not conventional: too many false positives,
            https://github.com/MariaDB/server/commit/ffc61fb039e
            2. skip if not conventional and either side is ref: too many false
            positives, https://github.com/MariaDB/server/commit/b2b0d16b68d
            3. skip if not conventional and any item in the trees is ref
            (mentioned and tried by psergei): too many false positives
            4. skip if not conventional and either side intersects free list:
            seems to work, less work than "skip if any item in the trees
            intersects free list", but more work,
            https://github.com/MariaDB/server/commit/28de41f9b9b
            5. skip if not conventional and any item in the trees intersects free
            list: the original check, seems to be the most accurate, but also
            requiring the most work, with the apparent downside of performance,
            see e.g. https://github.com/MariaDB/server/commit/a87f22770b for a
            implementation

            In terms of time required to check: 1 < 2 < 3,4 < 5.

            In terms of accuracy, only 4 and 5 do not cause massive ps-protocol
            failures.

            ycp Yuchen Pei added a comment - - edited Some comments on 2nd ps segfault workarounds. trade-offs: the more cases it skips the transformation, the more false positives there are, and the more ps-protocols it will fail the less cases it skips the transformation, the more false negatives there are, and the more likely the 2nd ps segfault will occur tried workarounds: 1. skip if not conventional: too many false positives, https://github.com/MariaDB/server/commit/ffc61fb039e 2. skip if not conventional and either side is ref: too many false positives, https://github.com/MariaDB/server/commit/b2b0d16b68d 3. skip if not conventional and any item in the trees is ref (mentioned and tried by psergei ): too many false positives 4. skip if not conventional and either side intersects free list: seems to work, less work than "skip if any item in the trees intersects free list", but more work, https://github.com/MariaDB/server/commit/28de41f9b9b 5. skip if not conventional and any item in the trees intersects free list: the original check, seems to be the most accurate, but also requiring the most work, with the apparent downside of performance, see e.g. https://github.com/MariaDB/server/commit/a87f22770b for a implementation In terms of time required to check: 1 < 2 < 3,4 < 5. In terms of accuracy, only 4 and 5 do not cause massive ps-protocol failures.
            psergei Sergei Petrunia added a comment - - edited

            Status update from the calls this week:

            • Igor seems to be fine with "A self-contained explanation of how we get a loop of Item_direct_view_ref" above.
              • The bug is apparently in the old (pre-MDEV-22534) code. Can one construct a testcase for it?
              • SergeiP: I had no success in constructing such testcase so far, perhaps it's because Item_direct_ref objects are used in a limited number of scenarios in the server.
            • Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the same bug? what happens if there are nested views? SergeiP: No it doesn't have the same bug. I think the code works correctly there. Can't explain off the top of the head.
            psergei Sergei Petrunia added a comment - - edited Status update from the calls this week: Igor seems to be fine with "A self-contained explanation of how we get a loop of Item_direct_view_ref" above. The bug is apparently in the old (pre- MDEV-22534 ) code. Can one construct a testcase for it? SergeiP: I had no success in constructing such testcase so far, perhaps it's because Item_direct_ref objects are used in a limited number of scenarios in the server. Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the same bug? what happens if there are nested views? SergeiP: No it doesn't have the same bug. I think the code works correctly there. Can't explain off the top of the head. Igor: the workarounds for MDEV-31269 are bad, I have a patch for it here: https://github.com/MariaDB/server/commit/34083cf34bc26a5cd95e56feffd01466f9f4917f . It's targeted at 10.4, has no description yet and is large. Igor suggested that MDEV-22434 code is rebased on top of that commit (but didn't clarify what for?) Discussed this with Serg, no outcome so far.
            ycp Yuchen Pei added a comment - - edited

            This is a reply to a message from igor on slack about the
            relation between this task and MDEV-3881. Messages in commits to
            MDEV-22534 reference MDEV-3881 because the issues seen in MDEV-3881
            are encountered in MDEV-22534. They have the same symptom, but
            nobody can say for sure whether it is the same issue caused by the
            same problem because MDEV-3881 is so ancient. Now let's assume it is
            the same.

            Looking at the fix for MDEV-3881, I cannot verify it myself whether
            the commit did fix it. First, it is not even clear which commit was
            meant as a fix for MDEV-3881. I suspect it is
            e3ac306157ab9ade137c9afc9fff270a2f50d7ec[0] because that is where
            the MDEV-3881 test case was added. But as you can see, it was also
            the commit that introduced the exists-to-in transformation. The
            bug relied on exists-to-in to manifest, so in this case, it is not
            even possible to determine which version of the feature that caused
            the bug, let alone studying the fix. Second, it was a commit from 10
            years ago, and even the author could not recall what was the fix,
            see the thread[1].

            On the face value, my suspicion is that the bug was not fixed, and
            we have just been relying on a workaround (see below) to avoid it.
            Or maybe it was fixed then, but some new changes over the last 10
            years caused a regression (e.g. the Gallina commit mentioned by
            psergei). In any case, I don't think archaeology helps fix this
            bug. In the current codebase, the exists-to-in code relies on a
            (perhaps unintentional) workaround to avoid the bug, which is, when
            the subselect is a toplevel NOT, we need to AND the new IN subquery
            with IS NOT NULLs because 3-value logic is funny[2]. When doing so,
            the exists2in processor constructs the AND item in different orders
            depending on the number of picked-out equalities. If there is only
            one, then the AND item has the IS NOT NULL item as the first
            argument. If there is more than one, then the AND item has the IN
            subquery as the first argument. There's no reason the two cases
            should be treated differently, and the infinite loop happens when we
            try to unify the two cases and always have the IN subquery as the
            first argument, see for example [1]. Note that this was already the
            case in the initial exists-to-in commit[0]. I also made a minimal
            change that has nothing to do with the main task in MDEV-22534 to
            reproduce the issue[3], just run the test
            main.subselect_exists2in that contains the MDEV-3881 test. We
            can keep the workaround in the new transformation introduced in this
            ticket, which is the case in earlier patches, but I was told by
            psergei and monty that it is generally not a good idea have a
            workaround like this unless there is a compelling reason.

            By the way, the commit [3] (and [5] for a 10.4 version) below also
            answers the following question in the previous comment. The testcase
            is main.subselect_exists2in, more specifically the MDEV-3881
            part in that test, also quoted in comment[4] above.

            > ** The bug is apparently in the old (pre-MDEV-22534) code. Can one
            > construct a testcase for it?

            As to the question why swapping the orders matter, here's an
            explanation, using the self-contained example given by psergei
            above.

            Exists2in transforms NOT EXISTS ( SELECT * FROM t3 WHERE c = b )
            to

            !(Item_in_optimizer and Item_func_is_not_null) # patched version

            or

            !(Item_func_is_not_null and Item_in_optimizer) # original version

            where we have the following structures for the two items in the
            conjunction

               Item_func_is_not_null(
                 Item_direct_ref(    // denote this r
                   Item_direct_view_ref( // denote this vr
                     Item_field("t2.b") // denote this fld
                   )
                 )

            and

               Item_in_optimizer(
                 Item_direct_view_ref( // same vr
                   Item_field("t2.b") // same fld
                 ),
                 ...
               )

            In Item_cond::propagate_equal_fields() for the conjunction, it
            iterates over the two items and call propagate_equal_fields() on
            them one by one, and whatever side effects happen during these
            calls, the last call "wins". So in the patched version,
            r->propagate_equal_fields() is called after and we end up with fld
            rather than vr participating in the multiple equality which is
            wrong. In the original version, vr->propagate_equal_fields() is
            called after and we end up with vr participating which is correct.
            If we can make it always correct as we do in patches in this ticket,
            then we don't have to worry about the order.

            [0] https://github.com/MariaDB/server/commit/e3ac306157ab9ade137c9afc9fff270a2f50d7ec
            [1] https://mariadb.slack.com/archives/C021E77G7K2/p1690952975445929
            [2]
            https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/CEPJCC7GCZ4H44L276AFRIUB7LYIJOPI/
            [3] https://github.com/MariaDB/server/commit/02da5af0aa1
            [4]
            https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=265762&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-265762
            [5] https://github.com/MariaDB/server/commit/8c0835459aa

            ycp Yuchen Pei added a comment - - edited This is a reply to a message from igor on slack about the relation between this task and MDEV-3881 . Messages in commits to MDEV-22534 reference MDEV-3881 because the issues seen in MDEV-3881 are encountered in MDEV-22534 . They have the same symptom, but nobody can say for sure whether it is the same issue caused by the same problem because MDEV-3881 is so ancient. Now let's assume it is the same. Looking at the fix for MDEV-3881 , I cannot verify it myself whether the commit did fix it. First, it is not even clear which commit was meant as a fix for MDEV-3881 . I suspect it is e3ac306157ab9ade137c9afc9fff270a2f50d7ec [0] because that is where the MDEV-3881 test case was added. But as you can see, it was also the commit that introduced the exists-to-in transformation. The bug relied on exists-to-in to manifest, so in this case, it is not even possible to determine which version of the feature that caused the bug, let alone studying the fix. Second, it was a commit from 10 years ago, and even the author could not recall what was the fix, see the thread [1] . On the face value, my suspicion is that the bug was not fixed, and we have just been relying on a workaround (see below) to avoid it. Or maybe it was fixed then, but some new changes over the last 10 years caused a regression (e.g. the Gallina commit mentioned by psergei ). In any case, I don't think archaeology helps fix this bug. In the current codebase, the exists-to-in code relies on a (perhaps unintentional) workaround to avoid the bug, which is, when the subselect is a toplevel NOT, we need to AND the new IN subquery with IS NOT NULLs because 3-value logic is funny [2] . When doing so, the exists2in processor constructs the AND item in different orders depending on the number of picked-out equalities. If there is only one, then the AND item has the IS NOT NULL item as the first argument. If there is more than one, then the AND item has the IN subquery as the first argument. There's no reason the two cases should be treated differently, and the infinite loop happens when we try to unify the two cases and always have the IN subquery as the first argument, see for example [1] . Note that this was already the case in the initial exists-to-in commit [0] . I also made a minimal change that has nothing to do with the main task in MDEV-22534 to reproduce the issue [3] , just run the test main.subselect_exists2in that contains the MDEV-3881 test. We can keep the workaround in the new transformation introduced in this ticket, which is the case in earlier patches, but I was told by psergei and monty that it is generally not a good idea have a workaround like this unless there is a compelling reason. By the way, the commit [3] (and [5] for a 10.4 version) below also answers the following question in the previous comment. The testcase is main.subselect_exists2in , more specifically the MDEV-3881 part in that test, also quoted in comment [4] above. > ** The bug is apparently in the old (pre- MDEV-22534 ) code. Can one > construct a testcase for it? As to the question why swapping the orders matter, here's an explanation, using the self-contained example given by psergei above. Exists2in transforms NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) to !(Item_in_optimizer and Item_func_is_not_null) # patched version or !(Item_func_is_not_null and Item_in_optimizer) # original version where we have the following structures for the two items in the conjunction Item_func_is_not_null( Item_direct_ref( // denote this r Item_direct_view_ref( // denote this vr Item_field("t2.b") // denote this fld ) ) and Item_in_optimizer( Item_direct_view_ref( // same vr Item_field("t2.b") // same fld ), ... ) In Item_cond::propagate_equal_fields() for the conjunction, it iterates over the two items and call propagate_equal_fields() on them one by one, and whatever side effects happen during these calls, the last call "wins". So in the patched version, r->propagate_equal_fields() is called after and we end up with fld rather than vr participating in the multiple equality which is wrong. In the original version, vr->propagate_equal_fields() is called after and we end up with vr participating which is correct. If we can make it always correct as we do in patches in this ticket, then we don't have to worry about the order. [0] https://github.com/MariaDB/server/commit/e3ac306157ab9ade137c9afc9fff270a2f50d7ec [1] https://mariadb.slack.com/archives/C021E77G7K2/p1690952975445929 [2] https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/CEPJCC7GCZ4H44L276AFRIUB7LYIJOPI/ [3] https://github.com/MariaDB/server/commit/02da5af0aa1 [4] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=265762&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-265762 [5] https://github.com/MariaDB/server/commit/8c0835459aa
            ycp Yuchen Pei added a comment - - edited

            After digesting psergei's comment above, and discussions with him,
            here is an explanation why not allowing Item_fields to participate in
            multiple equalities when they are referenced by Item_direct_view_ref
            fixes the self-reference bug (provided some assumption mentioned at the
            end of this comment holds).

            The job of Item::propagate_equal_fields() is to assign a multiple
            equality to the item, or make it so field-like items participate in
            the correct multiple equalities. Here we say an item i participates in
            multiple equality e if i->item_equal == e, and we say i does not
            participate in any multiple equality if i->item_equal == NULL.

            We say an Item_direct_view_ref vr points to a Item_field
            field if vr->real_item() == field. Multiple
            Item_direct_view_ref's in different contexts may point to the same
            Item_field. To avoid joining disjoint equalities, when an
            Item_field is pointed to by an Item_direct_view_ref, the
            former should no longer participate in a multiple equality, but the
            latter should. For example, consider {{alias.a = t.col1 OR alias.a =
            t.col2}}. If the field underlying alias.a participates in both
            multiple equalities, the two multiple equalities will join into one
            with three elements all equal to each other, which is a logical error.
            (It would be nice to have a test case for this, though.) However, this
            is not the case in the codebase, the violation of this rule causes the
            bug, and the rectification fixes the bug provided certain assumptions
            hold.

            The job of Item::replace_equal_field() is to replace a field-like item
            by an equal field-like item. When the Item is an Item_field, it
            replaces itself by the const or first item of the multiple equality it
            participates, if any and if it agrees with cond_equal. When the Item i
            is an Item_ref, including Item_direct_view_ref, in
            Item_ref::transform(.., replace_equal_field, ..), it calls the
            transformer on the item it derefs to (i.e. *i->ref).

              Item *new_item= (*ref)->transform(thd, transformer, arg); /* 1 */

            And if the transformer returns a new item that is different from
            *i->ref, we make i deref to this new item.

              if (*ref != new_item)
                thd->change_item_tree(ref, new_item); /* 2 */

            After that we replace call replace_equal_field on i itself.

              return (this->*transformer)(thd, arg); /* 3 */

            So when we have Item_direct_view_ref vr1 that derefs to an Item_field
            field, i.e. *vr1->ref == field, and field participates in a
            multiple equality whose const item is another Item_direct_view_ref vr2
            that not only also derefs to field, but also has the same ref field as
            vr1 (vr1->ref == vr2->ref), step 1 above returns vr2, and step 2
            above makes vr1 deref to vr2, causing a self-reference:
            *vr2->ref == *vr1->ref == vr2.

            Now, if we do not allow Item_field to participate in a multiple
            equality (i.e. when vr1->ref == field, we do {{vr1->item_equal=
            item_equal}} and field->item_equal = NULL), then the
            self-reference would not occur in this case. In step 1 above, the
            right hand side will evaluate to Item_field, because its cond_equal is
            NULL.

            Item *Item_field::replace_equal_field(THD *thd, uchar *arg)
            {
              REPLACE_EQUAL_FIELD_ARG* param= (REPLACE_EQUAL_FIELD_ARG*)arg;
              if (item_equal && item_equal == param->item_equal)
              {
              ...
              }
              return this;
            }

            It will therefore skip step 2, and in step 3 above, vr1 temporarily
            lets field participate in the multiple equality, and we get vr2 as a
            result, which is then used to replace vr1 directly, rather than
            *vr1->ref.

            We can also answer the following question in a previous comment:

            > Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the
            > same bug? what happens if there are nested views?

            If there are nested views, say *vr1->ref == vr2,
            *vr2->ref == field, vr1->item_equal->get_const() == vr3,
            vr1->item_equal == cond_equal->current_level->item_equal,
            and {{vr2->item_equal == field->item_equal == NULL}
            then the following happens, and vr3 replaces vr1, and no
            self-reference occurs.

              vr1->transform: new_item= vr2->transform(...)
                vr2->transform: new_item= field->transform(...)
                  field->replace_equal_field: returns itself because its item_equal is NULL
                  vr2->replace_equal_field: returns itself because its item_equal is NULL
                vr1->replace_equal_field: temporarily lets field participate and returns vr3

            Because an Item_direct_view_ref does not allow anything downstream
            to participate in a multiple equality, anything returned by the
            replace_equal_field call on an Item_field would replace the
            top-level Item_direct_view_ref. As such, no self-reference occurs.

            ycp Yuchen Pei added a comment - - edited After digesting psergei 's comment above, and discussions with him, here is an explanation why not allowing Item_fields to participate in multiple equalities when they are referenced by Item_direct_view_ref fixes the self-reference bug (provided some assumption mentioned at the end of this comment holds). The job of Item::propagate_equal_fields() is to assign a multiple equality to the item, or make it so field-like items participate in the correct multiple equalities. Here we say an item i participates in multiple equality e if i->item_equal == e , and we say i does not participate in any multiple equality if i->item_equal == NULL . We say an Item_direct_view_ref vr points to a Item_field field if vr->real_item() == field . Multiple Item_direct_view_ref 's in different contexts may point to the same Item_field . To avoid joining disjoint equalities, when an Item_field is pointed to by an Item_direct_view_ref , the former should no longer participate in a multiple equality, but the latter should. For example, consider {{alias.a = t.col1 OR alias.a = t.col2}}. If the field underlying alias.a participates in both multiple equalities, the two multiple equalities will join into one with three elements all equal to each other, which is a logical error. (It would be nice to have a test case for this, though.) However, this is not the case in the codebase, the violation of this rule causes the bug, and the rectification fixes the bug provided certain assumptions hold. The job of Item::replace_equal_field() is to replace a field-like item by an equal field-like item. When the Item is an Item_field, it replaces itself by the const or first item of the multiple equality it participates, if any and if it agrees with cond_equal. When the Item i is an Item_ref, including Item_direct_view_ref, in Item_ref::transform(.., replace_equal_field, ..), it calls the transformer on the item it derefs to (i.e. *i->ref ). Item *new_item= (* ref )->transform(thd, transformer, arg); /* 1 */ And if the transformer returns a new item that is different from *i->ref , we make i deref to this new item. if (* ref != new_item) thd->change_item_tree( ref , new_item); /* 2 */ After that we replace call replace_equal_field on i itself. return ( this ->*transformer)(thd, arg); /* 3 */ So when we have Item_direct_view_ref vr1 that derefs to an Item_field field, i.e. *vr1->ref == field , and field participates in a multiple equality whose const item is another Item_direct_view_ref vr2 that not only also derefs to field, but also has the same ref field as vr1 ( vr1->ref == vr2->ref ), step 1 above returns vr2, and step 2 above makes vr1 deref to vr2, causing a self-reference: *vr2->ref == *vr1->ref == vr2 . Now, if we do not allow Item_field to participate in a multiple equality (i.e. when vr1->ref == field , we do {{vr1->item_equal= item_equal}} and field->item_equal = NULL ), then the self-reference would not occur in this case. In step 1 above, the right hand side will evaluate to Item_field, because its cond_equal is NULL. Item *Item_field::replace_equal_field(THD *thd, uchar *arg) { REPLACE_EQUAL_FIELD_ARG* param= (REPLACE_EQUAL_FIELD_ARG*)arg; if (item_equal && item_equal == param->item_equal) { ... } return this ; } It will therefore skip step 2, and in step 3 above, vr1 temporarily lets field participate in the multiple equality, and we get vr2 as a result, which is then used to replace vr1 directly, rather than *vr1->ref . We can also answer the following question in a previous comment: > Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the > same bug? what happens if there are nested views? If there are nested views, say *vr1->ref == vr2 , *vr2->ref == field , vr1->item_equal->get_const() == vr3 , vr1->item_equal == cond_equal->current_level->item_equal , and {{vr2->item_equal == field->item_equal == NULL} then the following happens, and vr3 replaces vr1, and no self-reference occurs. vr1->transform: new_item= vr2->transform(...) vr2->transform: new_item= field->transform(...) field->replace_equal_field: returns itself because its item_equal is NULL vr2->replace_equal_field: returns itself because its item_equal is NULL vr1->replace_equal_field: temporarily lets field participate and returns vr3 Because an Item_direct_view_ref does not allow anything downstream to participate in a multiple equality, anything returned by the replace_equal_field call on an Item_field would replace the top-level Item_direct_view_ref. As such, no self-reference occurs.
            ycp Yuchen Pei added a comment - - edited

            Anyway here are the latest commits. I wonder how important it is to
            add a test covering nested Item_direct_view_ref and one that shows
            logical errors may occur if an Item_field participates in a multiple
            equality when there is more than one Item_direct_view_ref pointing
            to it. What do you think psergei?

            1084af9e91a MDEV-22534 Fix self-referencing Item_direct_view_ref
            4d72160ae4e MDEV-22534 Simplify the workaround for MDEV-31269
            f393ecea3af MDEV-22534 Decorrelate IN subquery

            I think the plan should be:

            1. Have the self-reference fix as the first commit and push it to
            10.4, since 10.4 is affected
            2. wait for there to be an 11.3 version of the MDEV-30073 patch so
            that we can test our main patch on top
            3. wait for the MDEV-30073 patch to have made it to 11.3, get rid of
            our MDEV-31269 workaround and push our main patch to 11.3

            OTOH our patch does not make the problem in MDEV-31269 worse, so
            maybe step 2 and 3 are redundant, but then again none of the
            workarounds has gained consensus support...

            ycp Yuchen Pei added a comment - - edited Anyway here are the latest commits. I wonder how important it is to add a test covering nested Item_direct_view_ref and one that shows logical errors may occur if an Item_field participates in a multiple equality when there is more than one Item_direct_view_ref pointing to it. What do you think psergei ? 1084af9e91a MDEV-22534 Fix self-referencing Item_direct_view_ref 4d72160ae4e MDEV-22534 Simplify the workaround for MDEV-31269 f393ecea3af MDEV-22534 Decorrelate IN subquery I think the plan should be: 1. Have the self-reference fix as the first commit and push it to 10.4, since 10.4 is affected 2. wait for there to be an 11.3 version of the MDEV-30073 patch so that we can test our main patch on top 3. wait for the MDEV-30073 patch to have made it to 11.3, get rid of our MDEV-31269 workaround and push our main patch to 11.3 OTOH our patch does not make the problem in MDEV-31269 worse, so maybe step 2 and 3 are redundant, but then again none of the workarounds has gained consensus support...
            ycp Yuchen Pei added a comment - - edited

            On why Item_direct_ref's are used in the creation of the extra "IS
            NOT NULL" items.

            If we replace them with Item_ref's, then MDEV-3906 resurfaces,
            i.e. the testcase in main.subselect_exists2in corresponding to
            MDEV-3906 fails with a crash.

            Like MDEV-3881, the fix of MDEV-3906 is "included" in the original
            commit adding the exists2in transformation.

            e3ac306157ab9ade137c9afc9fff270a2f50d7ec
            Commit:     unknown <sanja@montyprogram.com>
            CommitDate: Tue Feb 26 01:20:17 2013 +0200
             
            [NOT] EXISTS to IN transformation.

            sanja's last comment in MDEV-3906[1] seems to suggest that
            Item_direct_ref was used to fix this issue.

            [1] https://jira.mariadb.org/browse/MDEV-3906?focusedCommentId=28801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-28801

            Interestingly, if we apply the more crude workaround for MDEV-31269,
            then the MDEV-3906 regression is gone:

            80a094f1355 * upstream/bb-11.3-mdev-22534-item-ref-no-direct MDEV-22534 [demo] Use Item_ref instead of Item_direct_ref
            fc1027ef992 * MDEV-22534 Simplify the workaround for MDEV-31269
            f393ecea3af * MDEV-22534 Decorrelate IN subquery

            Given this, and the fact that MDEV-3906 is also a 2nd PS crash issue,
            I also made the changes of using Item_ref on top of a commit for
            MDEV-30073, and the regression also does not appear:

            8dffff917bc upstream/bb-10.4-mdev-31269-3906-fixed-by-mdev-30073 MDEV-22534 [demo] MDEV-30073 fixes MDEV-31269 and MDEV-3906
            34083cf34bc upstream/bb-10.4-mdev-30073 MDEV-30073 Wrong result on 2nd execution of PS for query with NOT EXISTS

            So it seems:

            • MDEV-30073 fixes both 2nd PS issues, and thus allows the use of
              Item_ref in the creations of extra IS NOT NULL's in
              exists2in/decorrelation transformation
            • Without the fix of MDEV-30073, the only way to pass all tests,
              including the ps-protocol ones, is to check for free list
              intersections during ps execution, as well as keep the
              Item_direct_ref with the infinite cycle fix
            • The infinite cycle fix is probably still needed because the
              IS_NOT_NULL item creations are not the only place using
              Item_direct_ref. Though if we rely on the MDEV-30073 fix, this could
              be separated out as an independent bug, if a testcase can be
              created.

            There are two ways moving forward:
            1. Block this ticket with MDEV-30073, and use Item_ref and no
            workaround in our patches once MDEV-30073 is fixed
            2. Use the Item_direct_ref, the infinite cycle fix, and either the
            fourth or the fifth workaround options in the previous comment[2].

            [2] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=269076&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-269076

            Which of these two shall we go with? psergei serg igor

            ycp Yuchen Pei added a comment - - edited On why Item_direct_ref 's are used in the creation of the extra "IS NOT NULL" items. If we replace them with Item_ref 's, then MDEV-3906 resurfaces, i.e. the testcase in main.subselect_exists2in corresponding to MDEV-3906 fails with a crash. Like MDEV-3881 , the fix of MDEV-3906 is "included" in the original commit adding the exists2in transformation. e3ac306157ab9ade137c9afc9fff270a2f50d7ec Commit: unknown <sanja@montyprogram.com> CommitDate: Tue Feb 26 01:20:17 2013 +0200   [NOT] EXISTS to IN transformation. sanja 's last comment in MDEV-3906 [1] seems to suggest that Item_direct_ref was used to fix this issue. [1] https://jira.mariadb.org/browse/MDEV-3906?focusedCommentId=28801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-28801 Interestingly, if we apply the more crude workaround for MDEV-31269 , then the MDEV-3906 regression is gone: 80a094f1355 * upstream/bb-11.3-mdev-22534-item-ref-no-direct MDEV-22534 [demo] Use Item_ref instead of Item_direct_ref fc1027ef992 * MDEV-22534 Simplify the workaround for MDEV-31269 f393ecea3af * MDEV-22534 Decorrelate IN subquery Given this, and the fact that MDEV-3906 is also a 2nd PS crash issue, I also made the changes of using Item_ref on top of a commit for MDEV-30073 , and the regression also does not appear: 8dffff917bc upstream/bb-10.4-mdev-31269-3906-fixed-by-mdev-30073 MDEV-22534 [demo] MDEV-30073 fixes MDEV-31269 and MDEV-3906 34083cf34bc upstream/bb-10.4-mdev-30073 MDEV-30073 Wrong result on 2nd execution of PS for query with NOT EXISTS So it seems: MDEV-30073 fixes both 2nd PS issues, and thus allows the use of Item_ref in the creations of extra IS NOT NULL 's in exists2in/decorrelation transformation Without the fix of MDEV-30073 , the only way to pass all tests, including the ps-protocol ones, is to check for free list intersections during ps execution, as well as keep the Item_direct_ref with the infinite cycle fix The infinite cycle fix is probably still needed because the IS_NOT_NULL item creations are not the only place using Item_direct_ref. Though if we rely on the MDEV-30073 fix, this could be separated out as an independent bug, if a testcase can be created. There are two ways moving forward: 1. Block this ticket with MDEV-30073 , and use Item_ref and no workaround in our patches once MDEV-30073 is fixed 2. Use the Item_direct_ref, the infinite cycle fix, and either the fourth or the fifth workaround options in the previous comment [2] . [2] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=269076&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-269076 Which of these two shall we go with? psergei serg igor

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.