Details
-
Task
-
Status: In Review (View Workflow)
-
Major
-
Resolution: Unresolved
Description
EXISTS-to-IN optimization performs trivial correlation detection and removal.
MySQL 8 has got it too, but in addition to EXISTS subqueries, they perform de-correlation also for IN subqueries. This allows them to use Materialization strategy for trivially-correlated IN- subqueries:
create table ten(a int primary key);
|
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
|
create table t1 (a int, b int, c int);
|
insert into t1 select a,a,a from ten;
|
create table t2 select * from t1;
|
explain select * from t1 where a in (select a from t2 where t1.b=t2.b);
|
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
|
1 SIMPLE t1 NULL ALL NULL NULL NULL NULL 10 100.00 Using where
|
1 SIMPLE <subquery2> NULL eq_ref <auto_distinct_key> <auto_distinct_key> 10 test.t1.a,test.t1.b 1 100.00 NULL
|
2 MATERIALIZED t2 NULL ALL NULL NULL NULL NULL 10 100.00 NULL
|
Attachments
Issue Links
- blocks
-
MDEV-31510 Allow exists2in and decorrelate-in to work for IN/EXISTS subqueries in arbitrary context
-
- Open
-
- is blocked by
-
MDEV-30073 Wrong result on 2nd execution of PS for query with NOT EXISTS
-
- Stalled
-
- relates to
-
MDEV-3881 Endless loop for query with EXISTS predicate using outer reference to view
-
- Closed
-
-
MDEV-31229 Extend check_equality_for_exist2in() to cover inner table expressions
-
- Open
-
-
MDEV-31408 Second SELECT from VIEW based on information_schema.optimizer_trace gives NULL result
-
- Closed
-
-
MDEV-31647 Stack looping and SIGSEGV in Item_args::walk_args on UPDATE
-
- Open
-
Activity
Here's a patch that works for the example test case:
https://github.com/MariaDB/server/commit/b3752311d95
Working on refining it.
ycp,
But why not go all the way to
in (select inner_expr'
from inner_table
where inner_expr = outer_expr and inner_where)
=> ,
outer_expr in (select inner_expr', inner_expr
from inner_table
where inner_where)
This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that?
Looking at how current EXISTS-to-IN works:
create table t1 (a int, b int);
|
insert into t1 select seq, seq from seq_1_to_10000;
|
create table t2 as select * from t1;
|
EXISTS-to-IN is done for outer_expr=inner_col
explain
|
select * from t1
|
where exists (select 1 from t2 where t1.a+1=t2.a);
|
EXISTS-to-IN is NOT done for outer_col=inner_expr:
explain
|
select * from t1
|
where exists (select 1 from t2 where t1.a=t2.a+1);
|
Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable:
alter table t2 add c varchar(32); |
set optimizer_trace=1; |
explain select * from t1 where exists (select 1 from t2 where t2.c=t1.a); |
select left(trace,1900) from information_schema.optimizer_trace\G |
+------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 10000 | |
|
| 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 1 | Using where; FirstMatch(t1); Using join buffer (flat, BNL join) |
|
+------+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------------------------+
|
{
|
"transformation": { |
"select_id": 2, |
"from": "EXISTS (SELECT)", |
"to": "IN (SELECT)", |
"upper_not": false |
}
|
},
|
{
|
"transformation": { |
"select_id": 2, |
"from": "IN (SELECT)", |
"to": "semijoin", |
"chosen": true |
}
|
},
|
{
|
"transformation": { |
"select_id": 2, |
"from": "IN (SELECT)", |
"to": "materialization", |
"possible": false, |
"cause": "types mismatch" |
}
|
},
|
what should the code for this MDEV do in such cases?
ycp, the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well.
Hi psergei, thanks for the comments.
> This seems to be possible. Could there be some issues if outer_expr has some "special" expressions like subquery expressions, fulltext functions or things like that?
I don't see any problem, will add testcases covering this scenario. What is the concern? Are there existing testcases in the codebase addressing this kind of concerns?
Also, what do you mean by fulltext functions?
> Another observation: in current code, EXISTS->IN sometimes works when Materialization is not applicable:
> what should the code for this MDEV do in such cases?
I feel that in general decorrelation is a gain, even if afterwards there's still correlation left.
But in the example you provided, assuming it does the exists2in transformation to select * from t1 where a in (select c from t2); and proceeds to executing the subquery select c from t2 with a full scan followed by a futile where a in ...., it does look like a scenario we want to avoid if possible. I will need to look more into that.
> the patch makes the testcase pass, but it will also attempt a conversion where it is not applicable, for example when the subquery has GROUP BY and/or HAVING. See the checks in Item_exists_subselect::exists2in_processor(). It looks like at least some of the checks from there should be used here as well.
Will do, thanks.
As discussed in a call with sanja, I created MDEV-31229 regarding the generalization to inner_expr=outer_expr
I added more tests, and started fixing regressions.
Most of the test failures concern mismatch in result files, while two ends in crashes: main.subselect_sj2_mat and main.subselect4. Both are caused by the use of freed items during the prepare stage in a second execution of a prepared statements. I fixed main.subselect_sj2_mat by calling thd->activate_stmt_arena_if_needed() and thd->restore_active_arena() in the processor. However, main.subselect4 is trickier, because the offending item was created at the prepare stage (presumably during the first execution) rather than inside the processor. This also seems to uncover a crash bug in the exists-to-in transformation:
set optimizer_switch=default; |
CREATE TABLE t1 (a INT); |
CREATE TABLE t2 (b INT); |
PREPARE stmt FROM " |
SELECT * FROM t2
|
HAVING 0 IN (
|
SELECT a FROM t1
|
WHERE EXISTS (
|
SELECT a FROM t1
|
WHERE b = a
|
)
|
)
|
"; |
|
EXECUTE stmt; |
EXECUTE stmt; |
I have not looked into it but suspect it is caused by the same issue. Filed it to MDEV-31269.
To be more clear, the problem with main.subselect4 is if we run the following test with the draft patch at[1], we get sigsegv similar to that of MDEV-31269:
set optimizer_switch=default; |
CREATE TABLE t1 (a INT); |
CREATE TABLE t2 (b INT); |
PREPARE stmt FROM " |
SELECT * FROM t2
|
HAVING 0 IN (
|
SELECT a FROM t1
|
WHERE a IN (
|
SELECT a FROM t1
|
WHERE b = a
|
)
|
)
|
"; |
|
EXECUTE stmt; |
EXECUTE stmt; |
[1] https://github.com/MariaDB/server/commit/88ed254c3e6
I did more debugging about this, and found out that the crash is caused by the following events
1. During the JOIN::prepare() of the first execution, the Item_field referring to t2.b is replaced by an Item_ref
2. The item_ref is moved around in the decorrelation transformation and subsequent optimizations
3. At the end of the 1st ps execution, the Item_ref is freed due to its creation in the ps execution, as part of the cleanup
4. During the JOIN::prepare() of the second execution, it tries to restore the optimized version of the statement, in doing so it does a sanity check and causes a sigsegv when trying to access the freed Item_ref.
Why the problem does not occur with the transformation disabled:
- during step 1 above, the replacement also adds an item to the change_list
- during step 2 above, the optimizations cause the equality b = a to be moved around as a whole, thereby moving its args which is an Item ** instead of individual item *'s
- during step 3 above, the change is rolled back, so the Item ** that is the args of the Item_func_eq goes from b (Item_ref) = a to b (Item_field) = a
- during step 4 above, there's no use-after-free in the restored optimized statement because it automatically contains the b (Item_field) = a.
Why we can't do the same with the transformation enabled:
- during step 2, we break the b = a into the outer_expr (b) and local_field (a), add the outer_expr to left_expr and local_field to the inner select item list. we cannot move the equality as a whole during this transformation.
Maybe we can fix this by checking during the decorrelation transformation whether local_field or any item in the tree of outer_expr is in the free_list that will be freed at Step 3. If so, then do not proceed with the transformation. What do you think psergei?
I also debugged the exists2in crash (MDEV-31269), and it seems to indeed have the same cause. So I wonder whether it makes sense to try to fix that one first.
I have a more polished patch now. Can you take a look psergei? Thank you:
https://github.com/MariaDB/server/commit/ba9cf8efeb0
The special cases where left_expr are subqueries or fulltext functions are included in the test.
The type mismatch causing no materialization down the road is not obvious to fix. I thought about checking whether the type handlers of the two sides of the equalities, but then there are "compatible" ones like longlong vs long. I could look more into it (e.g. how to determine which mismatches are compatible and which are not) but I feel it has already taken quite long, and given this is an existing issue in the existing exists-to-in transformation, maybe we can do with creating a ticket for it.
Finally, my fix of 2nd ps execution segfault requires HAVE_PSI_STATEMENT_INTERFACE because it checks thd->m_statement_state.m_parent_prepared_stmt to determine whether we are inside a ps execution. Is there a better way to determine this?
I think the scope of this ticket should be the transformation. Prevention of the 2nd ps execution bug is part of MDEV-31269.
Because of that, I'm marking MDEV-31269 as blocking this ticket, because without fixing that bug and preventing it the same way for this transformation we can't push the change in this ticket, unless the transformation can be implemented in a significantly different way from exists2in such that the bug no longer manifests.
But I think the patch still can be reviewed for the transformation itself. Hope that makes sense.
Some observations from review:
create table ten(a int);
|
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
|
|
create table one_k(a int);
|
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
|
|
create table one_m(a int primary key);
|
insert into one_m select A.a + B.a* 1000 from one_k A, one_k B;
|
|
create table t1 (a int, b int, c int );
|
|
insert into t1 select a, a, a from one_k;
|
|
create table t2 (a int, b int, c int );
|
insert into t2 select a,a,a from one_m;
|
MySQL-8:
mysql> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333;
|
+----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using where |
|
| 2 | DEPENDENT SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 997999 | 1.00 | Using where |
|
+----+--------------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|
Odd, I would expect it to work.
But this matches their optimizer trace:
{
|
"transformation": {
|
"select#": 2,
|
"from": "IN (SELECT)",
|
"to": "materialization",
|
"possible": false,
|
"cause": "correlated"
|
}
|
},
|
In MariaDB with this patch:
MariaDB [j1]> explain select * from t1 where 1 in (select 1 from t2 where t2.a=t1.a and t2.b=t1.b) or t1.c<333;
|
+------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+
|
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
|
| 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 997920 | Using where |
|
+------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+
|
But the trace shows materialization should be possible:
"join_preparation": {
|
"select_id": 2,
|
"steps": [
|
{
|
"transformation": {
|
"select_id": 2,
|
"from": "IN (SELECT)",
|
"to": "materialization",
|
"sjm_scan_allowed": false,
|
"possible": true
|
}
|
},
|
"transformation": {
|
"select_id": 2,
|
"from": "IN (SELECT)",
|
"to": "decorrelation"
|
}
|
Replying to comment about the non-semijoin materialization[1]:
The trace was printed in subquery_types_allow_materialization() during the prepare stage, which was called by is_materialization_applicable() with some further checks, including correlation. So in this case the original IN subquery passes the "allow" check (whatever that means), but because it is correlated, it does not pass the "applicable" check which would result in adding the materialization strategy.
If it were a semijoin and gets decorrelated by the new transformation, it would get a second chance in convert_join_subqueries_to_semijoins() called in optimize_inner().
If it were an EXIST subquery and had an exists-to-in transformation, it would create a brand new IN subquery in the transformation and call check_and_do_in_subquery_rewrites() that adds the materialization strategy. In our case, the IN subquery is not brand new, and it has already in-to-exists strategy, so doing the same would not help, because there's a check !in_subs->has_strategy() before trying to add the materialization strategy.
So to fix the example you provided, I updated the transformation by adding a (re)consideration for materialization in case of a non-semijoin at the end of the transformation[2].
This causes a few more "possible" materializations in the trace not resulting in actual materialization mainly in the case of having an upper NOT, but given the relationship between subquery_types_allow_materialization() and is_materialization_applicable() as discussed above, as long as materialization is not expected*, I suspect it is OK. What do you think psergei?
[1] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=259705&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-259705
[2] https://github.com/MariaDB/server/commit/69af0632fe5
* I checked equivalent EXISTS versions, and they do not result in materialization, unlike your non-semijoin example.
New commit that addresses the review input
https://github.com/MariaDB/server/commit/e80a40e1354
Can you take a look psergei? Thanks.
Looking at the next patch. Good to see the input was addressed.
Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql
The trivial correlation is
where t1.b=t2.y
|
the code puts t1.b IS NOT NULL at the top level (outside the subquery):
"query_block": {
|
"select_id": 1,
|
"attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))"
|
But then inside the subquery we have:
{
|
"full-scan-on-null_key": {
|
"table": {
|
"table_name": "t2",
|
"access_type": "ref_or_null",
|
"possible_keys": ["y"],
|
"key": "y",
|
"key_length": "5",
|
"used_key_parts": ["y"],
|
the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery.
..., as long as materialization is not expected*, I suspect it is OK. What do you think Sergei Petrunia?
Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy.
Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor().
Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks:
(!optimizer ||
|
...
|
!first_select->leaf_tables.elements||
|
with_recursive_reference)
|
Do we need to add these too? (I wonder what "!optimizer" check is for...)
I can also see both exists2in_processor() functions have:
!(is_top_level_item() ||
|
(upper_not && upper_not->is_top_level_item())) || /* (2) */
|
although the motivation behind this limitation is not clear.
is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs.
But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN, we don't care between TRUE/NULL results.
This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result).
But if we do differentiate, why can't we handle the subquery in an arbitrary context?
Hi psergei, thanks for the 3rd round comments. I started working through them before noticing something in your attachment, which I'd like to get cleared before continuing. First, what is the table one below?
explain format=json select * from t1 where t1.a not in (select t2.x from one,t2 ignore index(x) where t1.b=t2.y and t2.z=33) |
Also I guess you meant an insert into t2 values preceding the following?
( 12 , 12 , 12 ),
|
( 22 , NULL , 22 ), |
( NULL , 21 , 21 ), |
( 23 , NULL , 23 ), |
( NULL , 23 , 23 ); |
I assume seq_1_to_10 is from --source include/have_sequence.inc.
ycp, sorry for errors in the attachment. The table one is
create table one (w int);
|
insert into one values (1);
|
I added it so that we get a regular join plan for the subquery, instead of special-case plans of one-table subqueries.
Also I guess you meant an insert into t2 values preceding the following?
Yes.
Take-aways from yesterday's optimizer call
Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks:
(!optimizer ||
The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer".
I can also see both exists2in_processor() functions have:
!(is_top_level_item() ||
(upper_not && upper_not->is_top_level_item())) || /* (2) */
No idea about this one.
A patch with trivial coding style cleanups: https://github.com/MariaDB/server/commit/62251786f8bb2084fe9169befdc569c24f5078f2
Looking at the next patch. Good to see the input was addressed.
Handling of NULLs seems to produce correct output, although I see that the query plan has redundant elements. Look at the attached file: mdev22534-full-scan-on-null-key.sql
![]()
The trivial correlation is
where t1.b=t2.y
the code puts t1.b IS NOT NULL at the top level (outside the subquery):
"query_block": {
"select_id": 1,
"attached_condition": "!(t1.b is not null and <in_optimizer>((t1.a,t1.b),<exists>(subquery#2)))"
But then inside the subquery we have:
{
"full-scan-on-null_key": {
"table": {
"table_name": "t2",
"access_type": "ref_or_null",
"possible_keys": ["y"],
"key": "y",
"key_length": "5",
"used_key_parts": ["y"],
the "NULL key" for lookup into t2.y is t1.b. Which we know will be NOT NULL if we get into the subquery.
I did some debugging, and it seems to be caused by the IN subselect creating Item_func_trig_cond when it is not top level during stage 2 optimization[1]. It is avoided by the exists2in transformation because it marks the newly created IN subselect as toplevel[2], even if there's an upper_not:
in_subs->top_level_item();
|
[1] (backtrace): Item_in_subselect::create_row_in_to_exists_cond > Item_in_subselect::create_in_to_exists_cond > JOIN::choose_subquery_plan > make_join_statistics > JOIN::optimize_inner > JOIN::optimize > st_select_lex::optimize_unflattened_subqueries > JOIN::optimize_unflattened_subqueries > JOIN::optimize_stage2 > JOIN::optimize_inner > JOIN::optimize
[2] Item_exists_subselect::exists2in_processor
I tried to do the same thing with my decorrelate-in transformation[3], but it fails the following test:
CREATE TABLE t1 (pk INT NOT NULL, i INT NOT NULL); |
INSERT INTO t1 VALUES (0,10), (1,20), (2,30), (3,40); |
CREATE TABLE t2a (pk INT NOT NULL, i INT NOT NULL, PRIMARY KEY(i,pk)); |
INSERT INTO t2a VALUES (0,0), (1,1), (2,2), (3,3); |
SELECT * FROM t1 WHERE (NULL, 1) NOT IN (SELECT t2a.i, t2a.pk FROM t2a WHERE t2a.pk = t1.pk); # outputs nothing, but should be (0, 10), (2, 30), (3, 40) |
[3] https://github.com/MariaDB/server/commit/373a97067bd
I could not come up with a similar test for exists2in to check whether the forced toplevel would cause similar problems. In any case, this needs to be further investigated.
Yes that should be fine. However since the primary goal of processing IN subquery is to decorrelate it, it would be nice to write the outcomes into the trace: whether the subquery became uncorrelated and also write a note if Materialization became a possible strategy.
Added a field in the trace whether the outcome is still correlated. AFAICT materialization, if applicable after the transformation, does show up in the trace. An example is the non-semijoin materialization example in the test[3], where if you remove the ".to" in json_extract path, you will see materialization after the decorrelation has the possible field equal true.
Nice to see a readable comment before the big if-check in Item_in_subselect::exists2in_processor().
Comparing it with its counterpart in Item_exists_subselect::exists2in_processor(), I can see the latter has more checks:
(!optimizer ||
...
!first_select->leaf_tables.elements||
with_recursive_reference)
Do we need to add these too? (I wonder what "!optimizer" check is for...)
Looking at my notes, I don't know what these checks are for and did not find them necessary for my decorrelate-in transformation. The example snippet in the tests I found that covers !first_select->leaf_tables.elements is
SELECT * FROM t2 AS alias1, t2 AS alias2 |
WHERE EXISTS ( SELECT 1 ) AND (alias2.pk = alias1.b ) |
ORDER BY alias1.b; |
I wrote a similar test for IN, and it got optimized
# no need to check leaf tables |
CREATE TABLE t2 (pk INT PRIMARY KEY, b INT) ENGINE=MyISAM; |
INSERT INTO t2 VALUES (1,1),(2,7); |
SELECT * FROM t2 AS alias1, t2 AS alias2 |
WHERE alias1.pk in ( SELECT 1 ) AND (alias2.pk = alias1.b ) |
ORDER BY alias1.b; |
drop table t2; |
For with_recursive_reference, the relevant snippet is from cte_recursive.test:
with recursive |
reached_objects as |
(
|
select v, 'init' as m from objects where v in ('v3','v7','v9') |
union
|
select module_results.v, module_results.m from module_results, applied_modules |
where module_results.m = applied_modules.m |
),
|
applied_modules as |
(
|
select * from modules where 1=0 |
union
|
select modules.m |
from
|
modules
|
where
|
not exists (select * from module_arguments |
where module_arguments.m = modules.m and |
module_arguments.v not in |
(select v from reached_objects)) |
)
|
select * from reached_objects |
There's IN in this test, and without the check in the decorrelate-in transformation, the test passes. I also tried the following test snippet and there's no problem without the check in place:
with recursive |
ancestor_ids (id, generation)
|
as
|
(
|
select father, 1 from folks where name = 'Me' and father is not null |
union all |
select mother, 1 from folks where name = 'Me' and mother is not null |
union all |
select father, fa.generation+1 from folks, ancestor_ids fa |
where folks.id = fa.id and father is not null and |
(father not in (select id from ancestor_ids)) |
union all |
select mother, ma.generation+1 from folks, ancestor_ids ma |
where folks.id = ma.id and mother is not null and |
(mother not in (select id from ancestor_ids)) |
)
|
select generation, name from ancestor_ids a, folks |
where a.id = folks.id |
I can also see both exists2in_processor() functions have:
!(is_top_level_item() ||
(upper_not && upper_not->is_top_level_item())) || /* (2) */
although the motivation behind this limitation is not clear.
is_top_level_item()==true means one doesn't care between FALSE/NULL results, that is, one only needs to check if the subquery has a match. We don't need to care about non-matches vs non-matches in presence of NULLs.
But if is_top_level_item()==false and (upper_not && upper_not->is_top_level_item())==true then it's a NOT IN and for IN, we don't care between TRUE/NULL results.
This means we do need to differentiate between non-match case (FALSE result) and non-match-but-NULLs-are-present (NULL result).
But if we do differentiate, why can't we handle the subquery in an arbitrary context?
Right, we don't have to worry about null when it is top level in, and with the added is not null in both the inner and outer where condition, the value of the outer where condition is exactly the same before and after the transformation. I wrote two possibilities[4] why we don't consider arbitrary context, which I quote here:
With this modification, theoretically we don't have to care about
whether the IN subselect is top level, however, there are two issues:1. When the subquery is indeed top level, do the extra terms lower
performance?
2. When the subquery is not top level, how do we substitute `this`
which is an Item_in_subselect with the Item_func_and?I suspect this is why the exists2in transformation only does the
modification when there is an upper_not, and the transformation is
skipped when the subquery is not toplevel EXISTS nor toplevel NOT
EXISTS.
I think we can fix Item 1 by handling toplevel separately from non-toplevel, that is, by not adding the is not nulls when it is top level. But do you know how fix Item 2 psergei?
The "!optimizer" check is only meaningful for EXISTS subqueries. Item_in_subselect is always wrapped in an "optimizer".
This is not always true. Item_in_subselect is not always wrapped in an optimizer and it is ok (we don't need a similar check). For example you will get segfaults when you remove the if (optimizer) from the following[5] and test against the most basic case select * from t1 where a in (select a from t2 where t1.b=t2.b);
// inside bool Item_in_subselect::exists2in_processor(void *opt_arg)
|
/* Update any Item_in_optimizer wrapper if exists */ |
if (optimizer) |
{
|
optimizer->reset_cache();
|
if (optimizer->fix_left(thd)) |
{
|
res= TRUE;
|
goto out; |
}
|
}
|
A patch with trivial coding style cleanups: https://github.com/MariaDB/server/commit/62251786f8bb2084fe9169befdc569c24f5078f2
Thank you, I have incorporated the change, as well as the is_correlated field:
One more observation for the old patch: looking at Item_exists_subselect::exists2in_and_is_not_nulls() and its "original" code I see a discrepancy:
Old code
for (size_t i= 0; i < eqs.elements(); i++) |
{
|
if (optimizer->arguments()[0]->maybe_null()) |
...
|
New code:
for (uint i= offset; i < left_exp->cols(); i++) |
{
|
if (left_exp->element_index(i)->maybe_null()) |
I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i]
> I guess the original code had a bug: it checked arguments()[0] instead of arguments()[i]
Yes I think so.
Ok, I was trying to see if I could get everything finished to catch the release train and pushed these:
commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2)
|
Author: Sergei Petrunia <sergey@mariadb.com>
|
Date: Mon Jun 19 11:27:51 2023 +0300
|
|
Make the new tests --view-protocol friendly. Still don't pass.
|
|
commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e
|
Author: Sergei Petrunia <sergey@mariadb.com>
|
Date: Sun Jun 18 23:38:09 2023 +0300
|
|
Remove NULL handling of left expressions which cannot be NULL.
|
|
However this still didn't resolve all the issues and now the release train has left the station.
I've also tried to fix this part, without success:
/* |
If we do not single out this case, and merge it with the >1
|
case, we would get an infinite loop in resolving
|
Item_direct_view_ref::real_item() like in MDEV-3881 when
|
left_exp has scalar type
|
*/
|
else if (and_list->elements == 1) |
exp= new (thd->mem_root) Item_cond_and(thd, and_list->head(), exp); |
I confirm that regular EXISTS-to-IN code hit this by accident. It handles the case of one equality in a special way, then the code path serving multiple equalities will also add multiple NOT NULL conditions even when they are redundant. If it added one condition, it would hit the same crash.
The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself.
Thanks for the updates.
commit f9d2939955028f62f912ec7deb212e2a8fbb4c20 (HEAD -> bb-11.1-mdev-22534-cleanup2, origin/bb-11.1-mdev-22534-cleanup2)
Author: Sergei Petrunia <sergey@mariadb.com>
Date: Mon Jun 19 11:27:51 2023 +0300Make the new tests --view-protocol friendly. Still don't pass.
Ok, I didn't know the test does not work with --view-protocol, because
the new bb does not seem to have any builders running with this flag.
I wonder what caused this, because it is not every non-empty
json_extract output fails. The ones that do not fail include first PS
executions, and in the cases where outer_exp has subquery or fulltext
function, and the non-semijoin materialization.
I looked around and found the "umbrella ticket" MDEV-27691 - do you
know which of the included tickets covers this case?
commit fd162ba2c887a6ca795549c2ed74dbeeefc4964e
Author: Sergei Petrunia <sergey@mariadb.com>
Date: Sun Jun 18 23:38:09 2023 +0300Remove NULL handling of left expressions which cannot be NULL.
Thanks, so that's how you fix the full-scan-on-null_key issue.
The crash happens in substitute_for_best_equal_item() which creates a situation where an Item_direct_view_ref refers to itself.
I guess you mean substitute_for_best_equal_field(). Is there an
MDEV for this?
Ok let's return to this in the development cycle.
Ok
Follow up on the meeting from yesterday:
- view-protocol: we wait for
MDEV-31408to be fixed before addressing
this issue - arbitrary subquery context (not just toplevel in and toplevel not
in): we will do this separately in MDEV-31510 - self-referencing Item_direct_view_ref: this is the only issue
blocking the present task at this moment. I will debug it when I get
back to this task.
As I'm getting back to this task, it just took me a while to reprod
the Item_direct_view_ref issue, so I'll note it here: the triggered
failure is on the test for MDEV-3881 in main.subselect_exists2in:
--echo #
|
--echo # MDEV-3881: Endless loop and crash in Item_ref::real_item with
|
--echo # exists_to_in=on, NOT EXISTS subquery, merge view or from subquery,
|
--echo # constant table
|
--echo #
|
SET optimizer_switch = 'exists_to_in=on'; |
|
CREATE TABLE t1 (a INT) ENGINE=MyISAM; |
INSERT INTO t1 VALUES (1),(7); |
|
CREATE TABLE t2 (b INT) ENGINE=MyISAM; |
INSERT INTO t2 VALUES (8); |
CREATE ALGORITHM=MERGE VIEW v1 AS SELECT * FROM t2; |
|
CREATE TABLE t3 (c INT) ENGINE=MyISAM; |
INSERT INTO t3 VALUES (4),(6); |
|
SELECT * FROM t1, v1 WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b; |
|
drop view v1; |
drop table t1, t2, t3; |
|
CREATE TABLE t1 (a INT) ENGINE=MyISAM; |
INSERT INTO t1 VALUES (1),(7); |
|
CREATE TABLE t2 (b INT) ENGINE=MyISAM; |
INSERT INTO t2 VALUES (8); |
|
CREATE TABLE t3 (c INT) ENGINE=MyISAM; |
INSERT INTO t3 VALUES (4),(6); |
|
SELECT * FROM t1, ( SELECT * FROM t2 ) alias WHERE NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b; |
|
drop table t1, t2, t3; |
So the infinite recursion is caused by self-referencing, caused by the
transformer replace_equal_field. The goal of the transformer
Item::replace_equal_field() is to replace fields with equivalent
ones, as an optimisation. Currently two Item subclasses implement this
transformer nontrivially, Item_field and Item_direct_view_ref.
The Item_direct_view_ref implementation recurses to the
Item_field implementation, but not vice versa, which is good
because otherwise the infinite self-referencing would occur at this
stage.
In the infinite recursion scenario, we have the following chain of
walking down the item tree during the transformation, where each
vertex is a pari of item type and dbug_print output, and each edge /
arrow is annotated by the operation that goes from vertext to the
next.
(Item_direct_view_ref*) 8 -.ref-> (Item**) -deref-> (Item_field*) 8
|
-.item_equal-> (Item_equal*) multiple_equal(8, t1.a) -.get_const()->
|
(Item_direct_view_ref*) 8 -.ref-> (Item**) same as above.
|
The first and the last {{Item_direct_view_ref*}}s have the same value
in their ref field. By replacing the Item_field* with the
second Item_direct_view_ref* we get the self-referencing that
causes the infinite real_item() call.
There's no guarantee that if such a self-reference exists it would
have length 1 like above. So to be safe we may need to try to detect
self-reference of arbitrary length, which is what I did in [1], but
I am not sure how much this extra check affects performance.
Alternatively we can resort to a length-1 check only, and update the
fix if that proves insufficient. What do you think psergei?
For all changes squashed into one commit, see [2].
[1] https://github.com/MariaDB/server/commit/94172e01d63
[2] https://github.com/MariaDB/server/commit/0bf54aef789
Note from the meeting yesterday:
- We will simplify the MDEV-31269 workaround condition to simply
skipping on !thd->stmt_arena->is_conventional() - I will investigate further the cause of the self-referencing
Item_direct_view_ref
OK, here's another fix (h/t Johnston for discussions):
https://github.com/MariaDB/server/commit/c40d4706010
(squashed version: https://github.com/MariaDB/server/commit/cabd6b79922
which also includes the simplified workaround for MDEV-31269
mentioned in the previous comment.)
At this moment a major issue in fixing the infinite recusion issue is
the meaning of the Item_field::equal_field and its twin
Item_direct_view_ref::equal_field. A natural guess of the meaning
of Item_field::equal_field may be "a pointer to the multiple
equality the field belongs to", but this guess breaks apart at the
following function, where the Item_direct_view_ref steals the
equal_field from the Item_field it points to:
Item *Item_direct_view_ref::propagate_equal_fields(THD *thd,
|
const Context &ctx, |
COND_EQUAL *cond)
|
{
|
Item *field_item= real_item();
|
if (field_item->type() != FIELD_ITEM) |
return this; |
Item *item= field_item->propagate_equal_fields(thd, ctx, cond);
|
set_item_equal(field_item->get_item_equal());
|
field_item->set_item_equal(NULL);
|
if (item != field_item) |
return item; |
return this; |
}
|
So the assumption can be adjusted to "let x be the multiple equality
an Item_field belongs to. If the Item_field is not referenced by an
Item_direct_view_ref, then its equal_field points to x, otherwise its
equal_field is NULL and the equal_field of Item_direct_view_ref points
to x". And this second guess seems to make sense in light of the
following function, where the Item_direct_view_ref temporarily
lends its own item_equal to the real_item(), before restoring it to
NULL.
Item *Item_direct_view_ref::replace_equal_field(THD *thd, uchar *arg)
|
{
|
Item *field_item= real_item();
|
if (field_item->type() != FIELD_ITEM) |
return this; |
field_item->set_item_equal(item_equal);
|
Item *item= field_item->replace_equal_field(thd, arg);
|
field_item->set_item_equal(0);
|
return item != field_item ? item : this; |
}
|
So the above fix patches Item_ref::propagate_equal_fields() so
that it delegates to the propagate_equal_fields() of the item it
references, so that when it references to an Item_direct_view_ref,
it will let the latter set the correct equal_field.
psergei can you comment on this assumption and/or the new fix?
So the assumption can be adjusted to "let x be the multiple equality
an Item_field belongs to. If the Item_field is not referenced by an
Item_direct_view_ref, then its equal_field points to x, otherwise its
equal_field is NULL and the equal_field of Item_direct_view_ref points
to x".
Having studied the topic a little, this suggestion makes sense to me, too.
Let me continue to study the patch.
ycp, the patch changes one piece of cryptic code in Item_ref::propagate_equal_fields for another.
It's not clear what the new code does.
Looking at the new code:
Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, |
COND_EQUAL *cond)
|
{
|
Item *derefed = *ref; |
if (derefed->type() != REF_ITEM) |
{
|
derefed= real_item();
|
Suppose *ref points something that's not an Item_ref or its descendant.
Then, derefed->type() != REF_ITEM, if branch is taken.
But when *ref is not an Item_ref or its descendant, real_item() will return *ref.
What is the point of this line then:
derefed= real_item();
|
?
(I'll continue looking at this)
Experimenting, I've made this change (to the latest code with ycp's changes) and it didn't have any effect on mtr :
diff --git a/sql/item.cc b/sql/item.cc
|
index fa05a90c011..d7615521fc6 100644
|
--- a/sql/item.cc
|
+++ b/sql/item.cc
|
@@ -9416,12 +9416,14 @@ Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx,
|
COND_EQUAL *cond)
|
{
|
Item *derefed = *ref;
|
+#if 0
|
if (derefed->type() != REF_ITEM)
|
{
|
derefed= real_item();
|
if (derefed->type() != FIELD_ITEM)
|
return this;
|
}
|
+#endif
|
Item *item= derefed->propagate_equal_fields(thd, ctx, cond);
|
if (item != derefed)
|
return item; |
I don't yet understand the purpose of the code that's #ifdef-ed away...
Looking at the code in Item_ref::propagate_equal_fields():
Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, |
COND_EQUAL *cond)
|
{
|
...
|
Item *item= field_item->propagate_equal_fields(thd, ctx, cond);
|
|
if (item != field_item) |
return item; // (POINT1) |
return this; |
}
|
Suppose the object this is called for is an Item_ref (not Item_direct_ref).
Item_ref objects are used to "wrap" items when they are used in the post-grouping context and one needs to use result calls to get the value like it is done for example in Item_ref::val_int:
longlong Item_ref::val_int()
|
{
|
...
|
longlong tmp=(*ref)->val_int_result(); |
return tmp; |
}
|
But if Item_ref::propagate_equal_fields() substitutes itself for the item it has got from (*ref), as it does on he line marked with (POINT1)? Conversion from val() calls to val*result() calls won't be done anymore...
.. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution
if (item != field_item) |
return item; // (POINT1) |
is only taken when the item is a constant item (for constants, val() and val_result() are the same).
> But when *ref is not an Item_ref or its descendant, real_item() will return *ref.
I didn't know this - is it documented somewhere?
In any case, with this assertion, my changed version is equivalent to
Item *Item_ref::propagate_equal_fields(THD *thd, const Context &ctx, |
COND_EQUAL *cond)
|
{
|
Item *derefed = *ref; |
if (derefed->type() != REF_ITEM && derefed->type() != FIELD_ITEM) |
return this; |
Item *item= derefed->propagate_equal_fields(thd, ctx, cond);
|
if (item != derefed) |
return item; |
return this; |
}
|
which I include in an updated commit at
https://github.com/MariaDB/server/commit/3eeab3c4508.
psergei I wonder whether this version is still cryptic? My
explanation is that it adds inside
Item_ref::propagate_equal_fields() a handling of the case where
*ref points to another Item_ref, by delegating to the
propagate_equal_fields() of the referenced Item_ref, so that
the result is consistent for the latter. By consistency I mean for
example these two cases:
1. Item_ref *ref> Item_direct_view_ref *ref> Item_field
2. Item_func *args> Item_direct_view_ref *ref> Item_field
Both should result in the same item_equal for the
Item_direct_view_ref.
> Experimenting, I've made this change (to the latest code with ycp's changes) and it didn't have any effect on mtr :
Given the above, does this mean that the ref of an Item_ref
can only point to either a REF_ITEM or a FIELD_ITEM? Perhaps
this is another assertion I'm not aware of?
> .. a guess: the lack of conversion from val*() to val*result() methods is not an issue, because the substution ... is only taken when the item is a constant item (for constants, val() and val_result() are the same).
This sounds right, as Item_field::propagate_equal_fields() only
returns something new if the new thing is the result of
field->get_equal_const_item(thd, ctx, item). Just to validate
this assumption, I added assertions in
Item_ref::propagate_equal_fields() and
Item_direct_view_ref::propagate_equal_fields() on top of my
commit[1]:
if (item == derefed) |
item= this; |
DBUG_ASSERT(item->type() == REF_ITEM || item->const_item());
|
return item; |
[1] https://github.com/MariaDB/server/commit/ca390e57642
All main.subselect* tests still pass. Shall we add this assertion?
The squashed version of all these changes, including the assertion
addition is at https://github.com/MariaDB/server/commit/4830c70cdb8
BTW, the simplification of the MDEV-31269 workaround[1] also causes
test failures with --ps-protocol[2]:
[amd64-ubuntu-2204-debug-ps | failed test (failure)]
|
json.json_table
|
main.explain_json
|
main.group_min_max
|
main.index_merge_myisam
|
main.innodb_ext_key
|
main.innodb_ext_key
|
main.innodb_ext_key
|
main.innodb_ext_key
|
main.multi_update
|
main.opt_trace
|
main.subselect
|
main.subselect4
|
main.subselect_decorrelate_in
|
main.subselect_exists2in
|
main.subselect_exists2in_costmat
|
main.subselect_extra_no_semijoin
|
main.subselect_mat
|
main.subselect_mat_cost_bugs
|
main.subselect_sj2_mat
|
[1] https://github.com/MariaDB/server/commit/ffc61fb039e
[2] https://buildbot.mariadb.org/#/builders/534/builds/8770
When I revert the change [1] above all these tests pass.
psergei, shall we update these tests with --disable-ps-protocol and
--enable-ps-protocol, or shall we revert to the more expensive
workaround?
ycp, possible problem is in the 2nd execution of prepare statement (now ps-protocol execute prepare statement twice and returns only the results of the second execution ). You can disable the second execution in testcases with "--disable-ps2-protocol"
Thanks lstartseva, are the problem and workaround you mentioned
concerning the failures with testing with --ps-protocol as mentioned
in my previous comment? If so, following up on the optimizer
meeting, psergei suggested that we make the skip condition a bit
more strict, by also checking for ref items, to avoid the failures
with --ps-protocol.
And here's the updated commit. It checks if either side of any of
the chosen equalities is a ref item, and skip the transformation if
so: https://github.com/MariaDB/server/commit/b2b0d16b68d
As a result, I had to do a --disable_ps_protocol and
--enable_ps_protocol in two explain extended cases in
main.subselect_exists2in.
A self-contained explanation of how we get a loop of Item_direct_view_ref objects:
Testcase
CREATE TABLE t1 ( a INT ); |
INSERT INTO t1 VALUES (1),(5); |
CREATE TABLE t2 ( b INT ) ENGINE=MyISAM; |
INSERT INTO t2 VALUES (1); |
|
CREATE TABLE t3 ( c INT ); |
INSERT INTO t3 VALUES (4),(5); |
SELECT * |
FROM
|
t1,
|
( SELECT * FROM t2 ) alias |
WHERE
|
NOT EXISTS ( SELECT * FROM t3 WHERE c = b ) AND a = b; |
Here, NOT EXISTS is converted into NOT IN. This rewrite also adds left_expresssion IS NOT NULL so the NOT EXISTS becomes:
"!(<in_optimizer>(t2.b,<exists>(subquery#3)) and t2.b is not null)"
|
The "t2.b is not null" has this structure:
Item_func_is_not_null(
|
Item_direct_ref( // denote this as $ITEM_DIRECT_REF
|
Item_direct_view_ref(
|
Item_field("t2.b") // // denote this $IT_FLD
|
)
|
)
|
The top-level WHERE clause is used to construct Item_equal which includes
- Item_field(t1.a)
- Item_direct_view_ref(Item_field(alias.b)) // denote ITEM_DIRECT_REF_2
Then, Item_ref::propagate_equal_fields() is called for $ITEM_DIRECT_REF object
The code does this (comments are mine):
// This will assign field_item = $IT_FLD
|
Item *field_item= real_item();
|
|
// This will make $IT_FLD->item_equal point to multiple equality.
|
// This is already wrong as there can be multiple Item_direct_view_ref
|
// objects in different contexts which refer to the same Item_field object.
|
// See commit comment for 8d9dd21d for details.
|
Item *item= field_item->propagate_equal_fields(thd, ctx, cond);
|
Then, Item_equal::update_const() sets another the $ITEM_DIRECT_REF_2 Item_direct_view_ref object to be the "the const item".
Then, substitute_for_best_equal_field() calls
Item_ref::transform(..., &Item::replace_equal_field)
|
for $ITEM_DIRECT_REF object.
It eventually calls:
Item *new_item= (*ref)->transform(thd, transformer, arg);
|
Here, *ref points to $IT_FLD Item_field.
new_item is now $ITEM_DIRECT_REF_2, the constant Item_direct_view_ref.
Then, we do this:
if (*ref != new_item)
|
thd->change_item_tree(ref, new_item);
|
And this creates a loop because "ref" is the VIEW's field translation entry. Which now points to an Item_direct_view_ref which has ref pointing to the VIEW's field translation...
Some comments on 2nd ps segfault workarounds.
trade-offs:
- the more cases it skips the transformation, the more false
positives there are, and the more ps-protocols it will fail - the less cases it skips the transformation, the more false
negatives there are, and the more likely the 2nd ps segfault will
occur
tried workarounds:
1. skip if not conventional: too many false positives,
https://github.com/MariaDB/server/commit/ffc61fb039e
2. skip if not conventional and either side is ref: too many false
positives, https://github.com/MariaDB/server/commit/b2b0d16b68d
3. skip if not conventional and any item in the trees is ref
(mentioned and tried by psergei): too many false positives
4. skip if not conventional and either side intersects free list:
seems to work, less work than "skip if any item in the trees
intersects free list", but more work,
https://github.com/MariaDB/server/commit/28de41f9b9b
5. skip if not conventional and any item in the trees intersects free
list: the original check, seems to be the most accurate, but also
requiring the most work, with the apparent downside of performance,
see e.g. https://github.com/MariaDB/server/commit/a87f22770b for a
implementation
In terms of time required to check: 1 < 2 < 3,4 < 5.
In terms of accuracy, only 4 and 5 do not cause massive ps-protocol
failures.
Status update from the calls this week:
- Igor seems to be fine with "A self-contained explanation of how we get a loop of Item_direct_view_ref" above.
- The bug is apparently in the old (pre-MDEV-22534) code. Can one construct a testcase for it?
- SergeiP: I had no success in constructing such testcase so far, perhaps it's because Item_direct_ref objects are used in a limited number of scenarios in the server.
- Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the same bug? what happens if there are nested views? SergeiP: No it doesn't have the same bug. I think the code works correctly there. Can't explain off the top of the head.
- Igor: the workarounds for MDEV-31269 are bad, I have a patch for it here: https://github.com/MariaDB/server/commit/34083cf34bc26a5cd95e56feffd01466f9f4917f . It's targeted at 10.4, has no description yet and is large.
- Igor suggested that
MDEV-22434code is rebased on top of that commit (but didn't clarify what for?)- Discussed this with Serg, no outcome so far.
This is a reply to a message from igor on slack about the
relation between this task and MDEV-3881. Messages in commits to
MDEV-22534 reference MDEV-3881 because the issues seen in MDEV-3881
are encountered in MDEV-22534. They have the same symptom, but
nobody can say for sure whether it is the same issue caused by the
same problem because MDEV-3881 is so ancient. Now let's assume it is
the same.
Looking at the fix for MDEV-3881, I cannot verify it myself whether
the commit did fix it. First, it is not even clear which commit was
meant as a fix for MDEV-3881. I suspect it is
e3ac306157ab9ade137c9afc9fff270a2f50d7ec[0] because that is where
the MDEV-3881 test case was added. But as you can see, it was also
the commit that introduced the exists-to-in transformation. The
bug relied on exists-to-in to manifest, so in this case, it is not
even possible to determine which version of the feature that caused
the bug, let alone studying the fix. Second, it was a commit from 10
years ago, and even the author could not recall what was the fix,
see the thread[1].
On the face value, my suspicion is that the bug was not fixed, and
we have just been relying on a workaround (see below) to avoid it.
Or maybe it was fixed then, but some new changes over the last 10
years caused a regression (e.g. the Gallina commit mentioned by
psergei). In any case, I don't think archaeology helps fix this
bug. In the current codebase, the exists-to-in code relies on a
(perhaps unintentional) workaround to avoid the bug, which is, when
the subselect is a toplevel NOT, we need to AND the new IN subquery
with IS NOT NULLs because 3-value logic is funny[2]. When doing so,
the exists2in processor constructs the AND item in different orders
depending on the number of picked-out equalities. If there is only
one, then the AND item has the IS NOT NULL item as the first
argument. If there is more than one, then the AND item has the IN
subquery as the first argument. There's no reason the two cases
should be treated differently, and the infinite loop happens when we
try to unify the two cases and always have the IN subquery as the
first argument, see for example [1]. Note that this was already the
case in the initial exists-to-in commit[0]. I also made a minimal
change that has nothing to do with the main task in MDEV-22534 to
reproduce the issue[3], just run the test
main.subselect_exists2in that contains the MDEV-3881 test. We
can keep the workaround in the new transformation introduced in this
ticket, which is the case in earlier patches, but I was told by
psergei and monty that it is generally not a good idea have a
workaround like this unless there is a compelling reason.
By the way, the commit [3] (and [5] for a 10.4 version) below also
answers the following question in the previous comment. The testcase
is main.subselect_exists2in, more specifically the MDEV-3881
part in that test, also quoted in comment[4] above.
> ** The bug is apparently in the old (pre-MDEV-22534) code. Can one
> construct a testcase for it?
As to the question why swapping the orders matter, here's an
explanation, using the self-contained example given by psergei
above.
Exists2in transforms NOT EXISTS ( SELECT * FROM t3 WHERE c = b )
to
!(Item_in_optimizer and Item_func_is_not_null) # patched version
or
!(Item_func_is_not_null and Item_in_optimizer) # original version
where we have the following structures for the two items in the
conjunction
Item_func_is_not_null(
|
Item_direct_ref( // denote this r
|
Item_direct_view_ref( // denote this vr
|
Item_field("t2.b") // denote this fld
|
)
|
)
|
and
Item_in_optimizer(
|
Item_direct_view_ref( // same vr
|
Item_field("t2.b") // same fld
|
),
|
...
|
)
|
In Item_cond::propagate_equal_fields() for the conjunction, it
iterates over the two items and call propagate_equal_fields() on
them one by one, and whatever side effects happen during these
calls, the last call "wins". So in the patched version,
r->propagate_equal_fields() is called after and we end up with fld
rather than vr participating in the multiple equality which is
wrong. In the original version, vr->propagate_equal_fields() is
called after and we end up with vr participating which is correct.
If we can make it always correct as we do in patches in this ticket,
then we don't have to worry about the order.
[0] https://github.com/MariaDB/server/commit/e3ac306157ab9ade137c9afc9fff270a2f50d7ec
[1] https://mariadb.slack.com/archives/C021E77G7K2/p1690952975445929
[2]
https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/CEPJCC7GCZ4H44L276AFRIUB7LYIJOPI/
[3] https://github.com/MariaDB/server/commit/02da5af0aa1
[4]
https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=265762&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-265762
[5] https://github.com/MariaDB/server/commit/8c0835459aa
After digesting psergei's comment above, and discussions with him,
here is an explanation why not allowing Item_fields to participate in
multiple equalities when they are referenced by Item_direct_view_ref
fixes the self-reference bug (provided some assumption mentioned at the
end of this comment holds).
The job of Item::propagate_equal_fields() is to assign a multiple
equality to the item, or make it so field-like items participate in
the correct multiple equalities. Here we say an item i participates in
multiple equality e if i->item_equal == e, and we say i does not
participate in any multiple equality if i->item_equal == NULL.
We say an Item_direct_view_ref vr points to a Item_field
field if vr->real_item() == field. Multiple
Item_direct_view_ref's in different contexts may point to the same
Item_field. To avoid joining disjoint equalities, when an
Item_field is pointed to by an Item_direct_view_ref, the
former should no longer participate in a multiple equality, but the
latter should. For example, consider {{alias.a = t.col1 OR alias.a =
t.col2}}. If the field underlying alias.a participates in both
multiple equalities, the two multiple equalities will join into one
with three elements all equal to each other, which is a logical error.
(It would be nice to have a test case for this, though.) However, this
is not the case in the codebase, the violation of this rule causes the
bug, and the rectification fixes the bug provided certain assumptions
hold.
The job of Item::replace_equal_field() is to replace a field-like item
by an equal field-like item. When the Item is an Item_field, it
replaces itself by the const or first item of the multiple equality it
participates, if any and if it agrees with cond_equal. When the Item i
is an Item_ref, including Item_direct_view_ref, in
Item_ref::transform(.., replace_equal_field, ..), it calls the
transformer on the item it derefs to (i.e. *i->ref).
Item *new_item= (*ref)->transform(thd, transformer, arg); /* 1 */ |
And if the transformer returns a new item that is different from
*i->ref, we make i deref to this new item.
if (*ref != new_item) |
thd->change_item_tree(ref, new_item); /* 2 */ |
After that we replace call replace_equal_field on i itself.
return (this->*transformer)(thd, arg); /* 3 */ |
So when we have Item_direct_view_ref vr1 that derefs to an Item_field
field, i.e. *vr1->ref == field, and field participates in a
multiple equality whose const item is another Item_direct_view_ref vr2
that not only also derefs to field, but also has the same ref field as
vr1 (vr1->ref == vr2->ref), step 1 above returns vr2, and step 2
above makes vr1 deref to vr2, causing a self-reference:
*vr2->ref == *vr1->ref == vr2.
Now, if we do not allow Item_field to participate in a multiple
equality (i.e. when vr1->ref == field, we do {{vr1->item_equal=
item_equal}} and field->item_equal = NULL), then the
self-reference would not occur in this case. In step 1 above, the
right hand side will evaluate to Item_field, because its cond_equal is
NULL.
Item *Item_field::replace_equal_field(THD *thd, uchar *arg)
|
{
|
REPLACE_EQUAL_FIELD_ARG* param= (REPLACE_EQUAL_FIELD_ARG*)arg;
|
if (item_equal && item_equal == param->item_equal) |
{
|
...
|
}
|
return this; |
}
|
It will therefore skip step 2, and in step 3 above, vr1 temporarily
lets field participate in the multiple equality, and we get vr2 as a
result, which is then used to replace vr1 directly, rather than
*vr1->ref.
We can also answer the following question in a previous comment:
> Igor: doesn't Item_direct_view_ref::propagate_equal_items() have the
> same bug? what happens if there are nested views?
If there are nested views, say *vr1->ref == vr2,
*vr2->ref == field, vr1->item_equal->get_const() == vr3,
vr1->item_equal == cond_equal->current_level->item_equal,
and {{vr2->item_equal == field->item_equal == NULL}
then the following happens, and vr3 replaces vr1, and no
self-reference occurs.
vr1->transform: new_item= vr2->transform(...)
|
vr2->transform: new_item= field->transform(...)
|
field->replace_equal_field: returns itself because its item_equal is NULL
|
vr2->replace_equal_field: returns itself because its item_equal is NULL
|
vr1->replace_equal_field: temporarily lets field participate and returns vr3
|
Because an Item_direct_view_ref does not allow anything downstream
to participate in a multiple equality, anything returned by the
replace_equal_field call on an Item_field would replace the
top-level Item_direct_view_ref. As such, no self-reference occurs.
Anyway here are the latest commits. I wonder how important it is to
add a test covering nested Item_direct_view_ref and one that shows
logical errors may occur if an Item_field participates in a multiple
equality when there is more than one Item_direct_view_ref pointing
to it. What do you think psergei?
1084af9e91a MDEV-22534 Fix self-referencing Item_direct_view_ref
|
4d72160ae4e MDEV-22534 Simplify the workaround for MDEV-31269
|
f393ecea3af MDEV-22534 Decorrelate IN subquery
|
I think the plan should be:
1. Have the self-reference fix as the first commit and push it to
10.4, since 10.4 is affected
2. wait for there to be an 11.3 version of the MDEV-30073 patch so
that we can test our main patch on top
3. wait for the MDEV-30073 patch to have made it to 11.3, get rid of
our MDEV-31269 workaround and push our main patch to 11.3
OTOH our patch does not make the problem in MDEV-31269 worse, so
maybe step 2 and 3 are redundant, but then again none of the
workarounds has gained consensus support...
On why Item_direct_ref's are used in the creation of the extra "IS
NOT NULL" items.
If we replace them with Item_ref's, then MDEV-3906 resurfaces,
i.e. the testcase in main.subselect_exists2in corresponding to
MDEV-3906 fails with a crash.
Like MDEV-3881, the fix of MDEV-3906 is "included" in the original
commit adding the exists2in transformation.
e3ac306157ab9ade137c9afc9fff270a2f50d7ec
|
Commit: unknown <sanja@montyprogram.com>
|
CommitDate: Tue Feb 26 01:20:17 2013 +0200
|
|
[NOT] EXISTS to IN transformation.
|
sanja's last comment in MDEV-3906[1] seems to suggest that
Item_direct_ref was used to fix this issue.
Interestingly, if we apply the more crude workaround for MDEV-31269,
then the MDEV-3906 regression is gone:
80a094f1355 * upstream/bb-11.3-mdev-22534-item-ref-no-direct MDEV-22534 [demo] Use Item_ref instead of Item_direct_ref
|
fc1027ef992 * MDEV-22534 Simplify the workaround for MDEV-31269
|
f393ecea3af * MDEV-22534 Decorrelate IN subquery
|
Given this, and the fact that MDEV-3906 is also a 2nd PS crash issue,
I also made the changes of using Item_ref on top of a commit for
MDEV-30073, and the regression also does not appear:
8dffff917bc upstream/bb-10.4-mdev-31269-3906-fixed-by-mdev-30073 MDEV-22534 [demo] MDEV-30073 fixes MDEV-31269 and MDEV-3906
|
34083cf34bc upstream/bb-10.4-mdev-30073 MDEV-30073 Wrong result on 2nd execution of PS for query with NOT EXISTS
|
So it seems:
- MDEV-30073 fixes both 2nd PS issues, and thus allows the use of
Item_ref in the creations of extra IS NOT NULL's in
exists2in/decorrelation transformation - Without the fix of MDEV-30073, the only way to pass all tests,
including the ps-protocol ones, is to check for free list
intersections during ps execution, as well as keep the
Item_direct_ref with the infinite cycle fix - The infinite cycle fix is probably still needed because the
IS_NOT_NULL item creations are not the only place using
Item_direct_ref. Though if we rely on the MDEV-30073 fix, this could
be separated out as an independent bug, if a testcase can be
created.
There are two ways moving forward:
1. Block this ticket with MDEV-30073, and use Item_ref and no
workaround in our patches once MDEV-30073 is fixed
2. Use the Item_direct_ref, the infinite cycle fix, and either the
fourth or the fifth workaround options in the previous comment[2].
So we want to turn some kind of trivially correlated subquery to an uncorrelated subquery, so that the optimizer can apply materialization.
Extrapolating from the example, and the description in EXISTS-TO-IN[1], what happens is that we transform
in (select inner_col' from inner_table where inner_col = outer_col)
to
, outer_col in (select inner_col', inner_col from inner_table)
How much do we want to generalize this? Judging from EXISTS-TO-IN, we can add an inner_where:
in (select inner_col' from inner_table where inner_col = outer_col and inner_where) => , outer_col in (select inner_col', inner_col from inner_table where inner_where)
But how about expressions?
EXISTS-TO-IN seems to generalise to outer_expr=inner_col (see find_inner_outer_equalities()), so maybe we can also have
in (select inner_col' from inner_table where inner_col = outer_expr and inner_where) => , outer_expr in (select inner_col', inner_col from inner_table where inner_where)
But why not go all the way to
in (select inner_expr' from inner_table where inner_expr = outer_expr and inner_where) => , outer_expr in (select inner_expr', inner_expr from inner_table where inner_where)
?
[1] https://mariadb.com/kb/en/exists-to-in-optimization/
In any case I'll try to hack out a PoC that covers the example first.