[MDEV-6920] subquery don't materialize Created: 2014-10-24  Updated: 2020-05-12  Resolved: 2020-05-12

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Subquery, Optimizer
Affects Version/s: 10.0.14
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: roberto spadim Assignee: Varun Gupta (Inactive)
Resolution: Not a Bug Votes: 0
Labels: optimizer
Environment:

linux



 Description   

Hi guys, i have a subquery that don't materialize:

SELECT 
a.plano_conta_id,a.plano_conta_id_red,
(
	SELECT SUM(quant) FROM est_atu WHERE 
	unidade_id=1001 AND item_id=a.plano_conta_id AND item_id_red=a.plano_conta_id_red
) AS est, 0 as est_ext
FROM estoque_itens AS a 
WHERE 
a.plano_conta_id=67 AND a.habilitado='Y' AND grupo_itens='N'  AND 
((a.plano_conta_id=67 AND a.plano_conta_numero LIKE "%") )  AND 
a.plano_conta_id_red IN (
	SELECT plano_conta_id_red FROM lista_preco WHERE 
	distribuidor_tipo='j' AND distribuidor_id=1 AND plano_conta_id=67
)
 
 
HAVING est>0 OR est_ext>0 
ORDER BY a.plano_conta_id,a.codigo_busca LIMIT 30 OFFSET 0

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY lista_preco ref PRIMARY,item item 2 const 3577 Using where; Using index; LooseScan; Using temporary; Using filesort
1 PRIMARY a eq_ref PRIMARY,id,plano_conta_numero id 16 const,19_org.lista_preco.plano_conta_id_red 1 Using index condition; Using where
2 DEPENDENT SUBQUERY est_atu ALL PRIMARY       5640 Using where

this part:

a.plano_conta_id_red IN (
	SELECT plano_conta_id_red FROM lista_preco WHERE 
	distribuidor_tipo='j' AND distribuidor_id=1 AND plano_conta_id=67
)

return 1300 rows
if I

SET @@session.group_concat_max_len = 1024000;
SELECT GROUP_CONCAT(plano_conta_id_red)  FROM lista_preco WHERE distribuidor_tipo='j' AND distribuidor_id=1 AND plano_conta_id=67;

and change the IN (SELECT), to IN (number,number,number,number)
the query drop from: 2 seconds to 0.1 second

my doubt is, why the subquery was not materialized?

my optimizer_switch:

index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,*materialization=on*,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on

check: materialization=on, semijoin=on

some numbers: (all small tables)

select count(*) from lista_preco => 9338 rows;
select count(*) from est_atu => 5640 rows;
select count(*) from estoque_itens => 8788 rows;

maybe could we include a SQL_MATERIALIZE or something like it to execute the temporary create table and use it (materialize the subquery) ?

something like:

SELECT 
a.plano_conta_id,a.plano_conta_id_red,
(
	SELECT SUM(quant) FROM est_atu WHERE 
	unidade_id=1001 AND item_id=a.plano_conta_id AND item_id_red=a.plano_conta_id_red
) AS est, 0 as est_ext
FROM estoque_itens AS a 
WHERE 
a.plano_conta_id=67 AND a.habilitado='Y' AND grupo_itens='N'  AND 
((a.plano_conta_id=67 AND a.plano_conta_numero LIKE "%") )  AND 
a.plano_conta_id_red IN (
	SELECT SQL_MATERIALIZE plano_conta_id_red FROM lista_preco WHERE 
	distribuidor_tipo='j' AND distribuidor_id=1 AND plano_conta_id=67
)
 
 
HAVING est>0 OR est_ext>0 
ORDER BY a.plano_conta_id,a.codigo_busca LIMIT 30 OFFSET 0

today my work around without create table or without group concat is:

SELECT 
a.plano_conta_id,a.plano_conta_id_red,
(
	SELECT SUM(quant) FROM est_atu WHERE 
	unidade_id=1001 AND item_id=a.plano_conta_id AND item_id_red=a.plano_conta_id_red
) AS est, 0 as est_ext
FROM estoque_itens AS a 
WHERE 
(a.plano_conta_id,a.plano_conta_id_red) IN (
	SELECT c.plano_conta_id,c.plano_conta_id_red FROM lista_preco AS b,estoque_itens AS c WHERE 
	c.plano_conta_id=67 AND c.habilitado='Y' AND c.grupo_itens='N'  AND 
	((c.plano_conta_id=67 AND c.plano_conta_numero LIKE "%") )  AND 
	b.plano_conta_id=c.plano_conta_id AND
	b.plano_conta_id_red=c.plano_conta_id_red and
	b.distribuidor_tipo='j' AND b.distribuidor_id=1 AND b.plano_conta_id=67
)
 
 
HAVING est>0 OR est_ext>0 
ORDER BY a.plano_conta_id,a.codigo_busca LIMIT 30 OFFSET 0

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY a ref PRIMARY,id,plano_conta_numero PRIMARY 8 const 1769 Using where
1 PRIMARY c eq_ref PRIMARY,id,plano_conta_numero id 16 const,19_org.a.plano_conta_id_red 1 Using where
1 PRIMARY b ref PRIMARY,item item 10 const,19_org.a.plano_conta_id_red,const,const 1 Using where; Using index; FirstMatch(c)
2 DEPENDENT SUBQUERY est_atu ALL PRIMARY       5640 Using where

but that's 'very ugly', don't use materialized optimizer at explain, and use 4 tables instead of 3, i don't know if we have a problem with big unions using more tables, or if we are using the most optimized query



 Comments   
Comment by Elena Stepanova [ 2014-10-24 ]

Assigned to psergey] to check whether there is a problem in there, or is it something that's not expected to happen.

Comment by Varun Gupta (Inactive) [ 2018-06-11 ]

The IN subquery is not expected to materialize always. This is a SEMI-JOIN subquery and we have different strategies to execute such queries.

  • Table pullout optimization
  • FirstMatch execution strategy
  • Semi-join Materialization execution strategy
  • LooseScan execution strategy
  • DuplicateWeedout execution strategy

There is a cost-based approach to select them and as you have pointed out in Explain's output it uses LooseScan Execution strategy.
The DEPENDENT SUBQUERY in the Explain is the subquery in the SELECT LIST

Comment by Sergei Petrunia [ 2018-07-11 ]

Some mock data to get the query parsed : https://gist.github.com/spetrunia/a92bb8e68e641dc1aee0901aa1bc62f3 (The data distribution doesn't match the real one).

Comment by Sergei Petrunia [ 2018-07-11 ]

Some observations:

  • The subquery in the select list has an aggregate, SUM(quant). The aggregation is done in the subquery.
  • The top-level query has a HAVING clause, HAVING est>0 OR est_ext>0 }}. est_ext is always 0, so HAVING becomes {{est>0. est is the single-row subquery.
  • The workaround query is not using Materialization either, It is using First Match.
  • The original query has an ORDER BY ... LIMIT clause, and it is using Using temporary; Using filesort to resolve it. That is, it produces the join output, and then does sorting and then takes the first 30 rows.
  • The workaround query DOES NOT HAVE Using filesort. This means it is able to take advantage of an index to resolve the ORDER BY LIMIT. And it actually only finds the first 30 rows. Could this be the reason it is faster?
Comment by Sergei Petrunia [ 2018-07-11 ]

rspadim, can you try the original and workaround queries without the {{ORDER BY ... LIMIT }} clause? Do you still observe the difference in execution speed? Please post EXPLAINs for both queries.

If the speed difference is still there... we would need the table DDLs. And if you could upload the table data to the FTP, it would be of great help. My gut feeling is that the the optimizer should be able to handle such cases without the neeed for explicit hints to do Materialization.

Comment by roberto spadim [ 2018-07-11 ]

Hi Sergei, i will try it, it`s an old version, I will check if i can upgrade server to last version (running 10.0.21)

here the output:

explain
 
SELECT 
a.plano_conta_id,a.plano_conta_id_red,
(
	SELECT SUM(quant) FROM est_atu WHERE 
	unidade_id=1001 AND item_id=a.plano_conta_id AND item_id_red=a.plano_conta_id_red
) AS est, 0 as est_ext
FROM estoque_itens AS a 
WHERE 
a.plano_conta_id=67 AND a.habilitado='Y' AND grupo_itens='N'  AND 
((a.plano_conta_id=67 AND a.plano_conta_numero LIKE "%") )  AND 
a.plano_conta_id_red IN (
	SELECT plano_conta_id_red FROM lista_preco WHERE 
	distribuidor_tipo='j' AND distribuidor_id=1 AND plano_conta_id=67
)
 
 
HAVING est>0 OR est_ext>0 
ORDER BY a.plano_conta_id,a.codigo_busca LIMIT 30 OFFSET 0

Heading 1 Heading 2
id select_type table type possible_keys key key_len ref rows Extra
---: ---:
1 PRIMARY a ref PRIMARY,plano_conta_numero,codigo_busca codigo_busca 8 const 4290 Using where
1 PRIMARY <subquery3> eq_ref distinct_key distinct_key 3 func 1 Using where
3 MATERIALIZED lista_preco ref PRIMARY,item item 2 const 3337 Using where; Using index
2 DEPENDENT SUBQUERY est_atu ref PRIMARY,item item 24 19_org.a.plano_conta_id,19_org.a.plano_conta_id_red,const 1     Col A2

WITHOUT ORDER BY

explain
 
SELECT 
a.plano_conta_id,a.plano_conta_id_red,
(
	SELECT SUM(quant) FROM est_atu WHERE 
	unidade_id=1001 AND item_id=a.plano_conta_id AND item_id_red=a.plano_conta_id_red
) AS est, 0 as est_ext
FROM estoque_itens AS a 
WHERE 
a.plano_conta_id=67 AND a.habilitado='Y' AND grupo_itens='N'  AND 
((a.plano_conta_id=67 AND a.plano_conta_numero LIKE "%") )  AND 
a.plano_conta_id_red IN (
	SELECT plano_conta_id_red FROM lista_preco WHERE 
	distribuidor_tipo='j' AND distribuidor_id=1 AND plano_conta_id=67
)
 
 
HAVING est>0 OR est_ext>0 
#ORDER BY a.plano_conta_id,a.codigo_busca LIMIT 30 OFFSET 0

id select_type table type possible_keys key key_len ref rows Extra
---: ---:
1 PRIMARY a ref PRIMARY,plano_conta_numero,codigo_busca PRIMARY 8 const 4290 Using where
1 PRIMARY <subquery3> eq_ref distinct_key distinct_key 3 func 1 Using where
3 MATERIALIZED lista_preco ref PRIMARY,item item 2 const 3337 Using where; Using index
2 DEPENDENT SUBQUERY est_atu ref PRIMARY,item item 24 19_org.a.plano_conta_id,19_org.a.plano_conta_id_red,const 1  

the second query

EXPLAIN
SELECT SQL_NO_CACHE
a.plano_conta_id,a.plano_conta_id_red,
(
SELECT SUM(quant) FROM est_atu WHERE
unidade_id=1001 AND item_id=a.plano_conta_id AND item_id_red=a.plano_conta_id_red
) AS est, 0 as est_ext
FROM estoque_itens AS a
WHERE
(a.plano_conta_id,a.plano_conta_id_red) IN (
SELECT c.plano_conta_id,c.plano_conta_id_red FROM lista_preco AS b,estoque_itens AS c WHERE
c.plano_conta_id=67 AND c.habilitado='Y' AND c.grupo_itens='N' AND
((c.plano_conta_id=67 AND c.plano_conta_numero LIKE "%") ) AND
b.plano_conta_id=c.plano_conta_id AND
b.plano_conta_id_red=c.plano_conta_id_red and
b.distribuidor_tipo='j' AND b.distribuidor_id=1 AND b.plano_conta_id=67
)

HAVING est>0 OR est_ext>0
ORDER BY a.plano_conta_id,a.codigo_busca LIMIT 30 OFFSET 0
```

id select_type table type possible_keys key key_len ref rows Extra
---: ---:
1 PRIMARY a ref PRIMARY,plano_conta_numero,codigo_busca codigo_busca 8 const 2372 Using where; Using index
1 PRIMARY c eq_ref PRIMARY,plano_conta_numero,codigo_busca PRIMARY 16 const,19_org.a.plano_conta_id_red 1 Using where
1 PRIMARY b ref PRIMARY,item item 10 const,19_org.a.plano_conta_id_red,const,const 1 Using where; Using index; FirstMatch(c)
2 DEPENDENT SUBQUERY est_atu ref PRIMARY,item item 24 19_org.a.plano_conta_id,19_org.a.plano_conta_id_red,const 1  

WITHOUT ORDER BY

```sql
EXPLAIN
SELECT SQL_NO_CACHE
a.plano_conta_id,a.plano_conta_id_red,
(
SELECT SUM(quant) FROM est_atu WHERE
unidade_id=1001 AND item_id=a.plano_conta_id AND item_id_red=a.plano_conta_id_red
) AS est, 0 as est_ext
FROM estoque_itens AS a
WHERE
(a.plano_conta_id,a.plano_conta_id_red) IN (
SELECT c.plano_conta_id,c.plano_conta_id_red FROM lista_preco AS b,estoque_itens AS c WHERE
c.plano_conta_id=67 AND c.habilitado='Y' AND c.grupo_itens='N' AND
((c.plano_conta_id=67 AND c.plano_conta_numero LIKE "%") ) AND
b.plano_conta_id=c.plano_conta_id AND
b.plano_conta_id_red=c.plano_conta_id_red and
b.distribuidor_tipo='j' AND b.distribuidor_id=1 AND b.plano_conta_id=67
)

HAVING est>0 OR est_ext>0
#ORDER BY a.plano_conta_id,a.codigo_busca LIMIT 30 OFFSET 0
```

id select_type table type possible_keys key key_len ref rows Extra
---: ---:
1 PRIMARY a ref PRIMARY,plano_conta_numero,codigo_busca codigo_busca 8 const 2372 Using index
1 PRIMARY c eq_ref PRIMARY,plano_conta_numero,codigo_busca PRIMARY 16 const,19_org.a.plano_conta_id_red 1 Using where
1 PRIMARY b ref PRIMARY,item item 10 const,19_org.a.plano_conta_id_red,const,const 1 Using where; Using index; FirstMatch(c)
2 DEPENDENT SUBQUERY est_atu ref PRIMARY,item item 24 19_org.a.plano_conta_id,19_org.a.plano_conta_id_red,const 1  
Comment by roberto spadim [ 2018-07-11 ]

Server cache is hot, i didn`t checked speed changes ~ 0.032 to 0.094 s per query (with/without, first and second). not sure if the problem was solved, i will try to reproduce this problem with another machine, but i`m considering that newer version solved this problem

Comment by Varun Gupta (Inactive) [ 2018-07-11 ]

So with the explain we see

1) For the first query with or without ORDER BY we see MATERIALIZATION technique is used
2) For the second query with or without ORDER BY we are using FIRSTMATCH technique is used

Generated at Thu Feb 08 07:15:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.