[MDEV-139] LIMIT and subqueries Created: 2012-02-09  Updated: 2020-06-02

Status: Open
Project: MariaDB Server
Component/s: Data Manipulation - Subquery, Optimizer
Fix Version/s: None

Type: Task Priority: Major
Reporter: Rasmus Johansson (Inactive) Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 3
Labels: optimizer, pf1

Issue Links:
Duplicate
duplicates MDEV-12381 Add support for LIMIT in subqueries i... Open

 Description   

Domas initial request:

Essentially we want to be able to use:

SELECT * FROM ( SELECT ... WHERE x=123 LIMIT 1000000 ) WHERE y=456 LIMIT 1000;

Mostly this is to allow paging over large datasets that don't have that great selectivity - so we want to be able to return within reasonable time, even if x=123 condition matches billions of rows, but
y=456 matches very few.
Currently the internal subselect would get materialized by MySQL and wouldn't be used as streaming dataset, which makes it prohibitive on OLTP machines.

In certain cases having DISTINCT not materialize would also be extremely useful for us, but that would probably somewhat different task? (we want to optimize b=Y,c=Z fetches on (a,b,c) indexes).



 Comments   
Comment by Rasmus Johansson (Inactive) [ 2012-02-09 ]

By Petrunia:

If inner query didn't have LIMIT:

SELECT * FROM ( SELECT ... WHERE x=123) WHERE y=456 LIMIT 1000;

then MariaDB 5.3 (and MySQL 5.6) is able to handle this case by merging the inner select into the outer.

"LIMIT 1000000" will prevents the merging, and the subquery will be materialized (that is, really fully read and stored in a temporary table).

Currently there is no support for "streaming" FROM-clause subqueries. It is generally hard, because MySQL's join execution code has this structure:

When we run execute a JOIN of tables t1, t2, t3, in that order, the call stack will look like this:

JOIN::exec
sub_select(table t1)
sub_select(table t2)
sub_select(table t2)
end_send()

{ send the (t1 x t2 x t3) record to output}

changing this execution model to provide a JOIN::get_next_output_record() function is rather difficult. Kostja has made an attempt to do this for cursors and he didn't succeed.

However, if we're looking at only handling the special case where the subquery has form of

(SELECT ... FROM single_table WHERE ... LIMIT ...)

that is

  • select from single base table,
  • no [outer] joins or join buffering,
  • no grouping

then the task becomes much simpler.

Still, the task may depend on a piece of code from MariaDB 5.3/MySQL 5.5. The piece of code is the patch changed FROM-clause subqueries from being materialized very early (even before the parent join is optimized) to being materialized right before it is read

Comment by Rasmus Johansson (Inactive) [ 2012-02-09 ]

Regarding DISTINCT by Petrunia:

  • I think we should get to this question after having resolved difficulties
    mentioned earlier in this email
  • It is generally possible to have DISTINCT run in an on-the-fly mode but then
    there will be a question of when the optimizer should choose to do that.
Comment by Rasmus Johansson (Inactive) [ 2012-02-09 ]

By Domas:

Nested-loop-only support would be nice, OTOH, grouping, when done via a loose scan, could be extremely useful too.

Comment by Robert Dyas [ 2020-06-02 ]

This looks like a really old issue that has been forgotten. I ran into it today with a client. Very useful to be able to use LIMIT in a subquery inside an IN clause.
Any chance this is more easily fixable these days?

Generated at Thu Feb 08 06:26:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.