[MDEV-401] Better nested semi-join processing Created: 2012-07-18  Updated: 2015-11-17

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 3
Labels: optimizer

Issue Links:
PartOf
includes MDEV-582 LP:929732 - semijoin much slower than... Stalled

 Description   

(this all started from https://bugs.launchpad.net/maria/+bug/929732)

Currently, nested semi-joins are merged together into one semi-join. This makes semi-join optimization/execution simpler.

This approach has downsides, though. If there is a semi-join subquery with multiple child semi-join subqueries, we will merge them all together into a semi-join that includes a cross-product join of the child subqueries. Running a cross-product join and then removing duplicates can be slow (see the launchpad bug for detailed description).

Nested semi-join support
========================

The idea is to support nested semi-joins. It will be possible for one semi-join
nest to contain another.

Semi-join nests will not be mixed with outer join nests.

== Optimization

=== Table pullout
current pull_out_semijoin_tables() will not work for nested semi-joins. With
nested semi-joins, we should:

  • process inner semi-joins first, then their parents,
  • pull out of inner semi-join to the parent semi-join (and then pull out
    further to the top-level),
  • if we were doing pullout for a top-level semi-join S, and ended up in
    a situation where S has only semi-joins as its children, then S should be
    removed and its children made top-level.

(TODO: what about PS and re-executions? we need to be able to keep the
nested-semi-join structure permanent, and results of pullout must be kept
transient...)

=== SJ-Materialization

optimize_semijoin_nests() performs pre-optimization of potential
SJ-Materialization nests (so that we know what cost to put there when we
consider materialization)

It is fairly trivial that we should run optimize_semijoin_nests() for
innermost semi-join nests that are eligible for materialization.

It is not obvious what to do for a semi-join nest that is

  • eligible for materialization,
  • has child semi-join nests.
    Should one run optimization separately, and chose the best semi-join
    strategy for children (which may or may not be materialization?)

== Semi-join optimization as part of join optimization

Re-worked advance_sj_state() makes it not as hopeless as before.

TODO: Can we just postpone semi-join flattening until table pullout is done?
this will resolve the bug...

TODO: does current query plan representation allow for nesting?

=== FirstMatch
=== SJ-Materialization
=== Loose Scan
=== Duplicate Weedout

== Execution
=== Duplicate Weedout

  • execution code supports nested.

=== FirstMatch

  • ?

=== SJ-Materialization

  • nested execution seems to be easy and automatic? We allow use of any
    sj-strategies inside the SJ-Materialization join_tabs...

=== Loose Scan

  • ?

Generated at Thu Feb 08 06:28:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.