Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-401

Better nested semi-join processing

    XMLWordPrintable

Details

    • Task
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • None

    Description

      (this all started from https://bugs.launchpad.net/maria/+bug/929732)

      Currently, nested semi-joins are merged together into one semi-join. This makes semi-join optimization/execution simpler.

      This approach has downsides, though. If there is a semi-join subquery with multiple child semi-join subqueries, we will merge them all together into a semi-join that includes a cross-product join of the child subqueries. Running a cross-product join and then removing duplicates can be slow (see the launchpad bug for detailed description).

      Nested semi-join support
      ========================

      The idea is to support nested semi-joins. It will be possible for one semi-join
      nest to contain another.

      Semi-join nests will not be mixed with outer join nests.

      == Optimization

      === Table pullout
      current pull_out_semijoin_tables() will not work for nested semi-joins. With
      nested semi-joins, we should:

      • process inner semi-joins first, then their parents,
      • pull out of inner semi-join to the parent semi-join (and then pull out
        further to the top-level),
      • if we were doing pullout for a top-level semi-join S, and ended up in
        a situation where S has only semi-joins as its children, then S should be
        removed and its children made top-level.

      (TODO: what about PS and re-executions? we need to be able to keep the
      nested-semi-join structure permanent, and results of pullout must be kept
      transient...)

      === SJ-Materialization

      optimize_semijoin_nests() performs pre-optimization of potential
      SJ-Materialization nests (so that we know what cost to put there when we
      consider materialization)

      It is fairly trivial that we should run optimize_semijoin_nests() for
      innermost semi-join nests that are eligible for materialization.

      It is not obvious what to do for a semi-join nest that is

      • eligible for materialization,
      • has child semi-join nests.
        Should one run optimization separately, and chose the best semi-join
        strategy for children (which may or may not be materialization?)

      == Semi-join optimization as part of join optimization

      Re-worked advance_sj_state() makes it not as hopeless as before.

      TODO: Can we just postpone semi-join flattening until table pullout is done?
      this will resolve the bug...

      TODO: does current query plan representation allow for nesting?

      === FirstMatch
      === SJ-Materialization
      === Loose Scan
      === Duplicate Weedout

      == Execution
      === Duplicate Weedout

      • execution code supports nested.

      === FirstMatch

      • ?

      === SJ-Materialization

      • nested execution seems to be easy and automatic? We allow use of any
        sj-strategies inside the SJ-Materialization join_tabs...

      === Loose Scan

      • ?

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              3 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.