Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7474

Semi-Join's DuplicateWeedout strategy skipped for some values of optimizer_search_depth

Details

    Description

      When "semijoin" is enabled via "optimizer_switch", data can be duplicated with certain values of "optimizer_search_depth".

      According to this documentation page:

      https://mariadb.com/kb/en/mariadb/documentation/managing-mariadb/optimization-and-tuning/query-optimizations/optimization-strategies/duplicateweedout-strategy/

      A DuplicateWeedout strategy is supposed to be implemented using temporary tables for semijoin queries. For some values of "optimizer_search_depth", this duplicate weedout step may not occur.

      Attached is the following:

      optimizer_search_depth_semijoin_data_setup.sql - A script that sets up a database and a few tables.

      optimizer_search_depth_semijoin_query_test.sql - A script that queries the tables to demonstrate the problem.

      optimizer_search_depth_semijoin_output.txt - Example execution of the scripts and their output.

      Attachments

        Activity

          Pushed a patch into https://github.com/MariaDB/server/tree/bb-10.0-mdev7474 . I'll need testing from elenst.

          psergei Sergei Petrunia added a comment - Pushed a patch into https://github.com/MariaDB/server/tree/bb-10.0-mdev7474 . I'll need testing from elenst .

          Ran a set of tests, got several hundred of mismatches between the patched version and baseline 10.0 (low optimizer_search_depth values). All differences look legit, related to the bugfix, so the good news is that tests hit the right spot; but due to the sheer amount of mismatches it's unrealistic to check each one separately.
          Next step – I'll run a similar test, but between the patched version with low optimizer_search_depth and baseline 10.0 with normal optimizer_search_depth. The expectation is that there should be no mismatches.

          On a separate note, I wonder if the commit comment is correct:

          JOIN::cur_dups_producing_tables was not maintained correctly in
          the cases of greedy optimization (search_depth > n_tables).

          Is it not the other way round, search-depth < n_tables?

          elenst Elena Stepanova added a comment - Ran a set of tests, got several hundred of mismatches between the patched version and baseline 10.0 (low optimizer_search_depth values). All differences look legit, related to the bugfix, so the good news is that tests hit the right spot; but due to the sheer amount of mismatches it's unrealistic to check each one separately. Next step – I'll run a similar test, but between the patched version with low optimizer_search_depth and baseline 10.0 with normal optimizer_search_depth. The expectation is that there should be no mismatches. On a separate note, I wonder if the commit comment is correct: JOIN::cur_dups_producing_tables was not maintained correctly in the cases of greedy optimization (search_depth > n_tables). Is it not the other way round, search-depth < n_tables?

          The second run went okay. I will also try valgrind tests, but please go ahead and push (we need to decide first whether it's for 5.5 or 10.0).

          elenst Elena Stepanova added a comment - The second run went okay. I will also try valgrind tests, but please go ahead and push (we need to decide first whether it's for 5.5 or 10.0).

          Considering the nature of the fix, it should be 5.5

          psergei Sergei Petrunia added a comment - Considering the nature of the fix, it should be 5.5

          Pushed into 5.5 tree

          psergei Sergei Petrunia added a comment - Pushed into 5.5 tree

          People

            psergei Sergei Petrunia
            GeoffMontee Geoff Montee (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.