Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19134

EXISTS() slower if ORDER BY is defined

Details

    Description

      In some specific cases, EXISTS() will slowdown if ORDER BY is defined in the subquery. Why is the reason to apply an ORDER BY to EXISTS()? None. But Laravel does that and I can't control it.

      The question is: ORDER BY inside of an EXISTS() will slowdown in a specific case. I think that the optimizer is not ignoring that.

      Setup:

      CREATE TABLE IF NOT EXISTS `table_a` (
      	`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
      	PRIMARY KEY (`id`)
      )
      ENGINE=InnoDB;
       
      CREATE TABLE IF NOT EXISTS `table_b` (
      	`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
      	`index_a` INT(10) UNSIGNED NULL DEFAULT NULL,
      	`index_b` INT(10) UNSIGNED NULL DEFAULT NULL,
      	`index_c` INT(10) UNSIGNED NULL DEFAULT NULL,
      	`parameter` INT(10) UNSIGNED NULL DEFAULT NULL,
      	PRIMARY KEY (`id`),
      	INDEX `index_a_index_b` (`index_a`, `index_b`),
      	INDEX `index_c` (`index_c`),
      	INDEX `parameter` (`parameter`)
      )
      ENGINE=InnoDB;
      

      Populate:

      DELIMITER ;;
      FOR i IN 1 .. 1250
      DO
        INSERT INTO table_a VALUES ();
        INSERT INTO table_b (index_a, index_b, index_c, parameter) VALUES (123, i, NULL, i);
      END FOR;;
      DELIMITER ;
      

      Slow query:

      SELECT table_a.id
      FROM   table_a 
       
      WHERE
      	EXISTS (
      		SELECT *
      		FROM table_b
      		
      		WHERE table_b.index_a = 123 AND
      			  table_b.index_b = table_a.id AND
      			  table_b.parameter = 123
       
      		ORDER BY table_b.index_c
      	);
      

      Cleanup:

      DROP TABLE IF EXISTS table_a;
      DROP TABLE IF EXISTS table_b;
      

      What I realized:

      • Dropping ORDER BY will solve (which is not an option for me, unfortunatelly);
      • Dropping index index_a_index_b or index_c will solve;
      • Moving index index_a_index_b to end will solve;
      • Split index index_a_index_b into two separated indexes will solve;

      And about the parameter column:

      I created this column just to make sure that it will returns only the `table_a.id = 123` as an additional parameter. In this case, I could run a different query with the same behaviour using `IN()` that will not will slowdown (0.000s vs. 1.516s for 1250 x 1250 rows).

      SELECT table_a.id
      FROM   table_a
       
      WHERE
      	table_a.id IN (
      		SELECT table_b.parameter
      		FROM table_b
      		
      		WHERE table_b.index_a = 123 AND
      			   table_b.index_b = table_a.id AND
      			   table_b.parameter = 123
      		
      		ORDER BY table_b.index_c
      	)
      

      Explain:

      The EXPLAIN is a bit different.

      • The PRIMARY is the same: table_a, type index, possible keys NULL, key PRIMARY, key length 4, ref NULL, rows 1445, extra using where; using index;
      • The DEPENDENT SUBQUERY will have some equal values, like table table_b, possible keys index_a_index_b,parameter, rows 1 and extra using where;
      • The DEPENDENT SUBQUERY of EXISTS() query will be type index, key index_c, key length 5 and ref NULL;
      • The DEPENDENT SUBQUERY of IN() query will be type ref, key index_a_index_b, key length 10 and ref const,test.table_a.id;

      About ORDER BY insde EXISTS()

      This query is generated by Laravel framework when we use the whereHas() method. Because of a Scope it add the ORDER BY in this place. As workaround I have modified the code to use IN() instead of EXISTS(), but I think that it could be optimized to work fine in all the cases (even with ORDER BY).

      Attachments

        Activity

          rentalhost David Rodrigues created issue -
          elenst Elena Stepanova made changes -
          Field Original Value New Value
          Assignee Alice Sherepa [ alice ]
          alice Alice Sherepa made changes -
          Assignee Alice Sherepa [ alice ] Sergei Petrunia [ psergey ]

          ANALYZE FORMAT=JSON output on the current 10.1.40:

            "query_block": {
              "select_id": 1,
              "r_loops": 1,
              "r_total_time_ms": 5.708,
              "table": {
                "table_name": "table_a",
                "access_type": "index",
                "key": "PRIMARY",
                "key_length": "4",
                "used_key_parts": ["id"],
                "r_loops": 1,
                "rows": 1250,
                "r_rows": 1250,
                "r_total_time_ms": 0.3471,
                "filtered": 100,
                "r_filtered": 0.08,
                "attached_condition": "<in_optimizer>(table_a.`id`,<exists>(subquery#2))",
                "using_index": true
              },
              "subqueries": [
                {
                  "query_block": {
                    "select_id": 2,
                    "r_loops": 1250,
                    "r_total_time_ms": 4.6903,
                    "table": {
                      "table_name": "table_b",
                      "access_type": "ref",
                      "possible_keys": ["index_a_index_b", "parameter"],
                      "key": "index_a_index_b",
                      "key_length": "10",
                      "used_key_parts": ["index_a", "index_b"],
                      "ref": ["const", "test.table_a.id"],
                      "r_loops": 1250,
                      "rows": 1,
                      "r_rows": 1,
                      "r_total_time_ms": 3.5381,
                      "filtered": 100,
                      "r_filtered": 0.08,
                      "attached_condition": "((1250 = table_b.parameter) and (table_b.parameter = 123))"
                    }
                  }
                }
              ]
            }
          } 
          

          psergei Sergei Petrunia added a comment - ANALYZE FORMAT=JSON output on the current 10.1.40: "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 5.708, "table": { "table_name": "table_a", "access_type": "index", "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "r_loops": 1, "rows": 1250, "r_rows": 1250, "r_total_time_ms": 0.3471, "filtered": 100, "r_filtered": 0.08, "attached_condition": "<in_optimizer>(table_a.`id`,<exists>(subquery#2))", "using_index": true }, "subqueries": [ { "query_block": { "select_id": 2, "r_loops": 1250, "r_total_time_ms": 4.6903, "table": { "table_name": "table_b", "access_type": "ref", "possible_keys": ["index_a_index_b", "parameter"], "key": "index_a_index_b", "key_length": "10", "used_key_parts": ["index_a", "index_b"], "ref": ["const", "test.table_a.id"], "r_loops": 1250, "rows": 1, "r_rows": 1, "r_total_time_ms": 3.5381, "filtered": 100, "r_filtered": 0.08, "attached_condition": "((1250 = table_b.parameter) and (table_b.parameter = 123))" } } } ] } }

          If I remove the ORDER BY, the subquery is merged into a semi-join:

          {
            "query_block": {
              "select_id": 1,
              "r_loops": 1,
              "r_total_time_ms": 0.0467,
              "table": {
                "table_name": "table_a",
                "access_type": "const",
                "possible_keys": ["PRIMARY"],
                "key": "PRIMARY",
                "key_length": "4",
                "used_key_parts": ["id"],
                "ref": ["const"],
                "r_loops": 0,
                "rows": 1,
                "r_rows": null,
                "filtered": 100,
                "r_filtered": null,
                "using_index": true
              },
              "table": {
                "table_name": "table_b",
                "access_type": "ref",
                "possible_keys": ["index_a_index_b", "parameter"],
                "key": "index_a_index_b",
                "key_length": "10",
                "used_key_parts": ["index_a", "index_b"],
                "ref": ["const", "const"],
                "r_loops": 1,
                "rows": 1,
                "r_rows": 1,
                "r_total_time_ms": 0.0218,
                "filtered": 100,
                "r_filtered": 100,
                "attached_condition": "(table_b.parameter = 123)",
                "first_match": "table_a"
              }
            }
          }
          

          psergei Sergei Petrunia added a comment - If I remove the ORDER BY, the subquery is merged into a semi-join: { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 0.0467, "table": { "table_name": "table_a", "access_type": "const", "possible_keys": ["PRIMARY"], "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "ref": ["const"], "r_loops": 0, "rows": 1, "r_rows": null, "filtered": 100, "r_filtered": null, "using_index": true }, "table": { "table_name": "table_b", "access_type": "ref", "possible_keys": ["index_a_index_b", "parameter"], "key": "index_a_index_b", "key_length": "10", "used_key_parts": ["index_a", "index_b"], "ref": ["const", "const"], "r_loops": 1, "rows": 1, "r_rows": 1, "r_total_time_ms": 0.0218, "filtered": 100, "r_filtered": 100, "attached_condition": "(table_b.parameter = 123)", "first_match": "table_a" } } }

          10.4 (debug build)

          {
            "query_block": {
              "select_id": 1,
              "r_loops": 1,
              "r_total_time_ms": 1.9693,
              "table": {
                "table_name": "table_a",
                "access_type": "index",
                "key": "PRIMARY",
                "key_length": "4",
                "used_key_parts": ["id"],
                "r_loops": 1,
                "rows": 1250,
                "r_rows": 1250,
                "r_total_time_ms": 0.2257,
                "filtered": 100,
                "r_filtered": 0.08,
                "attached_condition": "<in_optimizer>(table_a.`id`,<exists>(subquery#2))",
                "using_index": true
              },
              "subqueries": [
                {
                  "query_block": {
                    "select_id": 2,
                    "r_loops": 1250,
                    "r_total_time_ms": 1.4948,
                    "table": {
                      "table_name": "table_b",
                      "access_type": "ref",
                      "possible_keys": ["index_a_index_b", "parameter"],
                      "key": "parameter",
                      "key_length": "5",
                      "used_key_parts": ["parameter"],
                      "ref": ["const"],
                      "r_loops": 1250,
                      "rows": 1,
                      "r_rows": 0.0008,
                      "r_total_time_ms": 0.8395,
                      "filtered": 100,
                      "r_filtered": 100,
                      "index_condition": "1250 = table_b.parameter",
                      "attached_condition": "table_b.index_b = table_a.`id` and table_b.index_a = 123"
                    }
                  }
                }
              ]
            }
          }
          

          note that I don't get the sorting step here (but the bug report says it's done?)

          psergei Sergei Petrunia added a comment - 10.4 (debug build) { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 1.9693, "table": { "table_name": "table_a", "access_type": "index", "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "r_loops": 1, "rows": 1250, "r_rows": 1250, "r_total_time_ms": 0.2257, "filtered": 100, "r_filtered": 0.08, "attached_condition": "<in_optimizer>(table_a.`id`,<exists>(subquery#2))", "using_index": true }, "subqueries": [ { "query_block": { "select_id": 2, "r_loops": 1250, "r_total_time_ms": 1.4948, "table": { "table_name": "table_b", "access_type": "ref", "possible_keys": ["index_a_index_b", "parameter"], "key": "parameter", "key_length": "5", "used_key_parts": ["parameter"], "ref": ["const"], "r_loops": 1250, "rows": 1, "r_rows": 0.0008, "r_total_time_ms": 0.8395, "filtered": 100, "r_filtered": 100, "index_condition": "1250 = table_b.parameter", "attached_condition": "table_b.index_b = table_a.`id` and table_b.index_a = 123" } } } ] } } note that I don't get the sorting step here (but the bug report says it's done?)

          Ok, If I run the ANALYZE FORMAT=JSON immediately after the INSERT statements (without a server restart in the middle), I get the sort step, too:

          | {
            "query_block": {
              "select_id": 1,
              "r_loops": 1,
              "r_total_time_ms": 17.856,
              "table": {
                "table_name": "table_a",
                "access_type": "index",
                "key": "PRIMARY",
                "key_length": "4",
                "used_key_parts": ["id"],
                "r_loops": 1,
                "rows": 1250,
                "r_rows": 1250,
                "r_total_time_ms": 0.844,
                "filtered": 100,
                "r_filtered": 0.08,
                "attached_condition": "<in_optimizer>(1,exists(subquery#2))",
                "using_index": true
              },
              "subqueries": [
                {
                  "expression_cache": {
                    "state": "disabled",
                    "r_loops": 200,
                    "r_hit_ratio": 0,
                    "query_block": {
                      "select_id": 2,
                      "r_loops": 1250,
                      "r_total_time_ms": 15.395,
                      "read_sorted_file": {
                        "r_rows": 0.0008,
                        "filesort": {
                          "sort_key": "table_b.index_c",
                          "r_loops": 1250,
                          "r_total_time_ms": 10.515,
                          "r_limit": 1,
                          "r_used_priority_queue": true,
                          "r_output_rows": 0,
                          "table": {
                            "table_name": "table_b",
                            "access_type": "ref",
                            "possible_keys": ["index_a_index_b", "parameter"],
                            "key": "parameter",
                            "key_length": "5",
                            "used_key_parts": ["parameter"],
                            "ref": ["const"],
                            "r_loops": 1250,
                            "rows": 1,
                            "r_rows": 1,
                            "r_total_time_ms": 6.5982,
                            "filtered": 100,
                            "r_filtered": 0.08,
                            "attached_condition": "table_b.parameter <=> 123 and table_b.index_a = 123 and table_b.index_b = table_a.`id`"
                          }
                        }
                      }
                    }
                  }
                }
              ]
            }
          }
          

          psergei Sergei Petrunia added a comment - Ok, If I run the ANALYZE FORMAT=JSON immediately after the INSERT statements (without a server restart in the middle), I get the sort step, too: | { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 17.856, "table": { "table_name": "table_a", "access_type": "index", "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "r_loops": 1, "rows": 1250, "r_rows": 1250, "r_total_time_ms": 0.844, "filtered": 100, "r_filtered": 0.08, "attached_condition": "<in_optimizer>(1,exists(subquery#2))", "using_index": true }, "subqueries": [ { "expression_cache": { "state": "disabled", "r_loops": 200, "r_hit_ratio": 0, "query_block": { "select_id": 2, "r_loops": 1250, "r_total_time_ms": 15.395, "read_sorted_file": { "r_rows": 0.0008, "filesort": { "sort_key": "table_b.index_c", "r_loops": 1250, "r_total_time_ms": 10.515, "r_limit": 1, "r_used_priority_queue": true, "r_output_rows": 0, "table": { "table_name": "table_b", "access_type": "ref", "possible_keys": ["index_a_index_b", "parameter"], "key": "parameter", "key_length": "5", "used_key_parts": ["parameter"], "ref": ["const"], "r_loops": 1250, "rows": 1, "r_rows": 1, "r_total_time_ms": 6.5982, "filtered": 100, "r_filtered": 0.08, "attached_condition": "table_b.parameter <=> 123 and table_b.index_a = 123 and table_b.index_b = table_a.`id`" } } } } } } ] } }

          ORDER BY without LIMIT can be removed for EXISTS/IN subqueries.

          For IN subqueries, LIMIT is not a concern, because MariaDB (and MySQL) doesn't support them. The queries will be rejected with this error:

          ERROR 1235 (42000): This version of MariaDB doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
          

          EXISTS subqueries allow LIMIT. "ORDER BY ... LIMIT n" should be removable. "ORDER BY ... LIMIT x OFFSET y" is not removable.

          psergei Sergei Petrunia added a comment - ORDER BY without LIMIT can be removed for EXISTS/IN subqueries. For IN subqueries, LIMIT is not a concern, because MariaDB (and MySQL) doesn't support them. The queries will be rejected with this error: ERROR 1235 (42000): This version of MariaDB doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' EXISTS subqueries allow LIMIT. "ORDER BY ... LIMIT n" should be removable. "ORDER BY ... LIMIT x OFFSET y" is not removable.
          psergei Sergei Petrunia added a comment - - edited

          Found this piece in item_subselect.cc, Item_in_subselect::select_in_like_transformer

            /*
              IN/SOME/ALL/ANY subqueries aren't support LIMIT clause. Without it
              ORDER BY clause becomes meaningless thus we drop it here.
            */
            for (SELECT_LEX *sl= current->master_unit()->first_select();
                 sl; sl= sl->next_select())
            {
              if (sl->join)
              {
                sl->join->order= 0;
                sl->join->skip_sort_order= 1;
              }
            }
          

          psergei Sergei Petrunia added a comment - - edited Found this piece in item_subselect.cc , Item_in_subselect::select_in_like_transformer /* IN/SOME/ALL/ANY subqueries aren't support LIMIT clause. Without it ORDER BY clause becomes meaningless thus we drop it here. */ for (SELECT_LEX *sl= current->master_unit()->first_select(); sl; sl= sl->next_select()) { if (sl->join) { sl->join->order= 0; sl->join->skip_sort_order= 1; } }

          Investigated.

          The optimizer does these query rewrites in this order:

          1. EXISTS->IN transformation.
          2. IN->semi-join conversion
          3. IN->EXISTS rewrite, MIN/MAX rewrite
          3.1 Removal of redundant ORDER BY in subquery.
          

          For the EXISTS subquery from this bug report, the following happens:

          1. EXISTS->IN doesn't fire because it doesn't allow ORDER BY clause
          2. IN->semi-join conversion doesn't happen (because #1 didn't)
          3.1 Removal of redundant ORDER BY in subquery doesn't cover EXISTS queries.
          

          In explain, we see sorting, and no merging.

          For the IN-subquery from this bug report, the following happens:

          2. IN->semi-join conversion doesn't happen because it doesn't allow ORDER BY clause
          3.1. Redundant ORDER BY is removed. 
          

          In explain, we see that sorting is not done, but merging is done.

          psergei Sergei Petrunia added a comment - Investigated. The optimizer does these query rewrites in this order: 1. EXISTS->IN transformation. 2. IN->semi-join conversion 3. IN->EXISTS rewrite, MIN/MAX rewrite 3.1 Removal of redundant ORDER BY in subquery. For the EXISTS subquery from this bug report, the following happens: 1. EXISTS->IN doesn't fire because it doesn't allow ORDER BY clause 2. IN->semi-join conversion doesn't happen (because #1 didn't) 3.1 Removal of redundant ORDER BY in subquery doesn't cover EXISTS queries. In explain, we see sorting, and no merging. For the IN-subquery from this bug report, the following happens: 2. IN->semi-join conversion doesn't happen because it doesn't allow ORDER BY clause 3.1. Redundant ORDER BY is removed. In explain, we see that sorting is not done, but merging is done.

          Committed a patch that makes the order of transformations to be:

          1. EXISTS->IN transformation.
          2. Remove redundant ORDER BY [LIMIT] in subqueries
          3. IN->semi-join conversion
          

          Now, for the IN-subquery from this bug report:

          2. Remove redundant ORDER BY [LIMIT] in subqueries works
          2. IN->semi-join conversion works
          

          and we should get a good query plan.

          For the EXISTS subquery from this bug report:

          1. EXISTS->IN transformation doesn't fire (because ORDER BY is present)
          2. Remove redundant ORDER BY [LIMIT] in subqueries - works and removes it
          3. IN->semi-join conversion - doesn't work because EXIST->IN didn't
          

          The sorting is gone but we are not able to merge it to semi-join yet.

          psergei Sergei Petrunia added a comment - Committed a patch that makes the order of transformations to be: 1. EXISTS->IN transformation. 2. Remove redundant ORDER BY [LIMIT] in subqueries 3. IN->semi-join conversion Now, for the IN-subquery from this bug report: 2. Remove redundant ORDER BY [LIMIT] in subqueries works 2. IN->semi-join conversion works and we should get a good query plan. For the EXISTS subquery from this bug report: 1. EXISTS->IN transformation doesn't fire (because ORDER BY is present) 2. Remove redundant ORDER BY [LIMIT] in subqueries - works and removes it 3. IN->semi-join conversion - doesn't work because EXIST->IN didn't The sorting is gone but we are not able to merge it to semi-join yet.

          The Step #1 patch: http://lists.askmonty.org/pipermail/commits/2019-May/013738.html . sanja could you please review?

          psergei Sergei Petrunia added a comment - The Step #1 patch: http://lists.askmonty.org/pipermail/commits/2019-May/013738.html . sanja could you please review?

          I've also tried to make EXISTS->IN transformation work when ORDER BY is present but didn't succeed. I'm hitting this assert in Item_exists_subselect::exists2in_processor

            DBUG_ASSERT(first_select->join->all_fields.elements ==
                        first_select->item_list.elements);
          

          psergei Sergei Petrunia added a comment - I've also tried to make EXISTS->IN transformation work when ORDER BY is present but didn't succeed. I'm hitting this assert in Item_exists_subselect::exists2in_processor DBUG_ASSERT(first_select->join->all_fields.elements == first_select->item_list.elements);
          psergei Sergei Petrunia added a comment - - edited

          Discussed with Sanja. The takeaways are:

          • Check again if ORDER BY ... LIMIT is removable for EXISTS subqueries (I initially thought that no, then he said yes during the conversation. Now I'm trying to type down the reasoning and find it to be faulty)
          • Item_exists_subselect::exists2in_processor should be able to ignore the ORDER BY clause. There are some adjustments to be made (e.g. asserts removed) but this should be easily doable.
          psergei Sergei Petrunia added a comment - - edited Discussed with Sanja. The takeaways are: Check again if ORDER BY ... LIMIT is removable for EXISTS subqueries (I initially thought that no, then he said yes during the conversation. Now I'm trying to type down the reasoning and find it to be faulty) Item_exists_subselect::exists2in_processor should be able to ignore the ORDER BY clause. There are some adjustments to be made (e.g. asserts removed) but this should be easily doable.
          psergei Sergei Petrunia added a comment - Patch for Step #2: http://lists.askmonty.org/pipermail/commits/2019-May/013758.html
          psergei Sergei Petrunia made changes -
          Status Open [ 1 ] In Progress [ 3 ]

          Sanja, please review both patches.

          psergei Sergei Petrunia added a comment - Sanja, please review both patches.
          psergei Sergei Petrunia made changes -
          Assignee Sergei Petrunia [ psergey ] Oleksandr Byelkin [ sanja ]
          Status In Progress [ 3 ] In Review [ 10002 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 10.4 [ 22408 ]

          OK to push (both)

          sanja Oleksandr Byelkin added a comment - OK to push (both)
          sanja Oleksandr Byelkin made changes -
          Assignee Oleksandr Byelkin [ sanja ] Sergei Petrunia [ psergey ]
          Status In Review [ 10002 ] Stalled [ 10000 ]
          psergei Sergei Petrunia made changes -
          Component/s Query Cache [ 10120 ]
          Component/s Views [ 10111 ]

          rentalhost, thanks for reporting this!

          The fix is pushed into MariaDB 10.4 tree.

          It should be trivial to apply the patch to previous MariaDB versions as well. The reason we are fixing this only in 10.4 is that it changes the query plans which may potentially cause a slowdown for somebody.

          psergei Sergei Petrunia added a comment - rentalhost , thanks for reporting this! The fix is pushed into MariaDB 10.4 tree. It should be trivial to apply the patch to previous MariaDB versions as well. The reason we are fixing this only in 10.4 is that it changes the query plans which may potentially cause a slowdown for somebody.
          psergei Sergei Petrunia made changes -
          Fix Version/s 10.4.5 [ 23311 ]
          Fix Version/s 10.4 [ 22408 ]
          Resolution Fixed [ 1 ]
          Status Stalled [ 10000 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 94165 ] MariaDB v4 [ 156022 ]

          People

            psergei Sergei Petrunia
            rentalhost David Rodrigues
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.