Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28

LIMIT ROWS EXAMINED clause to limit the number of rows examined during SELECT processing

    Details

    • Type: Task
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 5.5.21
    • Component/s: None
    • Labels:

      Description

      Description:

      The purpose of this task is to provide the means to terminate the execution of SELECT statements that examines too many rows, and thus uses too much resources. This is achieved through an extension of the LIMIT clause - LIMIT ROWS EXAMINED <number_of_rows>. Whenever possible the semantics of LIMIT ROWS EXAMINED is the same as that of normal LIMIT (for instance for aggregate functions).

      The current syntax for the clause is:

      SELECT ... FROM ... WHERE ...
      [group_clause] [order_clause]
      LIMIT [[offset,] row_count] ROWS EXAMINED rows_limit;

      Similar to the parameters of LIMIT, 'rows_limit' can be both a prepared statement parameter, or a stored program parameter.

      A simple example of the clause is:

      SELECT * from t1, t2 LIMIT 10 ROWS EXAMINED 10000;

      The LIMIT ROWS EXAMINED clause is global for the whole statement. If a composite query (such as UNION, or query with derived tables or with subqueries) contains more than one LIMIT ROWS EXAMINED, the last one parsed is taken into account. In this manner either the last or the outermost one is taken into account. For instance, in the query:

      SELECT * FROM t1
      WHERE c1 IN (SELECT * FROM t2 WHERE c2 > ' ' LIMIT ROWS EXAMINED 0)
      LIMIT ROWS EXAMINED 11;

      The limit that is taken into account is 11, not 0.

      The LIMIT ROWS EXAMINED clause is taken into account by the query engine only during query execution. Thus the clause is ignored in the following cases:

      • If a query is EXPLAIN-ed.
      • During query optimization.
      • During auxiliary operations such as writing to system tables (e.g. logs).

      The clause is not applicable to DELETE/UPDATE statements, and if used in these statements produces a syntax error.

      The effects of this clause are as follows:

      • The server counts the number of read, inserted, modified, and deleted rows during query execution. This takes into account the use of temporary tables, and sorting for intermediate query operations.
      • Once the counter exceeds the value specified in the LIMIT ROWS EXAMINED clause, query execution is terminated as soon as possible.
      • The effects of terminating the query because of LIMIT ROWS EXAMINED are as follows:
        • The result of the query is a subset of the complete query, depending on when the query engine detected that the limit was reached. The result may be empty if no result rows could be computed before reaching the limit.
        • A warning is generated of the form: "Query execution was interrupted. The query examined at least 100 rows, which exceeds LIMIT ROWS EXAMINED (20). The query result may be incomplete."
        • If query processing was interrupted during filesort, an error is returned in addition to the warning.
        • If a UNION was interrupted during execution of one of its queries, the last step of the UNION is still executed in order to produce a partial result.
        • Depending on the join and other execution strategies used for a query, the same query may produce no result at all, or a different subset of the complete result when terminated due to LIMIT ROWS EXAMINED.
        • If the query contains a GROUP BY clause, the last group where the limit was reached will be discarded.

      The LIMIT ROWS EXAMINED clause cannot be specified on per-subquery basis. There can be only one LIMIT ROWS EXAMINED clause for the whole SELECT statement. If a SELECT statement contains several subqueries with LIMIT ROWS EXAMINED, the one that is parsed last is taken into account.

      Low-level design:

      The implementation has the following aspects:

      1. Counting the number of examined rows.
      2. Checking if the limit has been exceeded.
      3. Interrupting query execution.

      These are implemented as follows:

      Counting the number of examined rows.

      The patch introduces a new thread-level counter THD::accessed_rows_and_keys. This counter is increased at each call to any of the family of handler methods handler::ha_*. Therefore this counter is equivalent to the sum of all counters increased by handler::increment_statistics(). Unlike all other handler counters in THD::status_var, accessed_rows_and_keys is reset automatically before each statement.

      The counter is equivalent to the sum of the following handler counters:

      ha_read_key_count, ha_read_next_count, ha_read_prev_count, ha_read_first_count, ha_read_last_count, ha_read_rnd_next_count, ha_read_rnd_count, ha_tmp_write_count, ha_tmp_update_count, ha_write_count, ha_update_count, ha_delete_count.

      These are the counters increased by handler::increment_statistics() which is
      used to increase the counter for LIMIT ROWS EXAMINED.

      Checking if the limit has been exceeded.

      The test if LIMIT_ROWS_EXAMINED has been exceeded is done in the method THD::check_limit_rows_examined(). This method is called by handler::increment_statistics(), which is called in its turn by all methods in the family handler::ha_*.

      Once the limit is exceeded, the handler method sets THD::killed = ABORT_QUERY, where ABORT_QUERY is a new enum in enum killed_state.

      In order not to stop query processing with an error, at the end of handle_select() we test for thd->killed == ABORT_QUERY, and if true,
      issue a warning, and reset the killed state. The warning can be tested with 'show warnings'.

      Interrupting query execution.

      The implementation reuses the fact that the query executioner checks in
      multiple places if THD::killed has been set, and stops execution.

      One goal of this task is to stop execution, but still produce the rows found until LIMIT ROWS EXAMINED was reached. Therefore at the end of the main execution loop, the function handle_select() resets both THD::killed, and removes the limit in order not to affect data accesses needed for system purposes (such as writing to log tables). As a result:

      • query execution is stopped,
      • a warning is generated,
      • the query finishes without an error (except when query execution was
        interrupted during filesort).

      TODO:

      • consider issuing an error (or warning) if there is more than one
        LIMIT ROWS EXAMINED clause in the whole query.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                timour Timour Katchaounov (Inactive)
                Reporter:
                timour Timour Katchaounov (Inactive)
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Due:
                  Created:
                  Updated:
                  Resolved: