[MDEV-406] ANALYZE $stmt Created: 2012-07-20 Updated: 2019-10-09 Resolved: 2014-06-26 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | 10.1.0 |
| Type: | Task | Priority: | Major |
| Reporter: | Sergei Petrunia | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | analyze-stmt, optimizer | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
(Documentation is at https://mariadb.com/kb/en/analyze-statement/) == SQL syntax ==The new syntax:
ANALYZE $stmt will run the $stmt, and produce the output that EXPLAIN $stmt would produce, annotated with info about the query execution. == Adjustments to EXPLAIN output ==EXPLAIN FORMAT=JSON is easy to extend. As for tabular EXPLAIN form, the following columns will be added:
== Implementation at SQL layer ==The parser will set LEX::analyze_stmt flag for ANALYZE statements. Note: ANALYZE UPDATE statement actually makes the updates. With SBR, we will == Counting ==We will collect two kinds of counters: 1. Some are counted at SQL level, like filtered%, ICP_filtered, #rows, etc. 2. Some will be counted deeper inside the engine, like number of disk reads per table. The problems with the latter are
== Getting the counter values ==This is where the new SHOW EXPLAIN architecture plays against us. The problem is: at the end of JOIN::optimize(), the plan is saved into an Explain_select object has only "explain" information, it has no connection to An apparent solution is to have JOIN::cleanup() save execution data using a |
| Comments |
| Comment by Sergei Petrunia [ 2012-07-20 ] | |||||||||
|
test=# explain select count test=# explain analyze select count | |||||||||
| Comment by Sergei Petrunia [ 2012-07-20 ] | |||||||||
|
^^ Example from PostgreSQL | |||||||||
| Comment by Sergei Petrunia [ 2012-09-02 ] | |||||||||
|
When we have SHOW EXPLAIN, the natural way to save JOIN's plan is to produce a part of EXPLAIN output that describes the join. This means, we need to save what JOIN::print_explain has produced, and then replay it back. | |||||||||
| Comment by Sergei Petrunia [ 2012-09-02 ] | |||||||||
|
The code has Protocol_local, however
| |||||||||
| Comment by Sergei Petrunia [ 2012-09-02 ] | |||||||||
|
Another possible option is to use select_result_explain_buffer from the pre-review SHOW EXPLAIN code. However, that buffer relies on class Protocol to serialize the data | |||||||||
| Comment by Sergei Petrunia [ 2012-09-02 ] | |||||||||
|
.. which is ok, because the real select output is suppressed, so EXPLAIN ANALYZE can use thd->protocol for its own purposes. | |||||||||
| Comment by Sergei Petrunia [ 2014-05-10 ] | |||||||||
|
Additional notes from discussion with igor: It would be nice to show the amounts of time spent accessing the tables.
| |||||||||
| Comment by Sergei Petrunia [ 2014-05-10 ] | |||||||||
|
So,
I would also argue that the most frequent case is the one where we have log_slow_verbosity=explain. In this case, we don't want to add overhead to every query. At the same time, we want EXPLAIN ANALYZE output in the slow query log. This means, we need a way to run EXPLAIN ANALYZE without overhead. | |||||||||
| Comment by Sergei Petrunia [ 2014-05-19 ] | |||||||||
|
Notes from discussion with serg and elenst. Instead of "EXPLAIN ANALYZE $CMD" syntax, we will use "ANALYZE $cmd". "ANALYZE $cmd" will run the command (e.g. ANALYZE UPDATE will do the updates (checked: PG's EXPLAIN ANALYZE DELETE" will do deletes)), but produce output of "EXPLAIN $cmd", amended with information about actual execution. Notes from discussion with igor: It would be nice to count the number of disk io caused by each table. (counting time is even nicer but may cause overhead). | |||||||||
| Comment by Sergei Petrunia [ 2014-05-21 ] | |||||||||
|
Tasks 1. Get ANALYZE working for tabular EXPLAIN format After the above, we will have ANALYZE counterpart for the information that we've had in the tabular form of EXPLAIN output. 2. Extras in the tabular form 3. Support ANALYZE FORMAT=JSON and print more data | |||||||||
| Comment by Sergei Petrunia [ 2014-06-17 ] | |||||||||
|
Got basic things to work. The design is as follows:
| |||||||||
| Comment by Sergei Petrunia [ 2014-06-17 ] | |||||||||
|
Hit a problem (the example I am looking at is one with subquery, but one could probably hit it without it also). Currently, [SHOW] EXPLAIN code does:
The need to for call (2) was that because JOIN::exec() mades some last-minute changes to the query plan. I've tried to pull them out and put into JOIN::optimize() when working on SHOW EXPLAIN, but hit a problem that these changes are all over JOIN::exec(). For SHOW EXPLAIN, we could store these last-minute choices with the call (2). Howeve, call (2) has disastrous consequences for ANALYZE. It overwrites the query plan and destroys the counter values. So, I removed the call (2). I went through the known last-minute changes in the query plan (made by ORDER/GROUP BY optimizer), and added a call which saves just the changed info. Running tests after that has shown that there is another gotcha - INFORMATION_SCHEMA. For some reason, a part of I_S optimizations is made very late, right in JOIN::exec(). When we're running EXPLAIN, it goes into JOIN::exec, goes into ##get_all_tables()##, and then that function has " if (lex->describe) { return 0; }# in the middle of it. Apparently, there is no way to get the right query plan early in the current code. | |||||||||
| Comment by Sergei Petrunia [ 2014-06-17 ] | |||||||||
|
Possible options:
| |||||||||
| Comment by Sergei Petrunia [ 2014-06-17 ] | |||||||||
|
get_all_tables() is poorly written. It runs make_cond_for_info_schema(). That is, if we have a query like
then each subquery execution will call get_all_tables which will call make_cond_for_info_schema, and in the end we will allocate O(#rows(big_table)) of Item_cond_and objects. Saw it myself in debugger. | |||||||||
| Comment by Sergei Petrunia [ 2014-06-17 ] | |||||||||
|
As MySQL 5.6:
| |||||||||
| Comment by Sergei Petrunia [ 2014-06-20 ] | |||||||||
|
Another issue I'm facing after I've removed the second save_explain_data() call
VIEW's name is displayed instead of table name. It happens only on EXPLAIN | |||||||||
| Comment by Sergei Petrunia [ 2014-06-20 ] | |||||||||
|
Debugging how it worked in 10.0 with the second save_explain_data() call, I see that it worked in an unacceptable way:
| |||||||||
| Comment by Sergei Petrunia [ 2014-06-20 ] | |||||||||
|
Checking why EXPLAIN DELETE is affected and EXPLAIN SELECT is not... It turns out, there is a special kind of early merge, DT_MERGE_FOR_INSERT, which is also used for multi-table DELETEs. The view is merged here:
and when regular SELECT handling tries to merge it, it's already merged. A distinguishing case: view's TABLE_LIST has t->merged_for_insert=TRUE. | |||||||||
| Comment by Sergei Petrunia [ 2014-06-24 ] | |||||||||
|
Fixed the problem with I_S by splitting get_all_tables() into the optimizer part and executor part. | |||||||||
| Comment by Sergei Petrunia [ 2014-06-25 ] | |||||||||
|
Fixed the problem with order_by.test (it was a trivial bug). | |||||||||
| Comment by Sergei Petrunia [ 2014-06-25 ] | |||||||||
|
Merged with 10.1 | |||||||||
| Comment by Sergei Petrunia [ 2014-06-25 ] | |||||||||
|
As for "EXPLAIN UPDATE shows VIEW names":
In MySQL 5.6, they don't have table->pos_in_table_list->alias == v1. instead, they have table->pos_in_table_list->alias=t1. Grepping the source code for DT_MERGE_FOR_INSERT (or for DT_MERGE or related terms) finds nothing, so I assume they have re-worked derived table merge algorithm and so do not have this "special kind of merge" problem. | |||||||||
| Comment by Sergei Petrunia [ 2014-06-26 ] | |||||||||
|
Functionality intended for 10.1.0 has been pushed. |