|
Hello up there MariaDB team,
I'm currently debugging performance anomalies in MariaDB under NEO,
when same queries are usually executed fast, but sometimes badly slow. I've
discovered that some (not all) of such cases happen because wrong query
execution plan is somehow selected by optimizer. Why optimizer makes such
decision is a question.
For simple cases in a couple of times we can hopefully paper the problem over
with things like FORCE INDEX, but since this issue is starting to appear
regularly in many different places it is better to understand the root cause
and fix problem there.
That's where optimizer trace would be very handy - to get info from the system
about why such-and-such query execution decision has been made. With this
message I'd like to provide to MariaDB team feedback that optimizer trace is
needed in real-life since on production instances we cannot use debug builds
and get info from DBUG.
I provide examples of some anomalies in appendix for the reference.
Thanks beforehand,
Kirill
P.S. My familiarity with MariaDB is only several days, so please forgive me in
advance some possibly silly mistakes.
/cc ratzpo (@ratzpo)
Appendix
MariaDB 10.1.17. TokuDB is used as storage engine.
Example of bad query caught in slow log:
# Time: 160921 10:47:15
|
# User@Host: root[root] @ localhost []
|
# Thread_id: 6 Schema: neo1 QC_hit: No
|
# Query_time: 4.118192 Lock_time: 0.000024 Rows_sent: 0 Rows_examined: 8476058
|
# Rows_affected: 0
|
#
|
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
|
# explain: 1 SIMPLE obj ref PRIMARY,partition partition 10 const,const 1 8476058.00 100.00 0.00 Using where; Using index
|
#
|
SET timestamp=1474447635;
|
SELECT tid FROM obj WHERE `partition`=5 AND oid=79613 AND tid>268544235197088772 ORDER BY tid LIMIT 1;
|
Similar query caught in process list with analyze:
MariaDB [neo1]> show processlist;
|
+----+------+-----------+------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+------+-----------+------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| 3 | root | localhost | neo0 | Sleep | 8 | | NULL | 0.000 |
|
| 4 | root | localhost | neo2 | Sleep | 15 | | NULL | 0.000 |
|
| 5 | root | localhost | neo3 | Sleep | 11 | | NULL | 0.000 |
|
| 6 | root | localhost | neo1 | Query | 3 | Queried about 7710000 rows | SELECT tid FROM obj WHERE `partition`=5 AND oid=79613 AND tid>268544341634012678 ORDER BY tid LIMIT | 0.000 |
|
| 10 | root | localhost | neo1 | Query | 0 | init | show processlist | 0.000 |
|
+----+------+-----------+------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
|
5 rows in set (0.00 sec)
|
|
MariaDB [neo1]> show explain for 6;
|
+------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------------------------------------+
|
| 1 | SIMPLE | obj | ref | PRIMARY,partition | partition | 10 | const,const | 1 | Using where; Using index; Using filesort |
|
+------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------------------------------------+
|
1 row in set, 1 warning (0.01 sec)
|
The same query in normal case:
MariaDB [neo1]> explain SELECT tid FROM obj WHERE `partition`=5 AND oid=79613 AND tid>268544341634012678 ORDER BY tid LIMIT 1;
|
+------+-------------+-------+-------+-------------------+-----------+---------+------+------+--------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+-------------------+-----------+---------+------+------+--------------------------+
|
| 1 | SIMPLE | obj | range | PRIMARY,partition | partition | 18 | NULL | 1295 | Using where; Using index |
|
+------+-------------+-------+-------+-------------------+-----------+---------+------+------+--------------------------+
|
1 row in set (0.00 sec)
|
|
MariaDB [neo1]> analyze SELECT tid FROM obj WHERE `partition`=5 AND oid=79613 AND tid>268544341634012678 ORDER BY tid LIMIT 1;
|
+------+-------------+-------+-------+-------------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+-------+-------+-------------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
|
| 1 | SIMPLE | obj | range | PRIMARY,partition | partition | 18 | NULL | 639 | 1.00 | 100.00 | 100.00 | Using where; Using index |
|
+------+-------------+-------+-------+-------------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
|
1 row in set (0.00 sec)
|
|
MariaDB [neo1]> SELECT tid FROM obj WHERE `partition`=5 AND oid=79613 AND tid>268544341634012678 ORDER BY tid LIMIT 1;
|
+--------------------+
|
| tid |
|
+--------------------+
|
| 268544342862377552 |
|
+--------------------+
|
1 row in set (0.00 sec)
|
NOTE the difference:
- time: 4s vs < 0.01s
- type: 'ref' vs 'range'
- key_len: 10 vs 18 (partition, oid) vs (partition, oid, tid)
- ref: const,const vs NULL
- "Using filesort"
For the reference:
MariaDB [neo1]> describe obj;
|
+-----------+----------------------+------+-----+---------+-------+
|
| Field | Type | Null | Key | Default | Extra |
|
+-----------+----------------------+------+-----+---------+-------+
|
| partition | smallint(5) unsigned | NO | PRI | NULL | |
|
| oid | bigint(20) unsigned | NO | PRI | NULL | |
|
| tid | bigint(20) unsigned | NO | PRI | NULL | |
|
| data_id | bigint(20) unsigned | YES | MUL | NULL | |
|
| value_tid | bigint(20) unsigned | YES | | NULL | |
|
+-----------+----------------------+------+-----+---------+-------+
|
5 rows in set (0.00 sec)
|
|
MariaDB [neo1]> show index from obj;
|
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
|
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
|
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
|
| obj | 0 | PRIMARY | 1 | partition | A | 28253069 | NULL | NULL | | BTREE | | |
|
| obj | 0 | PRIMARY | 2 | tid | A | 28253069 | NULL | NULL | | BTREE | | |
|
| obj | 0 | PRIMARY | 3 | oid | A | 28253069 | NULL | NULL | | BTREE | | |
|
| obj | 1 | partition | 1 | partition | A | 28253069 | NULL | NULL | | BTREE | | |
|
| obj | 1 | partition | 2 | oid | A | 28253069 | NULL | NULL | | BTREE | | |
|
| obj | 1 | partition | 3 | tid | A | 28253069 | NULL | NULL | | BTREE | | |
|
| obj | 1 | data_id | 1 | data_id | A | 28253069 | NULL | NULL | YES | BTREE | | |
|
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
|
7 rows in set (0.01 sec)
|
|
MariaDB [neo1]> show table status like 'obj'\G
|
*************************** 1. row ***************************
|
Name: obj
|
Engine: TokuDB
|
Version: 10
|
Row_format: Fixed
|
Rows: 28253276
|
Avg_row_length: 33
|
Data_length: 957477808
|
Max_data_length: 9223372036854775807
|
Index_length: 1779148337
|
Data_free: 913293312
|
Auto_increment: NULL
|
Create_time: 2016-08-16 10:33:02
|
Update_time: 2016-09-21 22:05:35
|
Check_time: NULL
|
Collation: utf8_unicode_ci
|
Checksum: NULL
|
Create_options:
|
Comment:
|
1 row in set (0.00 sec)
|
Another example:
https://lab.nexedi.com/nexedi/neoppod/commit/13911ca3
|
|
Some ideas about features we should have: https://lists.launchpad.net/maria-developers/msg11266.html
|
|
Ideas about features/milestones: https://lists.launchpad.net/maria-developers/msg11267.html
|
|
Takes from MYSQL's implementation
INFOMATMATION_SCHEMA.OPTIMIZER_TRACE
The table has the following fields
- QUERY
- Trace
- MISSING_BYTES_BEYOND_MAX_MEM_SIZE
- INSUFFICIENT_PRIVILEGES
Queries to trace
They are: SELECT; INSERT or REPLACE (with VALUES or SELECT); UPDATE/DELETE and their multi-table variants; all the previous ones prefixed by EXPLAIN; SET (unless it manipulates the optimizer_trace system variable); DO; DECLARE/CASE/IF/RETURN (stored routines language elements); CALL. If one of those statements is prepared and executed in separate steps, preparation and execution are separately traced.
Trace purging
By default, each new trace overwrites the previous trace. Thus, if a statement contains substatements (example: invokes stored procedures, stored functions, triggers), the top statement and substatements each generate one trace, but at the execution's end only the last substatement's trace is visible. A user who wants to see the trace of another substatement, can enable/disable tracing around the desired substatement, but this requires editing the routine's code, which may not be possible. Another solution is to tune trace purging.
SET optimizer_trace_offset=<OFFSET>, optimizer_trace_limit=<LIMIT>
The optimizer_trace System Variable
- enabled: allows to enable/disable tracing
During disabling, we need to make sure that we remove DBUG_PRINT and use the optimizer trace to write to the trace file. Avoid duplication of code.
- one_line: if on, the trace will have no whitespace; it's unreadable for humans but readable for JSON parsers (they ignore whitespace); the only advantage is a saving on space.
Selecting Optimizer Features to Trace
Those features can be excluded from tracing using the optimizer_trace_features system variable, which has these on/off switches
Security
- INFORMATION_SCHEMA.OPTIMIZER_TRACE.INSUFFICIENT_PRIVILEGES
- Who can view optimizer_trace?
INFORMATION_SCHEMA.OPTIMIZER_TRACE.INSUFFICIENT_PRIVILEGE
If a traced query uses views or stored routines that have SQL SECURITY with a value of DEFINER, it may be that a user other than the definer is denied from seeing the trace of the query. In that case, the trace is shown as empty and INSUFFICIENT_PRIVILEGES has a value of 1. Otherwise, the value is 0.
|
|
The branch is 10.4-mdev6111
|
|
https://jira.mariadb.org/browse/MDEV-13744 is an example where opt trace would be useful- to check whether loose scan was considered (and see that it was, but was discarded because it wasn't worth it)
|
|
Takeaways from yesterday discussion:
- It would be nice to select parts of the trace somehow.
- MySQL allows to control whether certain parts of the optimizer emit the trace: it has @@optimizer_trace_features where one can tune these flags: greedy_search,range_optimizer,dynamic_range,repeated_subselect.
- Can one use JSON functions to select parts of the trace?
|
|
Example from the call: range optimization. A trace from mysql-8: [^mdev6111-example-trace-from-mysql8.json] .
There is a JSONPath language which allows to select all range_analysis nodes like so: $..range_analysis (you can check this online at http://jsonpath.com/). It is not clear which of the objects refers to which table, though.
|
|
JSON example I was using: [^mdev6111-example-trace-from-mysql8.json.txt]
|
|
Another attempt mdev6111-example-trace-from-mysql8.txt
|
|
... However, MariaDB's JSON function do not allow to compute JSON paths? I tried JSON_EXTRACT and others, none could achieve what I wanted - extract all nodes in the document with the name range_analysis.
|
|
As for pretty-printing, there is JSON_DETAILED function but its output is not as beautiful as the one that EXPLAIN FORMAT=JSON pretty-printer produces.
Example:
|
|
"key_parts":
|
[
|
"a"
|
]
|
}
|
],
|
"setup_range_conditions":
|
[
|
],
|
|
|
WIth holyfoot's and Serg's input:
mysql> select JSON_DETAILED(JSON_EXTRACT(a, '$**.range_analysis')) from t20\G
|
*************************** 1. row ***************************
|
JSON_DETAILED(JSON_EXTRACT(a, '$**.range_analysis')): [
|
|
{
|
"table_scan":
|
{
|
"rows": 1000,
|
"cost": 103.1
|
},
|
"potential_range_indexes":
|
[
|
|
{
|
"index": "a",
|
"usable": true,
|
"key_parts":
|
[
|
"a"
|
]
|
}
|
],
|
"setup_range_conditions":
|
[
|
],
|
so, it is possible to extract parts of trace using JSON function. You'll need to re-format them, and formatter is not perfect, but still.
|
|
Pushed this:
https://github.com/MariaDB/server/commit/2dbe472ed011a951b28434ae8e67945e964d2030
Optimizer trace: print cost and #rows of the join prefix
|
|
The names rows_for_plan and cost_for_plan follow MySQL
|
Also added post-join-operation selectivity cost
|
|
|
Pushed this:
http://lists.askmonty.org/pipermail/commits/2019-August/013945.html
http://lists.askmonty.org/pipermail/commits/2019-August/013945.html
|
commit 4a490d1a993959222deb8bae822f40575586a148 (HEAD -> 10.4, origin/HEAD, origin/10.4, tmp10)
|
Author: Sergei Petrunia <psergey@askmonty.org>
|
Date: Sun Aug 25 11:03:19 2019 +0300
|
|
MDEV-6111: Optimizer Trace: add tracing for semi-join optimizations
|
|
Added:
|
- "semijoin_strategy_choice" element (actions in advance_sj_state(), name
|
matches the name in MySQL)
|
|
- semijoin_table_pullout element.
|
UPDATE: filed a MDEV to track the above patch: MDEV-20440.
|