Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
Hi guys, I have some queries like this:
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%" |
UNION |
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%" |
UNION |
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%" |
UNION |
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%" |
UNION |
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%" |
UNION |
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%" |
LIMIT 1
|
It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is
could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?
for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order), or extend the first UNION clausule with UNION NO_ORDER, or something like it, i'm considering here two problems... in this example the first query is the faster... and return the right data, if i change the first with the last, maybe i could get the wrong result, in this case the (3.2) optimization couldn't be used, but think about a server that the optimizer stay on and we forget to set it off (like query cache = on and we don't use SQL_NO_CACHE) an NO_ORDER or something like it could be nice to ensure that union will be executed in the right order even with optimizer option turned on just an idea... maybe we could use other better idea, for example of SQL_CACHE, only the first SELECT allow the SQL_CACHE others SELECT don't allow it, something like it is nice
(select sql_cache sql_no_union_order * form blabal union select * from blabla2 ....)
with this we could mix in a single complex query both union order and non union order
select (sql_union order) , (sql_no_union_order) from (any other query)
the explain of this query today is (count( * )=1.826.326 rows):
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | execution time without flush cache (seconds) |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | 1 | Using where; Using index | 0.031 <- fast | |
2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | 628434 | Using where; Using index | 6.755 <- slow (maybe a outlier result) | |
3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | 628434 | Using where; Using index | 0.671 <- relative fast | |
4 | UNION | mov_documentos | index | PRIMARY | 77 | 1826326 | Using where; Using index | 1.467 <- slow | ||
5 | UNION | mov_documentos | index | PRIMARY | 77 | 1826326 | Using where; Using index | 1.373 <- slow | ||
6 | UNION | mov_documentos | index | PRIMARY | 77 | 1826326 | Using where; Using index | 1.528 <- slow | ||
UNION RESULT | <union1,2,3,4,5,6> or <union optimized 1,4,2,5,3,6> | ALL | (Union limit optimized) | Total query time: 5.258 |
if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"
in this case we have a optimization from 5.258 seconds to 0.031 seconds ~= 170x faster
well that's all =]