[MDEV-6696] Improve UNION with LIMIT execution - Jira

roberto spadim created issue - 2014-09-04 20:02

roberto spadim made changes - 2014-09-04 20:04

Field	Original Value	New Value
Description	Hi guys, I have some queries like this: SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%" LIMIT 1 It's a search about a document number, but the problem is.... The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is could we optimize this kind of query, since limit=1 row? for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx) the explain of this query today is: id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY mov_documentos range PRIMARY PRIMARY 77 1 Using where; Using index 2 UNION mov_documentos range PRIMARY PRIMARY 77 625612 Using where; Using index 3 UNION mov_documentos range PRIMARY PRIMARY 77 625612 Using where; Using index 4 UNION mov_documentos index PRIMARY 77 1798490 Using where; Using index 5 UNION mov_documentos index PRIMARY 77 1798490 Using where; Using index 6 UNION mov_documentos index PRIMARY 77 1798490 Using where; Using index UNION RESULT <union1,2,3,4,5,6> ALL and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause) well that's all =]	Hi guys, I have some queries like this: SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%" UNION SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%" LIMIT 1 It's a search about a document number, but the problem is.... The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is could we optimize this kind of query, since limit=1 row? for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx) the explain of this query today is: \|\|id\|select_type \|table\|type\|possible_keys\|key\|key_len\|ref\|rows\|Extra\|\| \|1 \|PRIMARY \|mov_documentos \|range \|PRIMARY \|PRIMARY \|77 \|1 \|Using where; Using index\| \|2 \|UNION \|mov_documentos \|range \|PRIMARY \|PRIMARY \|77 \|625612 \|Using where; Using index\| \|3 \|UNION \|mov_documentos \|range \|PRIMARY \|PRIMARY \|77 \|625612 \|Using where; Using index\| \|4 \|UNION \|mov_documentos \|index \|PRIMARY \|77 \|1798490 \|Using where; Using index\| \|5 \|UNION \|mov_documentos \|index \|PRIMARY \|77 \|1798490 \|Using where; Using index\| \|6 \|UNION \|mov_documentos \|index \|PRIMARY \|77 \|1798490 \|Using where; Using index\| \|\|UNION \|RESULT <union1,2,3,4,5,6> ALL \| and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause) well that's all =]

roberto spadim made changes - 2014-09-04 20:05

Description

Hi guys, I have some queries like this:
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is:

||id|select_type |table|type|possible_keys|key|key_len|ref|rows|Extra||
|1 |PRIMARY |mov_documentos |range |PRIMARY |PRIMARY |77 |1 |Using where; Using index|
|2 |UNION |mov_documentos |range |PRIMARY |PRIMARY |77 |625612 |Using where; Using index|
|3 |UNION |mov_documentos |range |PRIMARY |PRIMARY |77 |625612 |Using where; Using index|
|4 |UNION |mov_documentos |index |PRIMARY |77 |1798490 |Using where; Using index|
|5 |UNION |mov_documentos |index |PRIMARY |77 |1798490 |Using where; Using index|
|6 |UNION |mov_documentos |index |PRIMARY |77 |1798490 |Using where; Using index|
||UNION |RESULT <union1,2,3,4,5,6> ALL |

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

Hi guys, I have some queries like this:
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is:

||id||select_type ||table||type||possible_keys||key||key_len||ref||rows||Extra||
|1 |PRIMARY |mov_documentos |range |PRIMARY |PRIMARY |77 |1 |Using where; Using index|
|2 |UNION |mov_documentos |range |PRIMARY |PRIMARY |77 |625612 |Using where; Using index|
|3 |UNION |mov_documentos |range |PRIMARY |PRIMARY |77 |625612 |Using where; Using index|
|4 |UNION |mov_documentos |index |PRIMARY |77 |1798490 |Using where; Using index|
|5 |UNION |mov_documentos |index |PRIMARY |77 |1798490 |Using where; Using index|
|6 |UNION |mov_documentos |index |PRIMARY |77 |1798490 |Using where; Using index|
| |UNION |RESULT <union1,2,3,4,5,6> ALL |

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

roberto spadim made changes - 2014-09-04 20:06

Description

Hi guys, I have some queries like this:
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is:

||id||select_type ||table||type||possible_keys||key||key_len||ref||rows||Extra||
|1 |PRIMARY |mov_documentos |range |PRIMARY |PRIMARY |77 |1 |Using where; Using index|
|2 |UNION |mov_documentos |range |PRIMARY |PRIMARY |77 |625612 |Using where; Using index|
|3 |UNION |mov_documentos |range |PRIMARY |PRIMARY |77 |625612 |Using where; Using index|
|4 |UNION |mov_documentos |index |PRIMARY |77 |1798490 |Using where; Using index|
|5 |UNION |mov_documentos |index |PRIMARY |77 |1798490 |Using where; Using index|
|6 |UNION |mov_documentos |index |PRIMARY |77 |1798490 |Using where; Using index|
| |UNION |RESULT <union1,2,3,4,5,6> ALL |

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

Hi guys, I have some queries like this:
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is:

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* ||
|| 1 || PRIMARY || mov_documentos || range || PRIMARY || PRIMARY || 77 || || 1 || Using where; Using index ||
|| 2 || UNION || mov_documentos || range || PRIMARY || PRIMARY || 77 || || 628434 || Using where; Using index ||
|| 3 || UNION || mov_documentos || range || PRIMARY || PRIMARY || 77 || || 628434 || Using where; Using index ||
|| 4 || UNION || mov_documentos || index || || PRIMARY || 77 || || 1826326 || Using where; Using index ||
|| 5 || UNION || mov_documentos || index || || PRIMARY || 77 || || 1826326 || Using where; Using index ||
|| 6 || UNION || mov_documentos || index || || PRIMARY || 77 || || 1826326 || Using where; Using index ||
|| || UNION RESULT || <union1,2,3,4,5,6> || ALL || || || || || || ||

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

roberto spadim made changes - 2014-09-04 20:07

Description

Hi guys, I have some queries like this:
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is:

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* ||
|| 1 || PRIMARY || mov_documentos || range || PRIMARY || PRIMARY || 77 || || 1 || Using where; Using index ||
|| 2 || UNION || mov_documentos || range || PRIMARY || PRIMARY || 77 || || 628434 || Using where; Using index ||
|| 3 || UNION || mov_documentos || range || PRIMARY || PRIMARY || 77 || || 628434 || Using where; Using index ||
|| 4 || UNION || mov_documentos || index || || PRIMARY || 77 || || 1826326 || Using where; Using index ||
|| 5 || UNION || mov_documentos || index || || PRIMARY || 77 || || 1826326 || Using where; Using index ||
|| 6 || UNION || mov_documentos || index || || PRIMARY || 77 || || 1826326 || Using where; Using index ||
|| || UNION RESULT || <union1,2,3,4,5,6> || ALL || || || || || || ||

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

Hi guys, I have some queries like this:
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is:

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | |

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

roberto spadim made changes - 2014-09-04 20:11

Description

Hi guys, I have some queries like this:
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is:

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | |

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

Hi guys, I have some queries like this:
[code]
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | |

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

roberto spadim made changes - 2014-09-04 20:11

Description

Hi guys, I have some queries like this:
[code]
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | |

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | |

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

roberto spadim made changes - 2014-09-04 20:18

Description

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number, but the problem is....
The first query at first line (with % at end of the query) return really fast, but executing others queries are very slow, the question is
could we optimize this kind of query, since limit=1 row?

for example, at each "UNION" select, check if the total return rows is inside LIMIT clausule, in this case we CAN'T have an order by at outer 'select' since we need a sort before LIMIT, example (select union select union select order by ... limit xxx) or ((select) union (select) order by ... limit xxx)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | |

and execution time of the first one is near or less than 1 second and others are more than 1 second (maybe less too since if the first query return the others probably will return the same row too but in a slower search algorithm since it have other WHERE clause)

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | |

well that's all =]

roberto spadim made changes - 2014-09-04 20:20

Description

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | |

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> (if we execute a diferent union since we can optimize based in query cost, we could show <union 1,4,2,5,3,6> and at final Extra column could put <Union order optimizer>) | ALL | | | | | | | (Union order optimizer) |

well that's all =]

roberto spadim made changes - 2014-09-04 20:21

Description

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> (if we execute a diferent union since we can optimize based in query cost, we could show <union 1,4,2,5,3,6> and at final Extra column could put <Union order optimizer>) | ALL | | | | | | | (Union order optimizer) |

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> \n (if we execute a diferent union since we can optimize based in query cost, we could show <union 1,4,2,5,3,6> and at final Extra column could put <Union order optimizer>) | ALL | | | | | | (Union order optimizer) | |

well that's all =]

roberto spadim made changes - 2014-09-04 20:21

Description

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> \n (if we execute a diferent union since we can optimize based in query cost, we could show <union 1,4,2,5,3,6> and at final Extra column could put <Union order optimizer>) | ALL | | | | | | (Union order optimizer) | |

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

well that's all =]

roberto spadim made changes - 2014-09-04 20:21

Description

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326 rows):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

well that's all =]

roberto spadim made changes - 2014-09-04 20:22

Description

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326 rows):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326 rows):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow (maybe a outlier result) |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | Total query time: 5.258 |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

well that's all =]

Sergei Golubchik made changes - 2014-09-05 08:17

Affects Version/s		10.1.0 [ 12200 ]
Affects Version/s	10.1 [ 16100 ]

Elena Stepanova made changes - 2014-09-07 20:22

Affects Version/s	10.1.0 [ 12200 ]
Issue Type	Bug [ 1 ]	Task [ 3 ]

Elena Stepanova made changes - 2014-09-07 20:24

Fix Version/s

10.1 [ 16100 ]

Sergei Golubchik added a comment - 2014-10-09 15:26

good idea!

Sergei Golubchik added a comment - 2014-10-09 15:26 good idea!

roberto spadim made changes - 2014-10-09 18:38

Description

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order)

the explain of this query today is (count(*)=1.826.326 rows):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow (maybe a outlier result) |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | Total query time: 5.258 |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order), or extend the first UNION clausule with UNION NO_ORDER, or something like it, i'm considering here two problems... in this example the first query is the faster... and return the right data, if i change the first with the last, maybe i could get the wrong result, in this case the (3.2) optimization couldn't be used, but think about a server that the optimizer stay on and we forget to set it off (like query cache = on and we don't use SQL_NO_CACHE) an NO_ORDER or something like it could be nice to ensure that union will be executed in the right order even with optimizer option turned on just an idea... maybe we could use other better idea, for example of SQL_CACHE, only the first SELECT allow the SQL_CACHE others SELECT don't allow it, something like it is nice
(select sql_cache sql_no_union_order * form blabal union select * from blabla2 ....)
with this we could mix in a single complex query both union order and non union order
select (sql_union order) , (sql_no_union_order) from (any other query)

the explain of this query today is (count( * )=1.826.326 rows):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow (maybe a outlier result) |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | Total query time: 5.258 |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

well that's all =]

roberto spadim made changes - 2014-10-09 19:41

Description

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order), or extend the first UNION clausule with UNION NO_ORDER, or something like it, i'm considering here two problems... in this example the first query is the faster... and return the right data, if i change the first with the last, maybe i could get the wrong result, in this case the (3.2) optimization couldn't be used, but think about a server that the optimizer stay on and we forget to set it off (like query cache = on and we don't use SQL_NO_CACHE) an NO_ORDER or something like it could be nice to ensure that union will be executed in the right order even with optimizer option turned on just an idea... maybe we could use other better idea, for example of SQL_CACHE, only the first SELECT allow the SQL_CACHE others SELECT don't allow it, something like it is nice
(select sql_cache sql_no_union_order * form blabal union select * from blabla2 ....)
with this we could mix in a single complex query both union order and non union order
select (sql_union order) , (sql_no_union_order) from (any other query)

the explain of this query today is (count( * )=1.826.326 rows):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow (maybe a outlier result) |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | Total query time: 5.258 |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order), or extend the first UNION clausule with UNION NO_ORDER, or something like it, i'm considering here two problems... in this example the first query is the faster... and return the right data, if i change the first with the last, maybe i could get the wrong result, in this case the (3.2) optimization couldn't be used, but think about a server that the optimizer stay on and we forget to set it off (like query cache = on and we don't use SQL_NO_CACHE) an NO_ORDER or something like it could be nice to ensure that union will be executed in the right order even with optimizer option turned on just an idea... maybe we could use other better idea, for example of SQL_CACHE, only the first SELECT allow the SQL_CACHE others SELECT don't allow it, something like it is nice
(select sql_cache sql_no_union_order * form blabal union select * from blabla2 ....)
with this we could mix in a single complex query both union order and non union order
select (sql_union order) , (sql_no_union_order) from (any other query)

the explain of this query today is (count( * )=1.826.326 rows):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow (maybe a outlier result) |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | Total query time: 5.258 |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

in this case we have a optimization from 5.258 seconds to 0.031 seconds ~= 170x faster

well that's all =]

roberto spadim made changes - 2014-10-18 07:25

Description

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order), or extend the first UNION clausule with UNION NO_ORDER, or something like it, i'm considering here two problems... in this example the first query is the faster... and return the right data, if i change the first with the last, maybe i could get the wrong result, in this case the (3.2) optimization couldn't be used, but think about a server that the optimizer stay on and we forget to set it off (like query cache = on and we don't use SQL_NO_CACHE) an NO_ORDER or something like it could be nice to ensure that union will be executed in the right order even with optimizer option turned on just an idea... maybe we could use other better idea, for example of SQL_CACHE, only the first SELECT allow the SQL_CACHE others SELECT don't allow it, something like it is nice
(select sql_cache sql_no_union_order * form blabal union select * from blabla2 ....)
with this we could mix in a single complex query both union order and non union order
select (sql_union order) , (sql_no_union_order) from (any other query)

the explain of this query today is (count( * )=1.826.326 rows):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow (maybe a outlier result) |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> | ALL | | | | | | (Union order optimizer) | Total query time: 5.258 |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

in this case we have a optimization from 5.258 seconds to 0.031 seconds ~= 170x faster

well that's all =]

Hi guys, I have some queries like this:
{code:sql}
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%"
UNION
SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
LIMIT 1
{code}

It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

for example, (i don't know how is the code today and if this could break a shard strategy)
1) execute the first query,
2) if this return more than LIMIT + OFFSET rows, stop all unions,
3) if not go to next query
3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order), or extend the first UNION clausule with UNION NO_ORDER, or something like it, i'm considering here two problems... in this example the first query is the faster... and return the right data, if i change the first with the last, maybe i could get the wrong result, in this case the (3.2) optimization couldn't be used, but think about a server that the optimizer stay on and we forget to set it off (like query cache = on and we don't use SQL_NO_CACHE) an NO_ORDER or something like it could be nice to ensure that union will be executed in the right order even with optimizer option turned on just an idea... maybe we could use other better idea, for example of SQL_CACHE, only the first SELECT allow the SQL_CACHE others SELECT don't allow it, something like it is nice
(select sql_cache sql_no_union_order * form blabal union select * from blabla2 ....)
with this we could mix in a single complex query both union order and non union order
select (sql_union order) , (sql_no_union_order) from (any other query)

the explain of this query today is (count( * )=1.826.326 rows):

|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* || execution time without flush cache (seconds) ||
| 1 | PRIMARY | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 1 | Using where; Using index | 0.031 <- fast |
| 2 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 6.755 <- slow (maybe a outlier result) |
| 3 | UNION | mov_documentos | range | PRIMARY | PRIMARY | 77 | | 628434 | Using where; Using index | 0.671 <- relative fast |
| 4 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.467 <- slow |
| 5 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.373 <- slow |
| 6 | UNION | mov_documentos | index | | PRIMARY | 77 | | 1826326 | Using where; Using index | 1.528 <- slow |
| | UNION RESULT | <union1,2,3,4,5,6> or <union optimized 1,4,2,5,3,6> | ALL | | | | | | (Union limit optimized) | Total query time: 5.258 |

if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

in this case we have a optimization from 5.258 seconds to 0.031 seconds ~= 170x faster

well that's all =]

Rasmus Johansson (Inactive) made changes - 2015-05-18 17:51

Workflow

MariaDB v2 [ 53728 ]

MariaDB v3 [ 65025 ]

Sergei Golubchik made changes - 2015-10-29 22:27

Fix Version/s

10.1 [ 16100 ]

roberto spadim made changes - 2019-01-14 16:34

Priority

Minor [ 4 ]

Major [ 3 ]

Sergei Golubchik made changes - 2021-12-06 21:20

Workflow

MariaDB v3 [ 65025 ]

MariaDB v4 [ 130258 ]

Alice Sherepa made changes - 2022-12-09 12:32

Assignee

Sergei Petrunia [ psergey ]

Alice Sherepa made changes - 2022-12-09 12:34

Fix Version/s

10.13 [ 28501 ]

Julien Fritsch made changes - 2022-12-28 13:05

Fix Version/s

11.1 [ 28549 ]

Julien Fritsch made changes - 2022-12-28 13:05

Fix Version/s

10.13 [ 28501 ]

MariaDB Server

Improve UNION with LIMIT execution

Details

Description

Attachments

Activity

People

Dates

Git Integration