[MDEV-11054] Sort_priority_queue_sorts optimization isn't applied in some circumstances Created: 2016-10-14 Updated: 2017-07-03 Resolved: 2017-06-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 10.1.18 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | Nathan Stretch | Assignee: | Sergei Petrunia |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | order-by-optimization | ||
| Environment: |
Tested on CentOS 7 and Linux Mint 17.3. |
||
| Description |
|
Basically I have a large MyISAM table of vehicle listings with columns like make, model, year, price, etc. Approximately 1M rows. We process a variety of user queries on these tables, and they are effectively read-only, so we've created a relatively large number of indexes, to match the various possible queries. (Full create table statement below.) Most typical queries involve only range filters in the WHERE section - no constants. In these cases, we have found that queries do not use the Sort_priority_queue_sorts optimization by default, and are often relatively slow (several seconds for typical queries). Here is one simplified example that still shows the behaviour:
Here is the explain statement:
By checking the Sort_priority_queue_sorts status variable, I've confirmed that the filesort with small limit optimization is not being used. However, if I change the ORDER clause to this, the optimization IS used:
And the query speeds up by orders of magnitude. Also, if I change one of the range conditions to a constant, ie by searching for l.model_slug LIKE 'grandcherokee' instead of l.model_slug LIKE 'grandcherokee%', the query again uses the optimization and again is fast. The explain statement is identical in all cases. Finally, if I remove the LIMIT portion of the query, it becomes very fast again (although obviously the optimization is not used in that case because there is no limit). This test query only returns 9 results either way, so if the results are indeed being sorted after they're filtered, it should not take any significant time. The explain is still unchanged. If the explain statement didn't explicitly say the model_date_year index were being used, I would guess that it was choosing the price index due to the ORDER clause, and then scanning the majority of the index to find the 9 matching rows. As it is, I'm not sure exactly what's happening, because if it is indeed using the index to get those 9 rows first and then sorting, it really shouldn't matter how it sorts them as far as speed. So it seems like the explain may not match the query, and at the very least like the query optimizer is not choosing the optimal approach in this situation. For now, the workaround of adding a constant before the ORDER column works for me, for the most part, although it does prevent the optimizer from choosing an index for ORDERing when that would indeed be the most efficient option. Here is the create table:
|
| Comments |
| Comment by Sergei Petrunia [ 2017-06-28 ] | ||||||||||||||
|
Formatted the query:
| ||||||||||||||
| Comment by Sergei Petrunia [ 2017-06-28 ] | ||||||||||||||
|
The EXPLAIN has index=model_date_year, key_len=303
model_slug is VARCHAR(100). 303 = 100*3 + 2 bytes for length + null_byte. Makes sense so far. Explain shows "Using filesort". It's not clear to me what prevents the use of Priority Queue optimization. | ||||||||||||||
| Comment by Sergei Petrunia [ 2017-06-28 ] | ||||||||||||||
I think this is because
which is odd, I would expect rows to get smaller and key_len to get bigger. | ||||||||||||||
| Comment by Sergei Petrunia [ 2017-06-28 ] | ||||||||||||||
"1" in the ORDER BY list translates to "l.source_id". Since one column uses ASC and another one DESC, there is no potential to use any index to resolve the ORDER BY. In the original query, ORDER BY l.price desc has a potentially usable index:
This should not affect the priority queue optimization, but perhaps it does it somehow. | ||||||||||||||
| Comment by Sergei Petrunia [ 2017-06-28 ] | ||||||||||||||
There are a number of known issues like this in MariaDB 10.1 (and earlier MySQL/MariaDB versions), for example, MDEV-8306, | ||||||||||||||
| Comment by Sergei Petrunia [ 2017-06-28 ] | ||||||||||||||
|
I think the only thing one can do with the available info is to point to MariaDB 10.2. Feel free to re-open if the issue still happens on 10.2, or if the dataset is available. | ||||||||||||||
| Comment by Nathan Stretch [ 2017-06-28 ] | ||||||||||||||
|
> "1" in the ORDER BY list translates to "l.source_id". Since one column uses ASC and another one DESC, there is no potential to use any index to resolve the ORDER BY. Yes, I realized after posting this that the 1 referred to that column, not a literal 1. The effect is the same though, as the source_id is constant in this table: it will prevent it from choosing the price_lon index. So it appears that two things are happening: 1) it's non-optimally choosing the price index for this query instead of the model_date_year index, and 2) it's incorrectly reporting that the model_date_year index was used in the explain. I can provide a test table, but not publicly. Would it help if I emailed (a link to) it to you? | ||||||||||||||
| Comment by Daniel Black [ 2017-06-29 ] | ||||||||||||||
| Comment by Sergei Petrunia [ 2017-07-03 ] | ||||||||||||||
|
nstretch, yes it would help. Both emailing the link (sergey@mariadb.com) or uploading to the "private" folder on FTP (as danblack mentioned) will work. |