[MDEV-11054]  Sort_priority_queue_sorts optimization isn't applied in some circumstances Created: 2016-10-14  Updated: 2017-07-03  Resolved: 2017-06-28

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1.18
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Nathan Stretch Assignee: Sergei Petrunia
Resolution: Incomplete Votes: 0
Labels: order-by-optimization
Environment:

Tested on CentOS 7 and Linux Mint 17.3.



 Description   

Basically I have a large MyISAM table of vehicle listings with columns like make, model, year, price, etc. Approximately 1M rows. We process a variety of user queries on these tables, and they are effectively read-only, so we've created a relatively large number of indexes, to match the various possible queries. (Full create table statement below.)

Most typical queries involve only range filters in the WHERE section - no constants. In these cases, we have found that queries do not use the Sort_priority_queue_sorts optimization by default, and are often relatively slow (several seconds for typical queries). Here is one simplified example that still shows the behaviour:

SELECT l.source_id, l.external_id, l.dealer_id FROM listings_cm l  WHERE l.lat < 40.2207221168 AND l.lat > 32.9792778832   AND l.lon < -118.621350095 AND l.lon > -125.138649905  AND l.model_slug LIKE 'grandcherokee%' AND l.year>=1999 AND l.year<=2003 ORDER BY l.price desc LIMIT 0,101;

Here is the explain statement:

+------+-------------+-------+-------+----------------------------------------------------------------------+-----------------+---------+------+-------+----------------------------------------------------+
| id   | select_type | table | type  | possible_keys                                                        | key             | key_len | ref  | rows  | Extra                                              |
+------+-------------+-------+-------+----------------------------------------------------------------------+-----------------+---------+------+-------+----------------------------------------------------+
|    1 | SIMPLE      | l     | range | make_lon,model_lon,make_year_lon,make_year_nhp_price,*snip*          | model_date_year | 303     | NULL | 13553 | Using index condition; Using where; Using filesort |
+------+-------------+-------+-------+----------------------------------------------------------------------+-----------------+---------+------+-------+----------------------------------------------------+

By checking the Sort_priority_queue_sorts status variable, I've confirmed that the filesort with small limit optimization is not being used. However, if I change the ORDER clause to this, the optimization IS used:

ORDER BY 1, l.price desc

And the query speeds up by orders of magnitude. Also, if I change one of the range conditions to a constant, ie by searching for l.model_slug LIKE 'grandcherokee' instead of l.model_slug LIKE 'grandcherokee%', the query again uses the optimization and again is fast. The explain statement is identical in all cases.

Finally, if I remove the LIMIT portion of the query, it becomes very fast again (although obviously the optimization is not used in that case because there is no limit). This test query only returns 9 results either way, so if the results are indeed being sorted after they're filtered, it should not take any significant time. The explain is still unchanged.

If the explain statement didn't explicitly say the model_date_year index were being used, I would guess that it was choosing the price index due to the ORDER clause, and then scanning the majority of the index to find the 9 matching rows. As it is, I'm not sure exactly what's happening, because if it is indeed using the index to get those 9 rows first and then sorting, it really shouldn't matter how it sorts them as far as speed.

So it seems like the explain may not match the query, and at the very least like the query optimizer is not choosing the optimal approach in this situation.

For now, the workaround of adding a constant before the ORDER column works for me, for the most part, although it does prevent the optimizer from choosing an index for ORDERing when that would indeed be the most efficient option.

Here is the create table:

CREATE TABLE `listings_cm` (
  `source_id` int(11) unsigned NOT NULL,
  `external_id` varchar(100) NOT NULL DEFAULT '',
  `is_new` tinyint(1) NOT NULL DEFAULT '0',
  `vin` varchar(50) DEFAULT NULL,
  `year` int(4) unsigned NOT NULL,
  `make` varchar(50) NOT NULL DEFAULT '',
  `model` varchar(100) NOT NULL DEFAULT '',
  `trim` varchar(100) DEFAULT NULL,
  `body` varchar(50) DEFAULT NULL,
  `doors` int(1) DEFAULT NULL,
  `drive` varchar(50) NOT NULL DEFAULT 'fwd',
  `engine` varchar(50) DEFAULT NULL,
  `displacement` float DEFAULT NULL,
  `fuel` varchar(1) DEFAULT '',
  `transmission` varchar(2) DEFAULT NULL,
  `transmission_speed` tinyint(1) DEFAULT NULL,
  `category` varchar(50) DEFAULT NULL,
  `mpg_city` smallint(4) DEFAULT NULL,
  `mpg_hwy` smallint(4) DEFAULT NULL,
  `price` int(11) DEFAULT NULL,
  `has_price` tinyint(1) DEFAULT NULL,
  `not_has_price` tinyint(1) DEFAULT NULL,
  `mileage` int(11) DEFAULT NULL,
  `exterior` varchar(50) DEFAULT NULL,
  `interior` varchar(50) DEFAULT NULL,
  `options` text,
  `url` varchar(500) DEFAULT NULL,
  `images` text,
  `details` text,
  `cpo` tinyint(1) DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `dealer_id` int(11) DEFAULT NULL,
  `dealer_radius` int(5) DEFAULT '10000',
  `make_slug` varchar(100) DEFAULT NULL,
  `model_slug` varchar(100) DEFAULT NULL,
  `lat` float DEFAULT NULL,
  `lon` float DEFAULT NULL,
  PRIMARY KEY (`external_id`),
  KEY `dealer` (`dealer_id`),
  KEY `make_lon` (`make_slug`,`lon`),
  KEY `model_lon` (`model_slug`,`lon`),
  KEY `make_year_lon` (`make_slug`,`year`,`lon`),
  KEY `make_year_nhp_price` (`make_slug`,`year`,`not_has_price`,`price`),
  KEY `model_year_lon` (`model_slug`,`year`,`lon`),
  KEY `model_year_nhp_price` (`model_slug`,`year`,`not_has_price`,`price`),
  KEY `year_lon` (`year`,`lon`),
  KEY `year_nhp_price` (`year`,`not_has_price`,`price`),
  KEY `lon` (`lon`),
  KEY `make_price` (`make_slug`,`price`),
  KEY `make_nhp_price` (`make_slug`,`not_has_price`,`price`),
  KEY `make_date_year` (`make_slug`,`date`,`year`),
  KEY `model_price` (`model_slug`,`price`),
  KEY `model_nhp_price` (`model_slug`,`not_has_price`,`price`),
  KEY `model_date_year` (`model_slug`,`date`,`year`),
  KEY `make_miles` (`make_slug`,`mileage`),
  KEY `model_miles` (`model_slug`,`mileage`),
  KEY `miles` (`mileage`),
  KEY `price_lon` (`price`,`lon`),
  KEY `nhp_price_lon` (`not_has_price`,`price`,`lon`),
  KEY `date_year` (`date`,`year`),
  FULLTEXT KEY `keyword_index` (`make`,`model`,`trim`,`engine`,`drive`,`options`,`details`,`body`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;



 Comments   
Comment by Sergei Petrunia [ 2017-06-28 ]

Formatted the query:

SELECT 
  l.source_id, 
  l.external_id, 
  l.dealer_id 
FROM 
  listings_cm l  
WHERE 
  l.lat < 40.2207221168 AND l.lat > 32.9792778832   AND 
  l.lon < -118.621350095 AND l.lon > -125.138649905  AND 
  l.model_slug LIKE 'grandcherokee%' AND 
  l.year>=1999 AND l.year<=2003 
ORDER BY 
  l.price desc 
LIMIT 0,101;

Comment by Sergei Petrunia [ 2017-06-28 ]

The EXPLAIN has index=model_date_year, key_len=303

  KEY `model_date_year` (`model_slug`,`date`,`year`),

model_slug is VARCHAR(100).

303 = 100*3 + 2 bytes for length + null_byte. Makes sense so far.

Explain shows "Using filesort". It's not clear to me what prevents the use of Priority Queue optimization.
Could it be memory size? LIMIT 101 -> 100 rows. Default @@sort_buffer_size is 2M, which means each row must take > 20K in order for PQ optimization to be disabled due to lack of memory.

Comment by Sergei Petrunia [ 2017-06-28 ]

Also, if I change one of the range conditions to a constant, ie by searching for l.model_slug LIKE 'grandcherokee' instead of l.model_slug LIKE 'grandcherokee%', the query again uses the optimization and again is fast

I think this is because
l.model_slug LIKE 'grandcherokee' is the same as l.model_slug='grandcherokee' and this allows the optimizer to use the second component of the index and narrow down the search significantly. However, nstretch says that

The explain statement is identical in all cases.

which is odd, I would expect rows to get smaller and key_len to get bigger.

Comment by Sergei Petrunia [ 2017-06-28 ]

However, if I change the ORDER clause to this, the optimization IS used:
ORDER BY 1, l.price desc

"1" in the ORDER BY list translates to "l.source_id". Since one column uses ASC and another one DESC, there is no potential to use any index to resolve the ORDER BY.

In the original query, ORDER BY l.price desc has a potentially usable index:

  KEY `price_lon` (`price`,`lon`),

This should not affect the priority queue optimization, but perhaps it does it somehow.

Comment by Sergei Petrunia [ 2017-06-28 ]

So it seems like the explain may not match the query, and at the very least like the query optimizer is not choosing the optimal approach in this situation.

There are a number of known issues like this in MariaDB 10.1 (and earlier MySQL/MariaDB versions), for example, MDEV-8306, MDEV-326. They all were fixed in MariaDB 10.2.

Comment by Sergei Petrunia [ 2017-06-28 ]

I think the only thing one can do with the available info is to point to MariaDB 10.2.

Feel free to re-open if the issue still happens on 10.2, or if the dataset is available.

Comment by Nathan Stretch [ 2017-06-28 ]

> "1" in the ORDER BY list translates to "l.source_id". Since one column uses ASC and another one DESC, there is no potential to use any index to resolve the ORDER BY.

Yes, I realized after posting this that the 1 referred to that column, not a literal 1. The effect is the same though, as the source_id is constant in this table: it will prevent it from choosing the price_lon index. So it appears that two things are happening: 1) it's non-optimally choosing the price index for this query instead of the model_date_year index, and 2) it's incorrectly reporting that the model_date_year index was used in the explain.

I can provide a test table, but not publicly. Would it help if I emailed (a link to) it to you?

Comment by Daniel Black [ 2017-06-29 ]

Or https://mariadb.com/kb/en/meta/ftp/

Comment by Sergei Petrunia [ 2017-07-03 ]

nstretch, yes it would help. Both emailing the link (sergey@mariadb.com) or uploading to the "private" folder on FTP (as danblack mentioned) will work.
The file will not be made public.

Generated at Thu Feb 08 07:46:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.