Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11054

Sort_priority_queue_sorts optimization isn't applied in some circumstances

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Minor
    • Resolution: Incomplete
    • 10.1.18
    • N/A
    • Optimizer
    • Tested on CentOS 7 and Linux Mint 17.3.

    Description

      Basically I have a large MyISAM table of vehicle listings with columns like make, model, year, price, etc. Approximately 1M rows. We process a variety of user queries on these tables, and they are effectively read-only, so we've created a relatively large number of indexes, to match the various possible queries. (Full create table statement below.)

      Most typical queries involve only range filters in the WHERE section - no constants. In these cases, we have found that queries do not use the Sort_priority_queue_sorts optimization by default, and are often relatively slow (several seconds for typical queries). Here is one simplified example that still shows the behaviour:

      SELECT l.source_id, l.external_id, l.dealer_id FROM listings_cm l  WHERE l.lat < 40.2207221168 AND l.lat > 32.9792778832   AND l.lon < -118.621350095 AND l.lon > -125.138649905  AND l.model_slug LIKE 'grandcherokee%' AND l.year>=1999 AND l.year<=2003 ORDER BY l.price desc LIMIT 0,101;
      

      Here is the explain statement:

      +------+-------------+-------+-------+----------------------------------------------------------------------+-----------------+---------+------+-------+----------------------------------------------------+
      | id   | select_type | table | type  | possible_keys                                                        | key             | key_len | ref  | rows  | Extra                                              |
      +------+-------------+-------+-------+----------------------------------------------------------------------+-----------------+---------+------+-------+----------------------------------------------------+
      |    1 | SIMPLE      | l     | range | make_lon,model_lon,make_year_lon,make_year_nhp_price,*snip*          | model_date_year | 303     | NULL | 13553 | Using index condition; Using where; Using filesort |
      +------+-------------+-------+-------+----------------------------------------------------------------------+-----------------+---------+------+-------+----------------------------------------------------+
      

      By checking the Sort_priority_queue_sorts status variable, I've confirmed that the filesort with small limit optimization is not being used. However, if I change the ORDER clause to this, the optimization IS used:

      ORDER BY 1, l.price desc
      

      And the query speeds up by orders of magnitude. Also, if I change one of the range conditions to a constant, ie by searching for l.model_slug LIKE 'grandcherokee' instead of l.model_slug LIKE 'grandcherokee%', the query again uses the optimization and again is fast. The explain statement is identical in all cases.

      Finally, if I remove the LIMIT portion of the query, it becomes very fast again (although obviously the optimization is not used in that case because there is no limit). This test query only returns 9 results either way, so if the results are indeed being sorted after they're filtered, it should not take any significant time. The explain is still unchanged.

      If the explain statement didn't explicitly say the model_date_year index were being used, I would guess that it was choosing the price index due to the ORDER clause, and then scanning the majority of the index to find the 9 matching rows. As it is, I'm not sure exactly what's happening, because if it is indeed using the index to get those 9 rows first and then sorting, it really shouldn't matter how it sorts them as far as speed.

      So it seems like the explain may not match the query, and at the very least like the query optimizer is not choosing the optimal approach in this situation.

      For now, the workaround of adding a constant before the ORDER column works for me, for the most part, although it does prevent the optimizer from choosing an index for ORDERing when that would indeed be the most efficient option.

      Here is the create table:

      CREATE TABLE `listings_cm` (
        `source_id` int(11) unsigned NOT NULL,
        `external_id` varchar(100) NOT NULL DEFAULT '',
        `is_new` tinyint(1) NOT NULL DEFAULT '0',
        `vin` varchar(50) DEFAULT NULL,
        `year` int(4) unsigned NOT NULL,
        `make` varchar(50) NOT NULL DEFAULT '',
        `model` varchar(100) NOT NULL DEFAULT '',
        `trim` varchar(100) DEFAULT NULL,
        `body` varchar(50) DEFAULT NULL,
        `doors` int(1) DEFAULT NULL,
        `drive` varchar(50) NOT NULL DEFAULT 'fwd',
        `engine` varchar(50) DEFAULT NULL,
        `displacement` float DEFAULT NULL,
        `fuel` varchar(1) DEFAULT '',
        `transmission` varchar(2) DEFAULT NULL,
        `transmission_speed` tinyint(1) DEFAULT NULL,
        `category` varchar(50) DEFAULT NULL,
        `mpg_city` smallint(4) DEFAULT NULL,
        `mpg_hwy` smallint(4) DEFAULT NULL,
        `price` int(11) DEFAULT NULL,
        `has_price` tinyint(1) DEFAULT NULL,
        `not_has_price` tinyint(1) DEFAULT NULL,
        `mileage` int(11) DEFAULT NULL,
        `exterior` varchar(50) DEFAULT NULL,
        `interior` varchar(50) DEFAULT NULL,
        `options` text,
        `url` varchar(500) DEFAULT NULL,
        `images` text,
        `details` text,
        `cpo` tinyint(1) DEFAULT NULL,
        `date` datetime DEFAULT NULL,
        `dealer_id` int(11) DEFAULT NULL,
        `dealer_radius` int(5) DEFAULT '10000',
        `make_slug` varchar(100) DEFAULT NULL,
        `model_slug` varchar(100) DEFAULT NULL,
        `lat` float DEFAULT NULL,
        `lon` float DEFAULT NULL,
        PRIMARY KEY (`external_id`),
        KEY `dealer` (`dealer_id`),
        KEY `make_lon` (`make_slug`,`lon`),
        KEY `model_lon` (`model_slug`,`lon`),
        KEY `make_year_lon` (`make_slug`,`year`,`lon`),
        KEY `make_year_nhp_price` (`make_slug`,`year`,`not_has_price`,`price`),
        KEY `model_year_lon` (`model_slug`,`year`,`lon`),
        KEY `model_year_nhp_price` (`model_slug`,`year`,`not_has_price`,`price`),
        KEY `year_lon` (`year`,`lon`),
        KEY `year_nhp_price` (`year`,`not_has_price`,`price`),
        KEY `lon` (`lon`),
        KEY `make_price` (`make_slug`,`price`),
        KEY `make_nhp_price` (`make_slug`,`not_has_price`,`price`),
        KEY `make_date_year` (`make_slug`,`date`,`year`),
        KEY `model_price` (`model_slug`,`price`),
        KEY `model_nhp_price` (`model_slug`,`not_has_price`,`price`),
        KEY `model_date_year` (`model_slug`,`date`,`year`),
        KEY `make_miles` (`make_slug`,`mileage`),
        KEY `model_miles` (`model_slug`,`mileage`),
        KEY `miles` (`mileage`),
        KEY `price_lon` (`price`,`lon`),
        KEY `nhp_price_lon` (`not_has_price`,`price`,`lon`),
        KEY `date_year` (`date`,`year`),
        FULLTEXT KEY `keyword_index` (`make`,`model`,`trim`,`engine`,`drive`,`options`,`details`,`body`)
      ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
      

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            nstretch Nathan Stretch
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.