Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24439

Major optimizer regression with ORDER BY PK + short LIMIT

    XMLWordPrintable

    Details

      Description

      I am creating a database of Wikipedia images metadata to handle backups on my database of preference (MariaDB). The structure is a relatively common one "a single large table" plus some of its properties normalized to external, small tables. This is the structure of the relevant tables:

      files.sql

      db1133.eqiad.wmnet[mediabackups]> show create table files\G
      *************************** 1. row ***************************
             Table: files
      Create Table: CREATE TABLE `files` (
        `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
        `wiki` int(10) unsigned NOT NULL,
        `upload_name` varbinary(255) DEFAULT NULL,
        `swift_container` int(10) unsigned DEFAULT NULL,
        `swift_name` varbinary(270) DEFAULT NULL,
        `file_type` tinyint(3) unsigned DEFAULT NULL,
        `status` tinyint(3) unsigned DEFAULT NULL,
        `sha1` varbinary(40) DEFAULT NULL,
        `md5` varbinary(32) DEFAULT NULL,
        `size` int(10) unsigned DEFAULT NULL,
        `upload_timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
        `archived_timestamp` timestamp NULL DEFAULT NULL,
        `deleted_timestamp` timestamp NULL DEFAULT NULL,
        `backup_status` tinyint(3) unsigned DEFAULT 1,
        PRIMARY KEY (`id`),
        KEY `sha1` (`sha1`),
        KEY `file_type` (`file_type`),
        KEY `status` (`status`),
        KEY `backup_status` (`backup_status`),
        KEY `wiki` (`wiki`),
        KEY `swift_container` (`swift_container`),
        KEY `upload_name` (`upload_name`,`status`),
        KEY `upload_timestamp` (`upload_timestamp`),
        CONSTRAINT `files_ibfk_1` FOREIGN KEY (`file_type`) REFERENCES `file_types` (`id`),
        CONSTRAINT `files_ibfk_2` FOREIGN KEY (`status`) REFERENCES `file_status` (`id`),
        CONSTRAINT `files_ibfk_3` FOREIGN KEY (`wiki`) REFERENCES `wikis` (`id`),
        CONSTRAINT `files_ibfk_4` FOREIGN KEY (`backup_status`) REFERENCES `backup_status` (`id`),
        CONSTRAINT `files_ibfk_5` FOREIGN KEY (`swift_container`) REFERENCES `swift_containers` (`id`)
      ) ENGINE=InnoDB AUTO_INCREMENT=4549852 DEFAULT CHARSET=binary
      1 row in set (0.001 sec)
      

      file_status.sql

      db1133.eqiad.wmnet[mediabackups]> show create table file_status\G
      *************************** 1. row ***************************
             Table: file_status
      Create Table: CREATE TABLE `file_status` (
        `id` tinyint(3) unsigned NOT NULL AUTO_INCREMENT,
        `status_name` varbinary(100) DEFAULT NULL,
        PRIMARY KEY (`id`)
      ) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=binary
      1 row in set (0.001 sec)
       
      db1133.eqiad.wmnet[mediabackups]> select * FROM file_status;
      +----+--------------+
      | id | status_name  |
      +----+--------------+
      |  1 | public       |
      |  2 | archived     |
      |  3 | deleted      |
      |  4 | hard-deleted |
      +----+--------------+
      4 rows in set (0.001 sec)
      

      backup_status.sql

      db1133.eqiad.wmnet[mediabackups]> show create table backup_status\G
      *************************** 1. row ***************************
             Table: backup_status
      Create Table: CREATE TABLE `backup_status` (
        `id` tinyint(3) unsigned NOT NULL,
        `backup_status_name` varbinary(100) DEFAULT NULL,
        PRIMARY KEY (`id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=binary
      1 row in set (0.000 sec)
       
      db1133.eqiad.wmnet[mediabackups]> select * FROM backup_status;
      +----+--------------------+
      | id | backup_status_name |
      +----+--------------------+
      |  1 | pending            |
      |  2 | processing         |
      |  3 | backedup           |
      |  4 | error              |
      +----+--------------------+
      4 rows in set (0.001 sec)
      

      wikis.sql

      db1133.eqiad.wmnet[mediabackups]> show create table wikis\G
      *************************** 1. row ***************************
             Table: wikis
      Create Table: CREATE TABLE `wikis` (
        `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
        `wiki_name` varbinary(255) DEFAULT NULL,
        `type` tinyint(3) unsigned DEFAULT NULL,
        PRIMARY KEY (`id`),
        UNIQUE KEY `wiki_name` (`wiki_name`)
      ) ENGINE=InnoDB AUTO_INCREMENT=1206 DEFAULT CHARSET=binary
      1 row in set (0.001 sec)
       
      db1133.eqiad.wmnet[mediabackups]> select count(*) FROM wikis;
      +----------+
      | count(*) |
      +----------+
      |      999 |
      +----------+
      1 row in set (0.001 sec)
      

      When I try to run a query with an ORDER BY, followed by a short limit, using the PRIMARY KEY, it preferes to scan the whole table and perform a filesort rather than using the index efectively to scan just ~10 rows:

      query.sql

      -- simple query, just denormalize the data so it is in "human" readable form
      db1133.eqiad.wmnet[mediabackups]> SELECT wiki_name, upload_name, size, status_name, sha1, backup_status_name
      FROM files JOIN file_status ON file_status.id = files.status
      JOIN backup_status ON backup_status.id = files.backup_status
      JOIN wikis ON wikis.id = files.wiki
      ORDER BY files.id
      LIMIT 10;
      	
      <---- waiting for minutes...
       
      ^CCtrl-C -- query killed. Continuing normally.
      ERROR 1317 (70100): Query execution was interrupted
      	
      db1133.eqiad.wmnet[mediabackups]> EXPLAIN select wiki_name, upload_name, size, status_name, sha1, backup_status_name FROM files JOIN file_status ON file_status.id = files.status JOIN backup_status ON backup_status.id = files.backup_status JOIN wikis ON wikis.id = files.wiki ORDER BY files.id LIMIT 10\G
      	*************************** 1. row ***************************
      	           id: 1
      	  select_type: SIMPLE
      	        table: files
      	         type: ALL                              <--------------------------------------------------- ?????
      	possible_keys: status,backup_status,wiki
      	          key: NULL
      	      key_len: NULL
      	          ref: NULL
      	         rows: 4549851
      	        Extra: Using where; Using temporary; Using filesort <------------------------- ??????
      	*************************** 2. row ***************************
      	           id: 1
      	  select_type: SIMPLE
      	        table: wikis
      	         type: eq_ref
      	possible_keys: PRIMARY
      	          key: PRIMARY
      	      key_len: 4
      	          ref: mediabackups.files.wiki
      	         rows: 1
      	        Extra: 
      	*************************** 3. row ***************************
      	           id: 1
      	  select_type: SIMPLE
      	        table: backup_status
      	         type: ALL
      	possible_keys: PRIMARY
      	          key: NULL
      	      key_len: NULL
      	          ref: NULL
      	         rows: 4
      	        Extra: Using where; Using join buffer (flat, BNL join)
      	*************************** 4. row ***************************
      	           id: 1
      	  select_type: SIMPLE
      	        table: file_status
      	         type: eq_ref
      	possible_keys: PRIMARY
      	          key: PRIMARY
      	      key_len: 1
      	          ref: mediabackups.files.status
      	         rows: 1
      	        Extra: 
      	4 rows in set (0.001 sec)
      

      This is a major regression because being able to select ranges in order is a top feature of a MySQL-type db, and has to be fast. Most of our web pages are just "list of items in order". I can see an issue with a secondary index among many, but refusing to use the primary key?

      The table was analyzed and rebuilt, to the same results. Optimizer trace and more information is available at our web tracker: https://phabricator.wikimedia.org/P13595#74956

      To check it was not something I was doing wrong, I checked the same query, after importing a copy of the database on Percona Server 8, and I got this:

      percona_query.sql

      db2102.codfw.wmnet[mediabackups]> explain select wiki_name, upload_name, size, status_name, sha1, backup_status_name FROM files JOIN file_status ON file_status.id = files.status JOIN backup_status ON backup_status.id = files.backup_status JOIN wikis ON wikis.id = files.wiki ORDER BY files.id LIMIT 10\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: files
         partitions: NULL
               type: index
      possible_keys: status,backup_status,wiki
                key: PRIMARY <------------------------------------- preferred index
            key_len: 4
                ref: NULL
               rows: 10
           filtered: 100.00
              Extra: Using where <------------------------------------ no filesort
      *************************** 2. row ***************************
                 id: 1
        select_type: SIMPLE
              table: file_status
         partitions: NULL
               type: eq_ref
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 1
                ref: mediabackups.files.status
               rows: 1
           filtered: 100.00
              Extra: NULL
      *************************** 3. row ***************************
                 id: 1
        select_type: SIMPLE
              table: backup_status
         partitions: NULL
               type: eq_ref
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 1
                ref: mediabackups.files.backup_status
               rows: 1
           filtered: 100.00
              Extra: NULL
      *************************** 4. row ***************************
                 id: 1
        select_type: SIMPLE
              table: wikis
         partitions: NULL
               type: eq_ref
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 4
                ref: mediabackups.files.wiki
               rows: 1
           filtered: 100.00
              Extra: NULL
      4 rows in set, 1 warning (0.035 sec)
       
      db2102.codfw.wmnet[mediabackups]> select wiki_name, upload_name, size, status_name, sha1, backup_status_name FROM files JOIN file_status ON file_status.id = files.status JOIN backup_status ON backup_status.id = files.backup_status JOIN wikis ON wikis.id = files.wiki ORDER BY files.id LIMIT 10;
      +-----------+---------------------------------------------------------+--------+-------------+------------------------------------------+--------------------+
      | wiki_name | upload_name                                             | size   | status_name | sha1                                     | backup_status_name |
      +-----------+---------------------------------------------------------+--------+-------------+------------------------------------------+--------------------+
      | enwiki    | !!!_(Chk_Chk_Chk)_-_One_Girl_One_Boy_cover_art.jpg      |  31850 | public      | 2c5f4c5ff0e57ffcea85c1da92b4599336d75fb9 | backedup           |
      | enwiki    | !!!_-_!!!_album_cover.jpg                               |  43672 | public      | 25c046a856d14314cda3c741539f41fbc6c63fe2 | backedup           |
      | enwiki    | !!!_-_Wallop.png                                        | 118745 | public      | 3fccf40bfd27ccfac1d057198fd4315afd50b42d | backedup           |
      | enwiki    | !0_Trombones_Like_2_Pianos.jpg                          |  25319 | public      | 3ec8b40e128ad8c677b869ef9280104c03bff2e5 | backedup           |
      | enwiki    | !ClaudiaPascoal.png                                     |  26203 | public      | 66a8f68f1ac4708aaa600dcc5a5690836868a2c9 | backedup           |
      | enwiki    | !Haunu.ogg                                              |  13450 | public      | 77577544869768e619829c4e9e6e0ddde94f9421 | backedup           |
      | enwiki    | !Hero_(album).jpg                                       |  38664 | public      | c42574f1639e0bd96749f0ac3e76b69573ce2ba3 | backedup           |
      | enwiki    | !Women_Art_Revolution_(documentary_film)_poster_art.jpg |  15446 | public      | fd33557daf97103c72401e7a33723360cd0db9f4 | backedup           |
      | enwiki    | !_(The_Song_Formely_Known_As)_by_Regurgitator.png       | 146525 | public      | c491daf7642bfa1d192b22bc83cc41c721d1092e | backedup           |
      | enwiki    | ""Motor-Cycle"_LP_cover-Lotti_Golden.jpg                |  17001 | public      | dbc8dcac96e647d0dfcba702c88e86a0935e2d88 | backedup           |
      +-----------+---------------------------------------------------------+--------+-------------+------------------------------------------+--------------------+
      10 rows in set (0.035 sec)
      

      ~10 rows read, using the index to avoid the sort + short circuit the scan.

      I could not make the query on MariaDB behave as expected with index hints. The only way I could do it is by doing:

      workaround-not-really.sql

      db1133.eqiad.wmnet[mediabackups]> select wiki_name, upload_name, size, status_name, sha1, backup_status_name FROM files JOIN file_status ON file_status.id = files.status JOIN backup_status ON backup_status.id = files.backup_status JOIN wikis ON wikis.id = files.wiki WHERE files.id BETWEEN 1 AND 10; -- note the only change at the end
      	+-----------+---------------------------------------------------------+--------+-----------
      	| wiki_name | upload_name                                             | size   | status_nam
      	+-----------+---------------------------------------------------------+--------+-----------
      	| enwiki    | !!!_(Chk_Chk_Chk)_-_One_Girl_One_Boy_cover_art.jpg      |  31850 | public    
      	| enwiki    | !!!_-_!!!_album_cover.jpg                               |  43672 | public    
      	| enwiki    | !!!_-_Wallop.png                                        | 118745 | public    
      	| enwiki    | !0_Trombones_Like_2_Pianos.jpg                          |  25319 | public    
      	| enwiki    | !ClaudiaPascoal.png                                     |  26203 | public    
      	| enwiki    | !Haunu.ogg                                              |  13450 | public    
      	| enwiki    | !Hero_(album).jpg                                       |  38664 | public    
      	| enwiki    | !Women_Art_Revolution_(documentary_film)_poster_art.jpg |  15446 | public    
      	| enwiki    | !_(The_Song_Formely_Known_As)_by_Regurgitator.png       | 146525 | public    
      	| enwiki    | ""Motor-Cycle"_LP_cover-Lotti_Golden.jpg                |  17001 | public    
      	+-----------+---------------------------------------------------------+--------+-----------
      	10 rows in set (0.001 sec)
      

      But the between usage is not really practical in a general case (gaps on PKs).

      I can provide a partial dump of the table (the public part of it), if needed.

        Attachments

          Activity

            People

            Assignee:
            psergey Sergei Petrunia
            Reporter:
            jcrespo Jaime Crespo
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Dates

              Created:
              Updated: