[MDEV-8880] Very suboptimal join order is generated for a simple query even if MariaDB query planner knows the other is better in every sense Created: 2015-10-02  Updated: 2020-01-31

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0.16, 10.0, 10.1
Fix Version/s: 10.2

Type: Bug Priority: Major
Reporter: Jaime Crespo Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 4
Labels: order-by-optimization, performance
Environment:

Linux with MariaDB-like packages


Issue Links:
Relates
relates to MDEV-8306 Complete cost-based optimization for ... Stalled
Sprint: 10.1.11

 Description   

This is the summary of the bug reported here: https://phabricator.wikimedia.org/T113901

This is the exact CREATE syntax of revision and page:

CREATE TABLE `revision` (
    `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
    `rev_page` int(8) unsigned NOT NULL DEFAULT '0',
    `rev_text_id` int(8) unsigned NOT NULL DEFAULT '0',
    `rev_comment` varbinary(255) DEFAULT NULL,
    `rev_user` int(5) unsigned NOT NULL DEFAULT '0',
    `rev_user_text` varbinary(255) NOT NULL DEFAULT '',
    `rev_timestamp` varbinary(14) NOT NULL DEFAULT '',
    `rev_minor_edit` tinyint(1) unsigned NOT NULL DEFAULT '0',
    `rev_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
    `rev_len` int(8) unsigned DEFAULT NULL,
    `rev_parent_id` int(8) unsigned DEFAULT NULL,
    `rev_sha1` varbinary(32) NOT NULL DEFAULT '',
    `rev_content_model` varbinary(32) DEFAULT NULL,
    `rev_content_format` varbinary(64) DEFAULT NULL,
    PRIMARY KEY (`rev_id`),
    KEY `rev_page_id` (`rev_page`,`rev_id`),
    KEY `rev_timestamp` (`rev_timestamp`,`rev_id`),
    KEY `page_timestamp` (`rev_page`,`rev_timestamp`,`rev_id`),
    KEY `user_timestamp` (`rev_user`,`rev_timestamp`,`rev_id`),
    KEY `usertext_timestamp` (`rev_user_text`,`rev_timestamp`,`rev_id`)
  ) ENGINE=InnoDB AUTO_INCREMENT=683122218 DEFAULT CHARSET=binary

CREATE TABLE `page` (
  `page_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `page_namespace` int(11) NOT NULL DEFAULT '0',
  `page_title` varbinary(255) NOT NULL DEFAULT '',
  `page_restrictions` tinyblob NOT NULL,
  `page_counter` bigint(20) unsigned NOT NULL DEFAULT '0',
  `page_is_redirect` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `page_is_new` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `page_random` double unsigned NOT NULL DEFAULT '0',
  `page_touched` varbinary(14) NOT NULL DEFAULT '',
  `page_links_updated` varbinary(14) DEFAULT NULL,
  `page_latest` int(8) unsigned NOT NULL DEFAULT '0',
  `page_len` int(8) unsigned NOT NULL DEFAULT '0',
  `page_content_model` varbinary(32) DEFAULT NULL,
  PRIMARY KEY (`page_id`),
  UNIQUE KEY `name_title` (`page_namespace`,`page_title`),
  KEY `page_random` (`page_random`),
  KEY `page_len` (`page_len`),
  KEY `page_redirect_namespace_len` (`page_is_redirect`,`page_namespace`,`page_len`)
) ENGINE=InnoDB AUTO_INCREMENT=47992700 DEFAULT CHARSET=binary

Regular explain:

Tables have approximately the numbers or rows you see on the autoinc.

mysql> EXPLAIN SELECT rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11\G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: page
           type: index
  possible_keys: PRIMARY
            key: name_title
        key_len: 261
            ref: NULL
           rows: 33836157
          Extra: Using index; Using temporary; Using filesort
  *************************** 2. row ***************************
             id: 1
    select_type: SIMPLE
          table: revision
           type: ref
  possible_keys: rev_page_id,page_timestamp
            key: page_timestamp
        key_len: 4
            ref: enwiki.page.page_id
           rows: 8
          Extra: Using index

Forcing the other plan:

mysql> EXPLAIN SELECT STRAIGHT_JOIN rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11\G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: revision
           type: index
  possible_keys: rev_page_id,page_timestamp
            key: rev_timestamp
        key_len: 20
            ref: NULL
           rows: 11
          Extra:
  *************************** 2. row ***************************
             id: 1
    select_type: SIMPLE
          table: page
           type: eq_ref
  possible_keys: PRIMARY
            key: PRIMARY
        key_len: 4
            ref: enwiki.revision.rev_page
           rows: 1
          Extra:

The actual query parsed has no issue:

select `enwiki`.`revision`.`rev_id` AS `rev_id`,`enwiki`.`revision`.`rev_timestamp` AS `rev_timestamp`,`enwiki`.`page`.`page_id`
  AS `page_id`,`enwiki`.`page`.`page_title` AS `page_title`,`enwiki`.`page`.`page_namespace` AS `page_namespace`
  from `enwiki`.`revision` join `enwiki`.`page` where (`enwiki`.`revision`.`rev_page` = `enwiki`.`page`.`page_id`)
  order by `enwiki`.`revision`.`rev_timestamp` desc,`enwiki`.`revision`.`rev_id` desc limit 11

Even ignoring the indexes does not work (it is not related to covering index):

mysql> EXPLAIN EXTENDED SELECT rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` IGNORE INDEX(name_title, PRIMARY) ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11\G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: page
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 33836157
       filtered: 100.00
          Extra: Using temporary; Using filesort
  *************************** 2. row ***************************
             id: 1
    select_type: SIMPLE
          table: revision
           type: ref
  possible_keys: rev_page_id,page_timestamp
            key: page_timestamp
        key_len: 4
            ref: enwiki.page.page_id
           rows: 8
       filtered: 100.00
          Extra: Using index
  2 rows in set, 1 warning (0.00 sec)

I think this is a bug on the optimizer.

Running ANALYZE on both tables, even with use-stat-tables=PREFERABLY does not help (and it shouldn't, row counts are ok, it is that for some reason maria refuses to use the clearly better plan -that itself understands).

This could be a specific configuration we have, you can see most of it at:
https://git.wikimedia.org/blob/operations%2Fpuppet.git/1423255184712f7fde17dee338c7ac1519ee44ee/templates%2Fmariadb%2Fproduction.my.cnf.erb



 Comments   
Comment by Elena Stepanova [ 2015-10-21 ]

Tried on x1000 smaller tables (600K and 50K rows, correspondingly). Same result

MariaDB [test]> EXPLAIN SELECT rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11;
+------+-------------+----------+-------+----------------------------+----------------+---------+-------------------+-------+----------------------------------------------+
| id   | select_type | table    | type  | possible_keys              | key            | key_len | ref               | rows  | Extra                                        |
+------+-------------+----------+-------+----------------------------+----------------+---------+-------------------+-------+----------------------------------------------+
|    1 | SIMPLE      | page     | index | PRIMARY                    | name_title     | 261     | NULL              | 25272 | Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | revision | ref   | rev_page_id,page_timestamp | page_timestamp | 4       | test.page.page_id |     5 | Using index                                  |
+------+-------------+----------+-------+----------------------------+----------------+---------+-------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)

MariaDB [test]> EXPLAIN SELECT STRAIGHT_JOIN rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11;
+------+-------------+----------+--------+----------------------------+---------------+---------+------------------------+------+-------+
| id   | select_type | table    | type   | possible_keys              | key           | key_len | ref                    | rows | Extra |
+------+-------------+----------+--------+----------------------------+---------------+---------+------------------------+------+-------+
|    1 | SIMPLE      | revision | index  | rev_page_id,page_timestamp | rev_timestamp | 20      | NULL                   |   11 |       |
|    1 | SIMPLE      | page     | eq_ref | PRIMARY                    | PRIMARY       | 4       | test.revision.rev_page |    1 |       |
+------+-------------+----------+--------+----------------------------+---------------+---------+------------------------+------+-------+
2 rows in set (0.00 sec)

Execution time for the first query on my machine is ~1 sec, give or take; for the second one it is negligible.

11 rows in set (0.82 sec)
11 rows in set (0.00 sec)

I've uploaded my generated data dump (purely artificial, nothing confidential) to ftp://ftp.askmonty.org/public/mdev8880.dump.gz

Comment by Sergei Petrunia [ 2016-01-21 ]

Confirm, this is the same problem as MDEV-8306. Join optimizer ignores the fact that some join orders can produce rows in the order required by ORDER BY ... LIMIT clause, which allows to finish join execution short.

Comment by Sergei Petrunia [ 2020-01-31 ]

Some details, debugging Elena's testcase

page, revision

page (idx=0)
  position->records_read = 25553
  position->read_time = 609
  total_read_time = 5719 
 
revision (idx=1)
  position->records_read = 4
  position->read_time= 25587
  
  current_record_count = 102,212
  current_read_time = 51,749

Here we hit the "We may have to make a temp table" heuristic, do this:

    current_read_time= COST_ADD(current_read_time, current_record_count);

then, current_read_time = 153,961

revision, page

revision (idx=0)
  position->records_read = 543,446
  position->read_time = 5479
  current_read_time = 114,168
 
page (idx=1)
  position->read_time = 543,446
  position->records_read = 1
 
  current_record_count = 543,446
  current_read_time = 766,303

that is, (revision, page) is more expensive than (page,revision). The heuristic with "We may have to make a temp table" makes the difference smaller but the change is too small to make the query plan differ.

Generated at Thu Feb 08 07:30:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.