Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8880

Very suboptimal join order is generated for a simple query even if MariaDB query planner knows the other is better in every sense

Details

    • 10.1.11

    Description

      This is the summary of the bug reported here: https://phabricator.wikimedia.org/T113901

      This is the exact CREATE syntax of revision and page:

      CREATE TABLE `revision` (
          `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
          `rev_page` int(8) unsigned NOT NULL DEFAULT '0',
          `rev_text_id` int(8) unsigned NOT NULL DEFAULT '0',
          `rev_comment` varbinary(255) DEFAULT NULL,
          `rev_user` int(5) unsigned NOT NULL DEFAULT '0',
          `rev_user_text` varbinary(255) NOT NULL DEFAULT '',
          `rev_timestamp` varbinary(14) NOT NULL DEFAULT '',
          `rev_minor_edit` tinyint(1) unsigned NOT NULL DEFAULT '0',
          `rev_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
          `rev_len` int(8) unsigned DEFAULT NULL,
          `rev_parent_id` int(8) unsigned DEFAULT NULL,
          `rev_sha1` varbinary(32) NOT NULL DEFAULT '',
          `rev_content_model` varbinary(32) DEFAULT NULL,
          `rev_content_format` varbinary(64) DEFAULT NULL,
          PRIMARY KEY (`rev_id`),
          KEY `rev_page_id` (`rev_page`,`rev_id`),
          KEY `rev_timestamp` (`rev_timestamp`,`rev_id`),
          KEY `page_timestamp` (`rev_page`,`rev_timestamp`,`rev_id`),
          KEY `user_timestamp` (`rev_user`,`rev_timestamp`,`rev_id`),
          KEY `usertext_timestamp` (`rev_user_text`,`rev_timestamp`,`rev_id`)
        ) ENGINE=InnoDB AUTO_INCREMENT=683122218 DEFAULT CHARSET=binary

      CREATE TABLE `page` (
        `page_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
        `page_namespace` int(11) NOT NULL DEFAULT '0',
        `page_title` varbinary(255) NOT NULL DEFAULT '',
        `page_restrictions` tinyblob NOT NULL,
        `page_counter` bigint(20) unsigned NOT NULL DEFAULT '0',
        `page_is_redirect` tinyint(1) unsigned NOT NULL DEFAULT '0',
        `page_is_new` tinyint(1) unsigned NOT NULL DEFAULT '0',
        `page_random` double unsigned NOT NULL DEFAULT '0',
        `page_touched` varbinary(14) NOT NULL DEFAULT '',
        `page_links_updated` varbinary(14) DEFAULT NULL,
        `page_latest` int(8) unsigned NOT NULL DEFAULT '0',
        `page_len` int(8) unsigned NOT NULL DEFAULT '0',
        `page_content_model` varbinary(32) DEFAULT NULL,
        PRIMARY KEY (`page_id`),
        UNIQUE KEY `name_title` (`page_namespace`,`page_title`),
        KEY `page_random` (`page_random`),
        KEY `page_len` (`page_len`),
        KEY `page_redirect_namespace_len` (`page_is_redirect`,`page_namespace`,`page_len`)
      ) ENGINE=InnoDB AUTO_INCREMENT=47992700 DEFAULT CHARSET=binary

      Regular explain:

      Tables have approximately the numbers or rows you see on the autoinc.

      mysql> EXPLAIN SELECT rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11\G
        *************************** 1. row ***************************
                   id: 1
          select_type: SIMPLE
                table: page
                 type: index
        possible_keys: PRIMARY
                  key: name_title
              key_len: 261
                  ref: NULL
                 rows: 33836157
                Extra: Using index; Using temporary; Using filesort
        *************************** 2. row ***************************
                   id: 1
          select_type: SIMPLE
                table: revision
                 type: ref
        possible_keys: rev_page_id,page_timestamp
                  key: page_timestamp
              key_len: 4
                  ref: enwiki.page.page_id
                 rows: 8
                Extra: Using index

      Forcing the other plan:

      mysql> EXPLAIN SELECT STRAIGHT_JOIN rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11\G
        *************************** 1. row ***************************
                   id: 1
          select_type: SIMPLE
                table: revision
                 type: index
        possible_keys: rev_page_id,page_timestamp
                  key: rev_timestamp
              key_len: 20
                  ref: NULL
                 rows: 11
                Extra:
        *************************** 2. row ***************************
                   id: 1
          select_type: SIMPLE
                table: page
                 type: eq_ref
        possible_keys: PRIMARY
                  key: PRIMARY
              key_len: 4
                  ref: enwiki.revision.rev_page
                 rows: 1
                Extra:

      The actual query parsed has no issue:

      select `enwiki`.`revision`.`rev_id` AS `rev_id`,`enwiki`.`revision`.`rev_timestamp` AS `rev_timestamp`,`enwiki`.`page`.`page_id`
        AS `page_id`,`enwiki`.`page`.`page_title` AS `page_title`,`enwiki`.`page`.`page_namespace` AS `page_namespace`
        from `enwiki`.`revision` join `enwiki`.`page` where (`enwiki`.`revision`.`rev_page` = `enwiki`.`page`.`page_id`)
        order by `enwiki`.`revision`.`rev_timestamp` desc,`enwiki`.`revision`.`rev_id` desc limit 11

      Even ignoring the indexes does not work (it is not related to covering index):

      mysql> EXPLAIN EXTENDED SELECT rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` IGNORE INDEX(name_title, PRIMARY) ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11\G
        *************************** 1. row ***************************
                   id: 1
          select_type: SIMPLE
                table: page
                 type: ALL
        possible_keys: NULL
                  key: NULL
              key_len: NULL
                  ref: NULL
                 rows: 33836157
             filtered: 100.00
                Extra: Using temporary; Using filesort
        *************************** 2. row ***************************
                   id: 1
          select_type: SIMPLE
                table: revision
                 type: ref
        possible_keys: rev_page_id,page_timestamp
                  key: page_timestamp
              key_len: 4
                  ref: enwiki.page.page_id
                 rows: 8
             filtered: 100.00
                Extra: Using index
        2 rows in set, 1 warning (0.00 sec)

      I think this is a bug on the optimizer.

      Running ANALYZE on both tables, even with use-stat-tables=PREFERABLY does not help (and it shouldn't, row counts are ok, it is that for some reason maria refuses to use the clearly better plan -that itself understands).

      This could be a specific configuration we have, you can see most of it at:
      https://git.wikimedia.org/blob/operations%2Fpuppet.git/1423255184712f7fde17dee338c7ac1519ee44ee/templates%2Fmariadb%2Fproduction.my.cnf.erb

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova added a comment - - edited

            Tried on x1000 smaller tables (600K and 50K rows, correspondingly). Same result

            MariaDB [test]> EXPLAIN SELECT rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11;
            +------+-------------+----------+-------+----------------------------+----------------+---------+-------------------+-------+----------------------------------------------+
            | id   | select_type | table    | type  | possible_keys              | key            | key_len | ref               | rows  | Extra                                        |
            +------+-------------+----------+-------+----------------------------+----------------+---------+-------------------+-------+----------------------------------------------+
            |    1 | SIMPLE      | page     | index | PRIMARY                    | name_title     | 261     | NULL              | 25272 | Using index; Using temporary; Using filesort |
            |    1 | SIMPLE      | revision | ref   | rev_page_id,page_timestamp | page_timestamp | 4       | test.page.page_id |     5 | Using index                                  |
            +------+-------------+----------+-------+----------------------------+----------------+---------+-------------------+-------+----------------------------------------------+
            2 rows in set (0.00 sec)

            MariaDB [test]> EXPLAIN SELECT STRAIGHT_JOIN rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))  ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 11;
            +------+-------------+----------+--------+----------------------------+---------------+---------+------------------------+------+-------+
            | id   | select_type | table    | type   | possible_keys              | key           | key_len | ref                    | rows | Extra |
            +------+-------------+----------+--------+----------------------------+---------------+---------+------------------------+------+-------+
            |    1 | SIMPLE      | revision | index  | rev_page_id,page_timestamp | rev_timestamp | 20      | NULL                   |   11 |       |
            |    1 | SIMPLE      | page     | eq_ref | PRIMARY                    | PRIMARY       | 4       | test.revision.rev_page |    1 |       |
            +------+-------------+----------+--------+----------------------------+---------------+---------+------------------------+------+-------+
            2 rows in set (0.00 sec)

            Execution time for the first query on my machine is ~1 sec, give or take; for the second one it is negligible.

            11 rows in set (0.82 sec)
            11 rows in set (0.00 sec)

            I've uploaded my generated data dump (purely artificial, nothing confidential) to ftp://ftp.askmonty.org/public/mdev8880.dump.gz

            elenst Elena Stepanova added a comment - - edited Tried on x1000 smaller tables (600K and 50K rows, correspondingly). Same result MariaDB [test]> EXPLAIN SELECT rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id)) ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11; +------+-------------+----------+-------+----------------------------+----------------+---------+-------------------+-------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+-------+----------------------------+----------------+---------+-------------------+-------+----------------------------------------------+ | 1 | SIMPLE | page | index | PRIMARY | name_title | 261 | NULL | 25272 | Using index; Using temporary; Using filesort | | 1 | SIMPLE | revision | ref | rev_page_id,page_timestamp | page_timestamp | 4 | test.page.page_id | 5 | Using index | +------+-------------+----------+-------+----------------------------+----------------+---------+-------------------+-------+----------------------------------------------+ 2 rows in set (0.00 sec) MariaDB [test]> EXPLAIN SELECT STRAIGHT_JOIN rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id)) ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11; +------+-------------+----------+--------+----------------------------+---------------+---------+------------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+--------+----------------------------+---------------+---------+------------------------+------+-------+ | 1 | SIMPLE | revision | index | rev_page_id,page_timestamp | rev_timestamp | 20 | NULL | 11 | | | 1 | SIMPLE | page | eq_ref | PRIMARY | PRIMARY | 4 | test.revision.rev_page | 1 | | +------+-------------+----------+--------+----------------------------+---------------+---------+------------------------+------+-------+ 2 rows in set (0.00 sec) Execution time for the first query on my machine is ~1 sec, give or take; for the second one it is negligible. 11 rows in set (0.82 sec) 11 rows in set (0.00 sec) I've uploaded my generated data dump (purely artificial, nothing confidential) to ftp://ftp.askmonty.org/public/mdev8880.dump.gz
            psergei Sergei Petrunia added a comment - - edited

            Confirm, this is the same problem as MDEV-8306. Join optimizer ignores the fact that some join orders can produce rows in the order required by ORDER BY ... LIMIT clause, which allows to finish join execution short.

            psergei Sergei Petrunia added a comment - - edited Confirm, this is the same problem as MDEV-8306 . Join optimizer ignores the fact that some join orders can produce rows in the order required by ORDER BY ... LIMIT clause, which allows to finish join execution short.
            psergei Sergei Petrunia added a comment - - edited

            Some details, debugging Elena's testcase

            page, revision

            page (idx=0)
              position->records_read = 25553
              position->read_time = 609
              total_read_time = 5719 
             
            revision (idx=1)
              position->records_read = 4
              position->read_time= 25587
              
              current_record_count = 102,212
              current_read_time = 51,749
            

            Here we hit the "We may have to make a temp table" heuristic, do this:

                current_read_time= COST_ADD(current_read_time, current_record_count);

            then, current_read_time = 153,961

            revision, page

            revision (idx=0)
              position->records_read = 543,446
              position->read_time = 5479
              current_read_time = 114,168
             
            page (idx=1)
              position->read_time = 543,446
              position->records_read = 1
             
              current_record_count = 543,446
              current_read_time = 766,303
            

            that is, (revision, page) is more expensive than (page,revision). The heuristic with "We may have to make a temp table" makes the difference smaller but the change is too small to make the query plan differ.

            psergei Sergei Petrunia added a comment - - edited Some details, debugging Elena's testcase page, revision page (idx=0) position->records_read = 25553 position->read_time = 609 total_read_time = 5719   revision (idx=1) position->records_read = 4 position->read_time= 25587 current_record_count = 102,212 current_read_time = 51,749 Here we hit the "We may have to make a temp table" heuristic, do this: current_read_time= COST_ADD(current_read_time, current_record_count); then, current_read_time = 153,961 revision, page revision (idx=0) position->records_read = 543,446 position->read_time = 5479 current_read_time = 114,168   page (idx=1) position->read_time = 543,446 position->records_read = 1 current_record_count = 543,446 current_read_time = 766,303 that is, (revision, page) is more expensive than (page,revision). The heuristic with "We may have to make a temp table" makes the difference smaller but the change is too small to make the query plan differ.

            People

              psergei Sergei Petrunia
              jcrespo Jaime Crespo
              Votes:
              4 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.