Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19579

Query optimizer not behaving consistently across servers after rebuilding a table

    XMLWordPrintable

Details

    Description

      We have 24 servers running SSDs (RAID10+BBU) and 512GB RAM and the following query:

      SELECT /* ApiQueryRevisions::run */ rev_id, rev_page, rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id, rev_sha1, comment_rev_comment.comment_text AS `rev_comment_text`, comment_rev_comment.comment_data AS `rev_comment_data`, comment_rev_comment.comment_id AS `rev_comment_cid`, actor_rev_user.actor_user AS `rev_user`, actor_rev_user.actor_name AS `rev_user_text`, temp_rev_user.revactor_actor AS `rev_actor`, page_namespace, page_title, page_id, page_latest, page_is_redirect, page_len, user_name FROM `revision` JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) JOIN `revision_actor_temp` `temp_rev_user` ON ((temp_rev_user.revactor_rev = rev_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor)) JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((actor_rev_user.actor_user != 0) AND (user_id = actor_rev_user.actor_user)) WHERE rev_page = '35159700' ORDER BY rev_timestamp DESC, rev_id DESC LIMIT 2;
      

      That query seems to be choosing the wrong index rev_timestamp on some hosts, and the right index page_timestamp on some others.

      The wrong index shows a terrible query plan (this is a show explain for)

      +------+-------------+---------------------+--------+--------------------------------------------------------+---------------+---------+-----------------------------------------------+-----------+-----------------------------+
      | id   | select_type | table               | type   | possible_keys                                          | key           | key_len | ref                                           | rows      | Extra                       |
      +------+-------------+---------------------+--------+--------------------------------------------------------+---------------+---------+-----------------------------------------------+-----------+-----------------------------+
      |    1 | SIMPLE      | revision            | index  | PRIMARY,page_timestamp,page_user_timestamp,rev_page_id | rev_timestamp | 16      | const                                         | 830807594 | Using where; Using filesort |
      |    1 | SIMPLE      | page                | eq_ref | PRIMARY                                                | PRIMARY       | 4       | enwiki.revision.rev_page                      |         1 |                             |
      |    1 | SIMPLE      | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                 | PRIMARY       | 4       | enwiki.revision.rev_id                        |         1 | Using index                 |
      |    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                                                | PRIMARY       | 8       | enwiki.temp_rev_comment.revcomment_comment_id |         1 |                             |
      |    1 | SIMPLE      | temp_rev_user       | ref    | PRIMARY,revactor_rev,actor_timestamp                   | PRIMARY       | 4       | enwiki.revision.rev_id                        |         1 | Using index                 |
      |    1 | SIMPLE      | actor_rev_user      | eq_ref | PRIMARY                                                | PRIMARY       | 8       | enwiki.temp_rev_user.revactor_actor           |         1 |                             |
      |    1 | SIMPLE      | user                | eq_ref | PRIMARY                                                | PRIMARY       | 4       | enwiki.actor_rev_user.actor_user              |         1 | Using where                 |
      +------+-------------+---------------------+--------+--------------------------------------------------------+---------------+---------+-----------------------------------------------+-----------+-----------------------------+
       

      And the query takes around 10 seconds.

      On the other hand, for those hosts choosing the expected index page_timestamp the query runs almost instantly and it scans a lot less rows:

      +------+-------------+---------------------+--------+--------------------------------------------------------+----------------+---------+-----------------------------------------------+--------+-------------+
      | id   | select_type | table               | type   | possible_keys                                          | key            | key_len | ref                                           | rows   | Extra       |
      +------+-------------+---------------------+--------+--------------------------------------------------------+----------------+---------+-----------------------------------------------+--------+-------------+
      |    1 | SIMPLE      | revision            | ref    | PRIMARY,page_timestamp,page_user_timestamp,rev_page_id | page_timestamp | 4       | const                                         | 153670 | Using where |
      |    1 | SIMPLE      | page                | eq_ref | PRIMARY                                                | PRIMARY        | 4       | enwiki.revision.rev_page                      |      1 |             |
      |    1 | SIMPLE      | temp_rev_user       | ref    | PRIMARY,revactor_rev,actor_timestamp                   | PRIMARY        | 4       | enwiki.revision.rev_id                        |      1 | Using index |
      |    1 | SIMPLE      | actor_rev_user      | eq_ref | PRIMARY                                                | PRIMARY        | 8       | enwiki.temp_rev_user.revactor_actor           |      1 |             |
      |    1 | SIMPLE      | user                | eq_ref | PRIMARY                                                | PRIMARY        | 4       | enwiki.actor_rev_user.actor_user              |      1 | Using where |
      |    1 | SIMPLE      | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                 | PRIMARY        | 4       | enwiki.revision.rev_id                        |      1 | Using index |
      |    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                                                | PRIMARY        | 8       | enwiki.temp_rev_comment.revcomment_comment_id |      1 |             |
      +------+-------------+---------------------+--------+--------------------------------------------------------+----------------+---------+-----------------------------------------------+--------+-------------+
      

      We ran the following on one of the hosts (10.1.39) that have the wrong query plan:
      alter table revision engine=innodb, force

      And the optimizer changed and started to use the right one.
      However, doing the same on other hosts (with 10.1.37, 10.1.38 and 10.1.39) didn't make the optimizer to choose the right plan.
      On that host, we also ran an analyze table and stopped+started mariadb, but it still uses the wrong plan.

      This can also be seen with a lot simpler queries:

      explain SELECT rev_id, rev_page, rev_timestamp, rev_minor_edit FROM `revision` WHERE rev_page = '35159700' ORDER BY rev_timestamp DESC, rev_id DESC LIMIT 2;
      +------+-------------+----------+-------+------------------------------------------------+---------------+---------+------+-------+-------------+
      | id   | select_type | table    | type  | possible_keys                                  | key           | key_len | ref  | rows  | Extra       |
      +------+-------------+----------+-------+------------------------------------------------+---------------+---------+------+-------+-------------+
      |    1 | SIMPLE      | revision | index | page_timestamp,page_user_timestamp,rev_page_id | rev_timestamp | 16      | NULL | 10359 | Using where |
      +------+-------------+----------+-------+------------------------------------------------+---------------+---------+------+-------+-------------+
      1 row in set (0.04 sec)
      

      And then another host with the correct index:

      explain SELECT rev_id, rev_page, rev_timestamp, rev_minor_edit FROM `revision` WHERE rev_page = '35159700' ORDER BY rev_timestamp DESC, rev_id DESC LIMIT 2;
      +------+-------------+----------+------+------------------------------------------------+----------------+---------+-------+--------+-------------+
      | id   | select_type | table    | type | possible_keys                                  | key            | key_len | ref   | rows   | Extra       |
      +------+-------------+----------+------+------------------------------------------------+----------------+---------+-------+--------+-------------+
      |    1 | SIMPLE      | revision | ref  | page_timestamp,page_user_timestamp,rev_page_id | page_timestamp | 4       | const | 153670 | Using where |
      +------+-------------+----------+------+------------------------------------------------+----------------+---------+-------+--------+-------------+
      1 row in set (0.05 sec)
      

      This behaviour seems to be very random across all the hosts (and versions, either 10.1.38 and 10.1.39), some of them seem to be using the right index and some other don't, even if the run the same mariadb version and same config. And some of them seem to be changing the query plan after the table rebuilt but some others don't.
      The table schema is exactly the same across all of them.

      These are the table definitions:

      *************************** 1. row ***************************
             Table: revision
      Create Table: CREATE TABLE `revision` (
        `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
        `rev_page` int(8) unsigned NOT NULL DEFAULT '0',
        `rev_text_id` int(8) unsigned NOT NULL DEFAULT '0',
        `rev_comment` varbinary(255) DEFAULT '',
        `rev_user` int(5) unsigned NOT NULL DEFAULT '0',
        `rev_user_text` varbinary(255) NOT NULL DEFAULT '',
        `rev_timestamp` varbinary(14) NOT NULL DEFAULT '',
        `rev_minor_edit` tinyint(1) unsigned NOT NULL DEFAULT '0',
        `rev_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
        `rev_len` int(8) unsigned DEFAULT NULL,
        `rev_parent_id` int(8) unsigned DEFAULT NULL,
        `rev_sha1` varbinary(32) NOT NULL DEFAULT '',
        `rev_content_model` varbinary(32) DEFAULT NULL,
        `rev_content_format` varbinary(64) DEFAULT NULL,
        PRIMARY KEY (`rev_id`),
        KEY `rev_timestamp` (`rev_timestamp`),
        KEY `page_timestamp` (`rev_page`,`rev_timestamp`),
        KEY `user_timestamp` (`rev_user`,`rev_timestamp`),
        KEY `usertext_timestamp` (`rev_user_text`,`rev_timestamp`),
        KEY `page_user_timestamp` (`rev_page`,`rev_user`,`rev_timestamp`),
        KEY `rev_page_id` (`rev_page`,`rev_id`)
      ) ENGINE=InnoDB AUTO_INCREMENT=898392806 DEFAULT CHARSET=binary
      *************************** 1. row ***************************
             Table: revision_comment_temp
      Create Table: CREATE TABLE `revision_comment_temp` (
        `revcomment_rev` int(10) unsigned NOT NULL,
        `revcomment_comment_id` bigint(20) unsigned NOT NULL,
        PRIMARY KEY (`revcomment_rev`,`revcomment_comment_id`),
        UNIQUE KEY `revcomment_rev` (`revcomment_rev`)
      ) ENGINE=InnoDB DEFAULT CHARSET=binary
      *************************** 1. row ***************************
             Table: comment
      Create Table: CREATE TABLE `comment` (
        `comment_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
        `comment_hash` int(11) NOT NULL,
        `comment_text` blob NOT NULL,
        `comment_data` blob,
        PRIMARY KEY (`comment_id`),
        KEY `comment_hash` (`comment_hash`)
      ) ENGINE=InnoDB AUTO_INCREMENT=287235912 DEFAULT CHARSET=binary
      *************************** 1. row ***************************
             Table: revision_actor_temp
      Create Table: CREATE TABLE `revision_actor_temp` (
        `revactor_rev` int(10) unsigned NOT NULL,
        `revactor_actor` bigint(20) unsigned NOT NULL,
        `revactor_timestamp` binary(14) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
        `revactor_page` int(10) unsigned NOT NULL,
        PRIMARY KEY (`revactor_rev`,`revactor_actor`),
        UNIQUE KEY `revactor_rev` (`revactor_rev`),
        KEY `actor_timestamp` (`revactor_actor`,`revactor_timestamp`),
        KEY `page_actor_timestamp` (`revactor_page`,`revactor_actor`,`revactor_timestamp`)
      ) ENGINE=InnoDB DEFAULT CHARSET=binary
      *************************** 1. row ***************************
             Table: user
      Create Table: CREATE TABLE `user` (
        `user_id` int(5) unsigned NOT NULL AUTO_INCREMENT,
        `user_name` varbinary(255) NOT NULL DEFAULT '',
        `user_real_name` varbinary(255) NOT NULL DEFAULT '',
        `user_password` tinyblob NOT NULL,
        `user_newpassword` tinyblob NOT NULL,
        `user_email` tinyblob NOT NULL,
        `user_touched` varbinary(14) NOT NULL DEFAULT '',
        `user_token` varbinary(32) NOT NULL DEFAULT '',
        `user_email_authenticated` varbinary(14) DEFAULT NULL,
        `user_email_token` varbinary(32) DEFAULT NULL,
        `user_email_token_expires` varbinary(14) DEFAULT NULL,
        `user_registration` varbinary(14) DEFAULT NULL,
        `user_newpass_time` varbinary(14) DEFAULT NULL,
        `user_editcount` int(11) DEFAULT NULL,
        `user_password_expires` varbinary(14) DEFAULT NULL,
        PRIMARY KEY (`user_id`),
        UNIQUE KEY `user_name` (`user_name`),
        KEY `user_email_token` (`user_email_token`),
        KEY `user_email` (`user_email`(50))
      ) ENGINE=InnoDB AUTO_INCREMENT=36583698 DEFAULT CHARSET=binary
      *************************** 1. row ***************************
             Table: actor
      Create Table: CREATE TABLE `actor` (
        `actor_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
        `actor_user` int(10) unsigned DEFAULT NULL,
        `actor_name` varbinary(255) NOT NULL,
        PRIMARY KEY (`actor_id`),
        UNIQUE KEY `actor_name` (`actor_name`),
        UNIQUE KEY `actor_user` (`actor_user`)
      ) ENGINE=InnoDB AUTO_INCREMENT=192760880 DEFAULT CHARSET=binary
      *************************** 1. row ***************************
             Table: page
      Create Table: CREATE TABLE `page` (
        `page_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
        `page_namespace` int(11) NOT NULL DEFAULT '0',
        `page_title` varbinary(255) NOT NULL DEFAULT '',
        `page_restrictions` tinyblob NOT NULL,
        `page_is_redirect` tinyint(1) unsigned NOT NULL DEFAULT '0',
        `page_is_new` tinyint(1) unsigned NOT NULL DEFAULT '0',
        `page_random` double unsigned NOT NULL DEFAULT '0',
        `page_touched` varbinary(14) NOT NULL DEFAULT '',
        `page_links_updated` varbinary(14) DEFAULT NULL,
        `page_latest` int(8) unsigned NOT NULL DEFAULT '0',
        `page_len` int(8) unsigned NOT NULL DEFAULT '0',
        `page_content_model` varbinary(32) DEFAULT NULL,
        `page_lang` varbinary(35) DEFAULT NULL,
        PRIMARY KEY (`page_id`),
        UNIQUE KEY `name_title` (`page_namespace`,`page_title`),
        KEY `page_random` (`page_random`),
        KEY `page_len` (`page_len`),
        KEY `page_redirect_namespace_len` (`page_is_redirect`,`page_namespace`,`page_len`)
      ) ENGINE=InnoDB AUTO_INCREMENT=60840578 DEFAULT CHARSET=binary
      

      They are quite big on disk too:

      -rw-rw---- 1 mysql mysql 242G May 23 08:17 revision.ibd
      -rw-rw---- 1 mysql mysql 48G May 23 08:16 revision_comment_temp.ibd
      -rw-rw---- 1 mysql mysql 48G May 23 08:16 comment.ibd
      -rw-rw---- 1 mysql mysql 156G May 23 08:17 revision_actor_temp.ibd
      -rw-rw---- 1 mysql mysql 13G May 23 08:17 user.ibd
      -rw-rw---- 1 mysql mysql 9.5G May 23 08:17 actor.ibd
      -rw-rw---- 1 mysql mysql 14G May 23 08:17 page.ibd
      

      These are the optimizer variables

      | optimizer_switch                         | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
                                                                                                                                         
      | optimizer_prune_level                    | 1                                                                                                                                                                              
      | optimizer_search_depth                   | 62                                                                                                                                                           |
      | optimizer_selectivity_sampling_limit     | 100
      

      To sum up all the hosts that got the table rebuilt, their version and the result:

      db1080 (10.1.38) plan remained on rev_timestamp (later upgraded to 10.1.39 and nothing changed)
      db2048 (10.1.37) plan remained on rev_timestamp
      db2055 (10.1.38) plan changed to page_timestamp
      db2071 (10.1.38) plan changed to page_timestamp
      db2085:3311 (10.1.38) plan changed to page_timestamp
      db2092 (10.1.38) plan changed to page_timestamp
      db2103 (10.1.39) plan changed to page_timestamp
      db2112 (10.1.39) plan remained to rev_timestamp

      This is a pt-config-diff of two hosts with the same version but with different behaviours:

      Variable                  db2112                    db2103
      ========================= ========================= =========================
      general_log_file          db2112.log                db2103.log
      gtid_binlog_pos           171974720-171974720-15... 171974720-171974720-15...
      gtid_binlog_state         171974720-171974720-15... 171974720-171974720-15...
      gtid_current_pos          0-171970637-5484646134... 0-171970637-5484646134...
      gtid_domain_id            180363268                 180355190
      gtid_slave_pos            0-171970637-5484646134... 0-171970637-5484646134...
      hostname                  db2112                    db2103
      log_bin_basename          /srv/sqldata/db2112-bin   /srv/sqldata/db2103-bin
      log_bin_index             /srv/sqldata/db2112-bi... /srv/sqldata/db2103-bi...
      pid_file                  /srv/sqldata/db2112.pid   /srv/sqldata/db2103.pid
      server_id                 180363268                 180355190
      slow_query_log_file       db2112-slow.log           db2103-slow.log
      wsrep_node_name           db2112                    db2103
      

      Index cardinality for two failing hosts
      Host with the wrong index being used

      +----------+------------+---------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | Table    | Non_unique | Key_name            | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
      +----------+------------+---------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | revision |          0 | PRIMARY             |            1 | rev_id        | A         |   830807594 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | rev_timestamp       |            1 | rev_timestamp | A         |   371509902 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_timestamp      |            1 | rev_page      | A         |    48061342 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_timestamp      |            2 | rev_timestamp | A         |   830724521 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | user_timestamp      |            1 | rev_user      | A         |     9217389 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | user_timestamp      |            2 | rev_timestamp | A         |   782673192 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | usertext_timestamp  |            1 | rev_user_text | A         |    54411751 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | usertext_timestamp  |            2 | rev_timestamp | A         |   816037318 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_user_timestamp |            1 | rev_page      | A         |    48061342 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_user_timestamp |            2 | rev_user      | A         |   296272588 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_user_timestamp |            3 | rev_timestamp | A         |   830724521 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | rev_page_id         |            1 | rev_page      | A         |    48061342 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | rev_page_id         |            2 | rev_id        | A         |   830807594 |     NULL | NULL   |      | BTREE      |         |               |
      +----------+------------+---------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      

      Host with the right index being used

      +----------+------------+---------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | Table    | Non_unique | Key_name            | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
      +----------+------------+---------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | revision |          0 | PRIMARY             |            1 | rev_id        | A         |   830575333 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | rev_timestamp       |            1 | rev_timestamp | A         |   371389435 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_timestamp      |            1 | rev_page      | A         |    48048462 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_timestamp      |            2 | rev_timestamp | A         |   830492283 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | user_timestamp      |            1 | rev_user      | A         |     9214945 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | user_timestamp      |            2 | rev_timestamp | A         |   782454388 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | usertext_timestamp  |            1 | rev_user_text | A         |    54398321 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | usertext_timestamp  |            2 | rev_timestamp | A         |   815809186 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_user_timestamp |            1 | rev_page      | A         |    48048462 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_user_timestamp |            2 | rev_user      | A         |   296200325 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | page_user_timestamp |            3 | rev_timestamp | A         |   830492283 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | rev_page_id         |            1 | rev_page      | A         |    48048462 |     NULL | NULL   |      | BTREE      |         |               |
      | revision |          1 | rev_page_id         |            2 | rev_id        | A         |   830575333 |     NULL | NULL   |      | BTREE      |         |               |
      +----------+------------+---------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      

      Please let me know which other mysql config flags you'd need.

      Attachments

        Activity

          People

            igor Igor Babaev
            marostegui Manuel Arostegui
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.