Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9097

Engine independent statistics only allows utf8 values, making query plans fail

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • 10.0.16, 10.0.22, 10.0(EOL), 10.1(EOL)
    • 10.0.23
    • Optimizer
    • 10.1.9-3

    Description

      When using use_stat_tables = PREFERABLY, and then run ANALYZE TABLE on a table with a an indexed column containing UTF-8 characters (but not mysql's 'utf8'), some query plans can be altered, sometimes with horrible performance consequences.

      What it is worse, EXPLAIN does not show the new query plan from broken statistics, but it shows the original query plan, while we can check after execution than we have a very different execution plan not only on the query latency but on the Handler statistics.

      How to reproduce:

      Error formatting macro: code: java.lang.StackOverflowError

      CREATE DATABASE zhwiki;
      use zhwiki;
      CREATE TABLE `recentchanges` (
      `rc_id` int(8) NOT NULL AUTO_INCREMENT,
      `rc_timestamp` varbinary(14) NOT NULL DEFAULT '',
      `rc_cur_time` varbinary(14) NOT NULL DEFAULT '',
      `rc_user` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_user_text` varbinary(255) NOT NULL DEFAULT '',
      `rc_namespace` int(11) NOT NULL DEFAULT '0',
      `rc_title` varbinary(255) NOT NULL DEFAULT '',
      `rc_comment` varbinary(255) NOT NULL DEFAULT '',
      `rc_minor` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_bot` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_new` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_cur_id` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_this_oldid` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_last_oldid` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_type` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_source` varbinary(16) NOT NULL DEFAULT '',
      `rc_moved_to_ns` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_moved_to_title` varbinary(255) NOT NULL DEFAULT '',
      `rc_patrolled` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_ip` varbinary(40) NOT NULL DEFAULT '',
      `rc_old_len` int(10) DEFAULT NULL,
      `rc_new_len` int(10) DEFAULT NULL,
      `rc_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
      `rc_logid` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_log_type` varbinary(255) DEFAULT NULL,
      `rc_log_action` varbinary(255) DEFAULT NULL,
      `rc_params` blob NOT NULL,
      PRIMARY KEY (`rc_id`),
      KEY `rc_timestamp` (`rc_timestamp`),
      KEY `rc_namespace_title` (`rc_namespace`,`rc_title`),
      KEY `rc_cur_id` (`rc_cur_id`),
      KEY `new_name_timestamp` (`rc_new`,`rc_namespace`,`rc_timestamp`),
      KEY `rc_ip` (`rc_ip`),
      KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`),
      KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`)
      ) ENGINE=InnoDB AUTO_INCREMENT=52457042 DEFAULT CHARSET=binary

      – Note the charset binary, this allows any character encoding

      INSERT INTO `recentchanges` VALUES (51267205,'20151023001221','',0,'Reinheitsgebot',0,'��鯰科','添加语言链接:[[:specieswiki:Pangasiidae]]',1,1,0,566064,36471314,36471314,5,'wb',0,'',1,'',1254,1254,0,0,NULL,'','a:3:{s:20:\"wikibase-repo-change\";a:14:

      {s:2:\"id\";i:259555851;s:4:\"type\";s:20:\"wikibase-item~update\";s:4:\"time\";s:14:\"20151023001221\";s:9:\"object_id\";s:7:\"q901080\";s:7:\"user_id\";s:6:\"940976\";s:11:\"revision_id\";s:9:\"261218043\";s:11:\"entity_type\";s:4:\"item\";s:9:\"user_text\";s:14:\"Reinheitsgebot\";s:7:\"page_id\";i:851866;s:9:\"parent_id\";i:255341522;s:7:\"comment\";s:49:\"/* wbsetsitelink-add:1|specieswiki */ Pangasiidae\";s:6:\"rev_id\";i:261218043;s:3:\"bot\";i:1;s:7:\"site_id\";s:12:\"wikidatawiki\";}

      s:7:\"comment\";s:49:\"添加语言链接:[[:specieswiki:Pangasiidae]]\";s:12:\"comment-html\";s:123:\"添加语言链接:<a class=\"external\" href=\"//www.wikidata.org/wiki/Specieswiki:Pangasiidae\">specieswiki:Pangasiidae</a>\";}');

      SET GLOBAL use_stat_tables=PREFERABLY;

      – reload connection for change to take effect to your current session

      MariaDB [zhwiki]> ANALYZE TABLE recentchanges;
      --------------------------------------------------------------------------------------------------------------------------+

      Table Op Msg_type Msg_text

      --------------------------------------------------------------------------------------------------------------------------+

      zhwiki.recentchanges analyze status Engine-independent statistics collected
      zhwiki.recentchanges analyze Warning Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'min_value' at row 1
      zhwiki.recentchanges analyze Warning Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'max_value' at row 1
      zhwiki.recentchanges analyze status OK

      --------------------------------------------------------------------------------------------------------------------------+

      We can check that effectively, the tables have the wrong encoding:

      CREATE TABLE `column_stats` (
      `db_name` varchar(64) COLLATE utf8_bin NOT NULL,
      `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
      `column_name` varchar(64) COLLATE utf8_bin NOT NULL,
      `min_value` varchar(255) COLLATE utf8_bin DEFAULT NULL,
      `max_value` varchar(255) COLLATE utf8_bin DEFAULT NULL,
      `nulls_ratio` decimal(12,4) DEFAULT NULL,
      `avg_length` decimal(12,4) DEFAULT NULL,
      `avg_frequency` decimal(12,4) DEFAULT NULL,
      `hist_size` tinyint(3) unsigned DEFAULT NULL,
      `hist_type` enum('SINGLE_PREC_HB','DOUBLE_PREC_HB') COLLATE utf8_bin DEFAULT NULL,
      `histogram` varbinary(255) DEFAULT NULL,
      PRIMARY KEY (`db_name`,`table_name`,`column_name`)
      ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Columns'

      With 1.4 millons rows, a simple query like:

      FLUSH STATUS;
      SELECT /* ApiQueryRecentChanges::run SANITIZED */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,rc_user,rc_user_text,ts_tags,page_is_redirect FROM `recentchanges` LEFT JOIN `tag_summary` ON ((rc_id=ts_rc_id)) LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE rc_namespace = '0' AND rc_type = '1' AND (rc_patrolled = 0) AND (page_is_redirect = 0 OR page_is_redirect IS NULL) AND rc_new=1 ORDER BY rc_timestamp DESC,rc_id DESC LIMIT 51;
      SHOW STATUS like 'Hand%';

      goes from reading 3000 rows to read the full table (full index scan) using the timestamp index, 1.4 millon rows.

      As a workaround, disable use_stat_tables.

      3 things went wrong here:

      • most value-related columns from the stats tables should be utf8mb4, probably binary, as binary columns can be indexed too
      • Execute updates of use_stat_tables transactionally. Those are not warnings, they are errors and should rollback all updates to those tables
      • EXPLAIN should show the real query plan (is it ignoring use_stat_tables?). When stats are broken, rows: 51 is shown, when in reality 1.4 million rows are scanned.

      Attachments

        Issue Links

          Activity

            jcrespo Jaime Crespo created issue -
            jcrespo Jaime Crespo made changes -
            Field Original Value New Value
            Description When using use_stat_tables = PREFERABLY, and then run ANALYZE TABLE on a table with a an indexed column containing UTF-8 characters (but not mysql's 'utf8'), some query plans can be altered, sometimes with horrible performance consequences.

            What it is worse, EXPLAIN does not show the new query plan from broken statistics, but it shows the original query plan, while we can check after execution than we have a very different execution plan not only on the query latency but on the Handler statistics.

            How to reproduce:

            {code:sql}
            CREATE DATABASE zhwiki;
            use zhwiki;
            CREATE TABLE `recentchanges` (
              `rc_id` int(8) NOT NULL AUTO_INCREMENT,
              `rc_timestamp` varbinary(14) NOT NULL DEFAULT '',
              `rc_cur_time` varbinary(14) NOT NULL DEFAULT '',
              `rc_user` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_user_text` varbinary(255) NOT NULL DEFAULT '',
              `rc_namespace` int(11) NOT NULL DEFAULT '0',
              `rc_title` varbinary(255) NOT NULL DEFAULT '',
              `rc_comment` varbinary(255) NOT NULL DEFAULT '',
              `rc_minor` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_bot` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_new` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_cur_id` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_this_oldid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_last_oldid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_type` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_source` varbinary(16) NOT NULL DEFAULT '',
              `rc_moved_to_ns` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_moved_to_title` varbinary(255) NOT NULL DEFAULT '',
              `rc_patrolled` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_ip` varbinary(40) NOT NULL DEFAULT '',
              `rc_old_len` int(10) DEFAULT NULL,
              `rc_new_len` int(10) DEFAULT NULL,
              `rc_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
              `rc_logid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_log_type` varbinary(255) DEFAULT NULL,
              `rc_log_action` varbinary(255) DEFAULT NULL,
              `rc_params` blob NOT NULL,
              PRIMARY KEY (`rc_id`),
              KEY `rc_timestamp` (`rc_timestamp`),
              KEY `rc_namespace_title` (`rc_namespace`,`rc_title`),
              KEY `rc_cur_id` (`rc_cur_id`),
              KEY `new_name_timestamp` (`rc_new`,`rc_namespace`,`rc_timestamp`),
              KEY `rc_ip` (`rc_ip`),
              KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`),
              KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`)
            ) ENGINE=InnoDB AUTO_INCREMENT=52457042 DEFAULT CHARSET=binary

            -- Note the charset binary, this allows any character encoding

            INSERT INTO `recentchanges` VALUES (51267205,'20151023001221','',0,'Reinheitsgebot',0,'��鯰科','添加语言链接:[[:specieswiki:Pangasiidae]]',1,1,0,566064,36471314,36471314,5,'wb',0,'',1,'',1254,1254,0,0,NULL,'','a:3:{s:20:\"wikibase-repo-change\";a:14:{s:2:\"id\";i:259555851;s:4:\"type\";s:20:\"wikibase-item~update\";s:4:\"time\";s:14:\"20151023001221\";s:9:\"object_id\";s:7:\"q901080\";s:7:\"user_id\";s:6:\"940976\";s:11:\"revision_id\";s:9:\"261218043\";s:11:\"entity_type\";s:4:\"item\";s:9:\"user_text\";s:14:\"Reinheitsgebot\";s:7:\"page_id\";i:851866;s:9:\"parent_id\";i:255341522;s:7:\"comment\";s:49:\"/* wbsetsitelink-add:1|specieswiki */ Pangasiidae\";s:6:\"rev_id\";i:261218043;s:3:\"bot\";i:1;s:7:\"site_id\";s:12:\"wikidatawiki\";}s:7:\"comment\";s:49:\"添加语言链接:[[:specieswiki:Pangasiidae]]\";s:12:\"comment-html\";s:123:\"添加语言链接:<a class=\"external\" href=\"//www.wikidata.org/wiki/Specieswiki:Pangasiidae\">specieswiki:Pangasiidae</a>\";}');

            SET GLOBAL use_stat_tables=PREFERABLY;

            -- reload connection for change to take effect to your current session

            MariaDB [zhwiki]> ANALYZE TABLE recentchanges;
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            | Table | Op | Msg_type | Msg_text |
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            | zhwiki.recentchanges | analyze | status | Engine-independent statistics collected |
            | zhwiki.recentchanges | analyze | Warning | Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'min_value' at row 1 |
            | zhwiki.recentchanges | analyze | Warning | Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'max_value' at row 1 |
            | zhwiki.recentchanges | analyze | status | OK |
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            With 1.4 millons rows, a simple query like:

            FLUSH STATUS;
            SELECT /* ApiQueryRecentChanges::run SANITIZED */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,rc_user,rc_user_text,ts_tags,page_is_redirect FROM `recentchanges` LEFT JOIN `tag_summary` ON ((rc_id=ts_rc_id)) LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE rc_namespace = '0' AND rc_type = '1' AND (rc_patrolled = 0) AND (page_is_redirect = 0 OR page_is_redirect IS NULL) AND rc_new=1 ORDER BY rc_timestamp DESC,rc_id DESC LIMIT 51;
            SHOW STATUS like 'Hand%';

            goes from reading 3000 rows to read the full table (full index scan) using the timestamp index, 1.4 millon rows.
            {code}

            As a workaround, disable use_stat_tables.

            3 things went wrong here:

            * most value-related columns from the stats tables should be utf8mb4, probably binary, as binary columns can be indexed too
            * Execute updates of use_stat_tables transitionally. Those are not warnings, they are errors and should rollback all updates to those tables
            * EXPLAIN should show the real query plan (is it ignoring use_stat_tables?). When stats are broken, rows: 51 is shown, when in reality 1.4 million rows are scanned.
            When using use_stat_tables = PREFERABLY, and then run ANALYZE TABLE on a table with a an indexed column containing UTF-8 characters (but not mysql's 'utf8'), some query plans can be altered, sometimes with horrible performance consequences.

            What it is worse, EXPLAIN does not show the new query plan from broken statistics, but it shows the original query plan, while we can check after execution than we have a very different execution plan not only on the query latency but on the Handler statistics.

            How to reproduce:

            {code:sql}
            CREATE DATABASE zhwiki;
            use zhwiki;
            CREATE TABLE `recentchanges` (
              `rc_id` int(8) NOT NULL AUTO_INCREMENT,
              `rc_timestamp` varbinary(14) NOT NULL DEFAULT '',
              `rc_cur_time` varbinary(14) NOT NULL DEFAULT '',
              `rc_user` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_user_text` varbinary(255) NOT NULL DEFAULT '',
              `rc_namespace` int(11) NOT NULL DEFAULT '0',
              `rc_title` varbinary(255) NOT NULL DEFAULT '',
              `rc_comment` varbinary(255) NOT NULL DEFAULT '',
              `rc_minor` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_bot` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_new` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_cur_id` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_this_oldid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_last_oldid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_type` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_source` varbinary(16) NOT NULL DEFAULT '',
              `rc_moved_to_ns` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_moved_to_title` varbinary(255) NOT NULL DEFAULT '',
              `rc_patrolled` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_ip` varbinary(40) NOT NULL DEFAULT '',
              `rc_old_len` int(10) DEFAULT NULL,
              `rc_new_len` int(10) DEFAULT NULL,
              `rc_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
              `rc_logid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_log_type` varbinary(255) DEFAULT NULL,
              `rc_log_action` varbinary(255) DEFAULT NULL,
              `rc_params` blob NOT NULL,
              PRIMARY KEY (`rc_id`),
              KEY `rc_timestamp` (`rc_timestamp`),
              KEY `rc_namespace_title` (`rc_namespace`,`rc_title`),
              KEY `rc_cur_id` (`rc_cur_id`),
              KEY `new_name_timestamp` (`rc_new`,`rc_namespace`,`rc_timestamp`),
              KEY `rc_ip` (`rc_ip`),
              KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`),
              KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`)
            ) ENGINE=InnoDB AUTO_INCREMENT=52457042 DEFAULT CHARSET=binary

            -- Note the charset binary, this allows any character encoding

            INSERT INTO `recentchanges` VALUES (51267205,'20151023001221','',0,'Reinheitsgebot',0,'��鯰科','添加语言链接:[[:specieswiki:Pangasiidae]]',1,1,0,566064,36471314,36471314,5,'wb',0,'',1,'',1254,1254,0,0,NULL,'','a:3:{s:20:\"wikibase-repo-change\";a:14:{s:2:\"id\";i:259555851;s:4:\"type\";s:20:\"wikibase-item~update\";s:4:\"time\";s:14:\"20151023001221\";s:9:\"object_id\";s:7:\"q901080\";s:7:\"user_id\";s:6:\"940976\";s:11:\"revision_id\";s:9:\"261218043\";s:11:\"entity_type\";s:4:\"item\";s:9:\"user_text\";s:14:\"Reinheitsgebot\";s:7:\"page_id\";i:851866;s:9:\"parent_id\";i:255341522;s:7:\"comment\";s:49:\"/* wbsetsitelink-add:1|specieswiki */ Pangasiidae\";s:6:\"rev_id\";i:261218043;s:3:\"bot\";i:1;s:7:\"site_id\";s:12:\"wikidatawiki\";}s:7:\"comment\";s:49:\"添加语言链接:[[:specieswiki:Pangasiidae]]\";s:12:\"comment-html\";s:123:\"添加语言链接:<a class=\"external\" href=\"//www.wikidata.org/wiki/Specieswiki:Pangasiidae\">specieswiki:Pangasiidae</a>\";}');

            SET GLOBAL use_stat_tables=PREFERABLY;

            -- reload connection for change to take effect to your current session

            MariaDB [zhwiki]> ANALYZE TABLE recentchanges;
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            | Table | Op | Msg_type | Msg_text |
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            | zhwiki.recentchanges | analyze | status | Engine-independent statistics collected |
            | zhwiki.recentchanges | analyze | Warning | Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'min_value' at row 1 |
            | zhwiki.recentchanges | analyze | Warning | Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'max_value' at row 1 |
            | zhwiki.recentchanges | analyze | status | OK |
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            With 1.4 millons rows, a simple query like:

            FLUSH STATUS;
            SELECT /* ApiQueryRecentChanges::run SANITIZED */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,rc_user,rc_user_text,ts_tags,page_is_redirect FROM `recentchanges` LEFT JOIN `tag_summary` ON ((rc_id=ts_rc_id)) LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE rc_namespace = '0' AND rc_type = '1' AND (rc_patrolled = 0) AND (page_is_redirect = 0 OR page_is_redirect IS NULL) AND rc_new=1 ORDER BY rc_timestamp DESC,rc_id DESC LIMIT 51;
            SHOW STATUS like 'Hand%';

            goes from reading 3000 rows to read the full table (full index scan) using the timestamp index, 1.4 millon rows.
            {code}

            As a workaround, disable use_stat_tables.

            3 things went wrong here:

            * most value-related columns from the stats tables should be utf8mb4, probably binary, as binary columns can be indexed too
            * Execute updates of use_stat_tables transactionally. Those are not warnings, they are errors and should rollback all updates to those tables
            * EXPLAIN should show the real query plan (is it ignoring use_stat_tables?). When stats are broken, rows: 51 is shown, when in reality 1.4 million rows are scanned.
            jcrespo Jaime Crespo made changes -
            Description When using use_stat_tables = PREFERABLY, and then run ANALYZE TABLE on a table with a an indexed column containing UTF-8 characters (but not mysql's 'utf8'), some query plans can be altered, sometimes with horrible performance consequences.

            What it is worse, EXPLAIN does not show the new query plan from broken statistics, but it shows the original query plan, while we can check after execution than we have a very different execution plan not only on the query latency but on the Handler statistics.

            How to reproduce:

            {code:sql}
            CREATE DATABASE zhwiki;
            use zhwiki;
            CREATE TABLE `recentchanges` (
              `rc_id` int(8) NOT NULL AUTO_INCREMENT,
              `rc_timestamp` varbinary(14) NOT NULL DEFAULT '',
              `rc_cur_time` varbinary(14) NOT NULL DEFAULT '',
              `rc_user` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_user_text` varbinary(255) NOT NULL DEFAULT '',
              `rc_namespace` int(11) NOT NULL DEFAULT '0',
              `rc_title` varbinary(255) NOT NULL DEFAULT '',
              `rc_comment` varbinary(255) NOT NULL DEFAULT '',
              `rc_minor` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_bot` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_new` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_cur_id` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_this_oldid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_last_oldid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_type` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_source` varbinary(16) NOT NULL DEFAULT '',
              `rc_moved_to_ns` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_moved_to_title` varbinary(255) NOT NULL DEFAULT '',
              `rc_patrolled` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_ip` varbinary(40) NOT NULL DEFAULT '',
              `rc_old_len` int(10) DEFAULT NULL,
              `rc_new_len` int(10) DEFAULT NULL,
              `rc_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
              `rc_logid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_log_type` varbinary(255) DEFAULT NULL,
              `rc_log_action` varbinary(255) DEFAULT NULL,
              `rc_params` blob NOT NULL,
              PRIMARY KEY (`rc_id`),
              KEY `rc_timestamp` (`rc_timestamp`),
              KEY `rc_namespace_title` (`rc_namespace`,`rc_title`),
              KEY `rc_cur_id` (`rc_cur_id`),
              KEY `new_name_timestamp` (`rc_new`,`rc_namespace`,`rc_timestamp`),
              KEY `rc_ip` (`rc_ip`),
              KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`),
              KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`)
            ) ENGINE=InnoDB AUTO_INCREMENT=52457042 DEFAULT CHARSET=binary

            -- Note the charset binary, this allows any character encoding

            INSERT INTO `recentchanges` VALUES (51267205,'20151023001221','',0,'Reinheitsgebot',0,'��鯰科','添加语言链接:[[:specieswiki:Pangasiidae]]',1,1,0,566064,36471314,36471314,5,'wb',0,'',1,'',1254,1254,0,0,NULL,'','a:3:{s:20:\"wikibase-repo-change\";a:14:{s:2:\"id\";i:259555851;s:4:\"type\";s:20:\"wikibase-item~update\";s:4:\"time\";s:14:\"20151023001221\";s:9:\"object_id\";s:7:\"q901080\";s:7:\"user_id\";s:6:\"940976\";s:11:\"revision_id\";s:9:\"261218043\";s:11:\"entity_type\";s:4:\"item\";s:9:\"user_text\";s:14:\"Reinheitsgebot\";s:7:\"page_id\";i:851866;s:9:\"parent_id\";i:255341522;s:7:\"comment\";s:49:\"/* wbsetsitelink-add:1|specieswiki */ Pangasiidae\";s:6:\"rev_id\";i:261218043;s:3:\"bot\";i:1;s:7:\"site_id\";s:12:\"wikidatawiki\";}s:7:\"comment\";s:49:\"添加语言链接:[[:specieswiki:Pangasiidae]]\";s:12:\"comment-html\";s:123:\"添加语言链接:<a class=\"external\" href=\"//www.wikidata.org/wiki/Specieswiki:Pangasiidae\">specieswiki:Pangasiidae</a>\";}');

            SET GLOBAL use_stat_tables=PREFERABLY;

            -- reload connection for change to take effect to your current session

            MariaDB [zhwiki]> ANALYZE TABLE recentchanges;
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            | Table | Op | Msg_type | Msg_text |
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            | zhwiki.recentchanges | analyze | status | Engine-independent statistics collected |
            | zhwiki.recentchanges | analyze | Warning | Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'min_value' at row 1 |
            | zhwiki.recentchanges | analyze | Warning | Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'max_value' at row 1 |
            | zhwiki.recentchanges | analyze | status | OK |
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            With 1.4 millons rows, a simple query like:

            FLUSH STATUS;
            SELECT /* ApiQueryRecentChanges::run SANITIZED */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,rc_user,rc_user_text,ts_tags,page_is_redirect FROM `recentchanges` LEFT JOIN `tag_summary` ON ((rc_id=ts_rc_id)) LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE rc_namespace = '0' AND rc_type = '1' AND (rc_patrolled = 0) AND (page_is_redirect = 0 OR page_is_redirect IS NULL) AND rc_new=1 ORDER BY rc_timestamp DESC,rc_id DESC LIMIT 51;
            SHOW STATUS like 'Hand%';

            goes from reading 3000 rows to read the full table (full index scan) using the timestamp index, 1.4 millon rows.
            {code}

            As a workaround, disable use_stat_tables.

            3 things went wrong here:

            * most value-related columns from the stats tables should be utf8mb4, probably binary, as binary columns can be indexed too
            * Execute updates of use_stat_tables transactionally. Those are not warnings, they are errors and should rollback all updates to those tables
            * EXPLAIN should show the real query plan (is it ignoring use_stat_tables?). When stats are broken, rows: 51 is shown, when in reality 1.4 million rows are scanned.
            When using use_stat_tables = PREFERABLY, and then run ANALYZE TABLE on a table with a an indexed column containing UTF-8 characters (but not mysql's 'utf8'), some query plans can be altered, sometimes with horrible performance consequences.

            What it is worse, EXPLAIN does not show the new query plan from broken statistics, but it shows the original query plan, while we can check after execution than we have a very different execution plan not only on the query latency but on the Handler statistics.

            How to reproduce:

            {code:sql}
            CREATE DATABASE zhwiki;
            use zhwiki;
            CREATE TABLE `recentchanges` (
              `rc_id` int(8) NOT NULL AUTO_INCREMENT,
              `rc_timestamp` varbinary(14) NOT NULL DEFAULT '',
              `rc_cur_time` varbinary(14) NOT NULL DEFAULT '',
              `rc_user` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_user_text` varbinary(255) NOT NULL DEFAULT '',
              `rc_namespace` int(11) NOT NULL DEFAULT '0',
              `rc_title` varbinary(255) NOT NULL DEFAULT '',
              `rc_comment` varbinary(255) NOT NULL DEFAULT '',
              `rc_minor` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_bot` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_new` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_cur_id` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_this_oldid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_last_oldid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_type` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_source` varbinary(16) NOT NULL DEFAULT '',
              `rc_moved_to_ns` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_moved_to_title` varbinary(255) NOT NULL DEFAULT '',
              `rc_patrolled` tinyint(3) unsigned NOT NULL DEFAULT '0',
              `rc_ip` varbinary(40) NOT NULL DEFAULT '',
              `rc_old_len` int(10) DEFAULT NULL,
              `rc_new_len` int(10) DEFAULT NULL,
              `rc_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
              `rc_logid` int(10) unsigned NOT NULL DEFAULT '0',
              `rc_log_type` varbinary(255) DEFAULT NULL,
              `rc_log_action` varbinary(255) DEFAULT NULL,
              `rc_params` blob NOT NULL,
              PRIMARY KEY (`rc_id`),
              KEY `rc_timestamp` (`rc_timestamp`),
              KEY `rc_namespace_title` (`rc_namespace`,`rc_title`),
              KEY `rc_cur_id` (`rc_cur_id`),
              KEY `new_name_timestamp` (`rc_new`,`rc_namespace`,`rc_timestamp`),
              KEY `rc_ip` (`rc_ip`),
              KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`),
              KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`)
            ) ENGINE=InnoDB AUTO_INCREMENT=52457042 DEFAULT CHARSET=binary

            -- Note the charset binary, this allows any character encoding

            INSERT INTO `recentchanges` VALUES (51267205,'20151023001221','',0,'Reinheitsgebot',0,'��鯰科','添加语言链接:[[:specieswiki:Pangasiidae]]',1,1,0,566064,36471314,36471314,5,'wb',0,'',1,'',1254,1254,0,0,NULL,'','a:3:{s:20:\"wikibase-repo-change\";a:14:{s:2:\"id\";i:259555851;s:4:\"type\";s:20:\"wikibase-item~update\";s:4:\"time\";s:14:\"20151023001221\";s:9:\"object_id\";s:7:\"q901080\";s:7:\"user_id\";s:6:\"940976\";s:11:\"revision_id\";s:9:\"261218043\";s:11:\"entity_type\";s:4:\"item\";s:9:\"user_text\";s:14:\"Reinheitsgebot\";s:7:\"page_id\";i:851866;s:9:\"parent_id\";i:255341522;s:7:\"comment\";s:49:\"/* wbsetsitelink-add:1|specieswiki */ Pangasiidae\";s:6:\"rev_id\";i:261218043;s:3:\"bot\";i:1;s:7:\"site_id\";s:12:\"wikidatawiki\";}s:7:\"comment\";s:49:\"添加语言链接:[[:specieswiki:Pangasiidae]]\";s:12:\"comment-html\";s:123:\"添加语言链接:<a class=\"external\" href=\"//www.wikidata.org/wiki/Specieswiki:Pangasiidae\">specieswiki:Pangasiidae</a>\";}');

            SET GLOBAL use_stat_tables=PREFERABLY;

            -- reload connection for change to take effect to your current session

            MariaDB [zhwiki]> ANALYZE TABLE recentchanges;
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            | Table | Op | Msg_type | Msg_text |
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+
            | zhwiki.recentchanges | analyze | status | Engine-independent statistics collected |
            | zhwiki.recentchanges | analyze | Warning | Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'min_value' at row 1 |
            | zhwiki.recentchanges | analyze | Warning | Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'max_value' at row 1 |
            | zhwiki.recentchanges | analyze | status | OK |
            +----------------------+---------+----------+---------------------------------------------------------------------------------------+

            We can check that effectively, the tables have the wrong encoding:

            CREATE TABLE `column_stats` (
              `db_name` varchar(64) COLLATE utf8_bin NOT NULL,
              `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
              `column_name` varchar(64) COLLATE utf8_bin NOT NULL,
              `min_value` varchar(255) COLLATE utf8_bin DEFAULT NULL,
              `max_value` varchar(255) COLLATE utf8_bin DEFAULT NULL,
              `nulls_ratio` decimal(12,4) DEFAULT NULL,
              `avg_length` decimal(12,4) DEFAULT NULL,
              `avg_frequency` decimal(12,4) DEFAULT NULL,
              `hist_size` tinyint(3) unsigned DEFAULT NULL,
              `hist_type` enum('SINGLE_PREC_HB','DOUBLE_PREC_HB') COLLATE utf8_bin DEFAULT NULL,
              `histogram` varbinary(255) DEFAULT NULL,
              PRIMARY KEY (`db_name`,`table_name`,`column_name`)
            ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Columns'

            With 1.4 millons rows, a simple query like:

            FLUSH STATUS;
            SELECT /* ApiQueryRecentChanges::run SANITIZED */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,rc_user,rc_user_text,ts_tags,page_is_redirect FROM `recentchanges` LEFT JOIN `tag_summary` ON ((rc_id=ts_rc_id)) LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE rc_namespace = '0' AND rc_type = '1' AND (rc_patrolled = 0) AND (page_is_redirect = 0 OR page_is_redirect IS NULL) AND rc_new=1 ORDER BY rc_timestamp DESC,rc_id DESC LIMIT 51;
            SHOW STATUS like 'Hand%';

            goes from reading 3000 rows to read the full table (full index scan) using the timestamp index, 1.4 millon rows.
            {code}

            As a workaround, disable use_stat_tables.

            3 things went wrong here:

            * most value-related columns from the stats tables should be utf8mb4, probably binary, as binary columns can be indexed too
            * Execute updates of use_stat_tables transactionally. Those are not warnings, they are errors and should rollback all updates to those tables
            * EXPLAIN should show the real query plan (is it ignoring use_stat_tables?). When stats are broken, rows: 51 is shown, when in reality 1.4 million rows are scanned.
            danblack Daniel Black made changes -
            elenst Elena Stepanova made changes -
            Fix Version/s 10.0 [ 16000 ]
            Fix Version/s 10.1 [ 16100 ]
            Affects Version/s 10.0 [ 16000 ]
            Affects Version/s 10.1 [ 16100 ]
            Assignee Sergei Petrunia [ psergey ]
            Labels eits
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.1.9-3 [ 20 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked higher
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked higher
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.0.23 [ 20401 ]
            Fix Version/s 10.0 [ 16000 ]
            Fix Version/s 10.1 [ 16100 ]
            Resolution Duplicate [ 3 ]
            Status Open [ 1 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 72485 ] MariaDB v4 [ 149802 ]

            People

              psergei Sergei Petrunia
              jcrespo Jaime Crespo
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.