Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9097

Engine independent statistics only allows utf8 values, making query plans fail

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.0.16, 10.0.22, 10.0, 10.1
    • Fix Version/s: 10.0.23
    • Component/s: Optimizer
    • Labels:
    • Sprint:
      10.1.9-3

      Description

      When using use_stat_tables = PREFERABLY, and then run ANALYZE TABLE on a table with a an indexed column containing UTF-8 characters (but not mysql's 'utf8'), some query plans can be altered, sometimes with horrible performance consequences.

      What it is worse, EXPLAIN does not show the new query plan from broken statistics, but it shows the original query plan, while we can check after execution than we have a very different execution plan not only on the query latency but on the Handler statistics.

      How to reproduce:

      Error formatting macro: code: java.lang.StackOverflowError

      CREATE DATABASE zhwiki;
      use zhwiki;
      CREATE TABLE `recentchanges` (
      `rc_id` int(8) NOT NULL AUTO_INCREMENT,
      `rc_timestamp` varbinary(14) NOT NULL DEFAULT '',
      `rc_cur_time` varbinary(14) NOT NULL DEFAULT '',
      `rc_user` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_user_text` varbinary(255) NOT NULL DEFAULT '',
      `rc_namespace` int(11) NOT NULL DEFAULT '0',
      `rc_title` varbinary(255) NOT NULL DEFAULT '',
      `rc_comment` varbinary(255) NOT NULL DEFAULT '',
      `rc_minor` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_bot` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_new` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_cur_id` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_this_oldid` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_last_oldid` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_type` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_source` varbinary(16) NOT NULL DEFAULT '',
      `rc_moved_to_ns` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_moved_to_title` varbinary(255) NOT NULL DEFAULT '',
      `rc_patrolled` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_ip` varbinary(40) NOT NULL DEFAULT '',
      `rc_old_len` int(10) DEFAULT NULL,
      `rc_new_len` int(10) DEFAULT NULL,
      `rc_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
      `rc_logid` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_log_type` varbinary(255) DEFAULT NULL,
      `rc_log_action` varbinary(255) DEFAULT NULL,
      `rc_params` blob NOT NULL,
      PRIMARY KEY (`rc_id`),
      KEY `rc_timestamp` (`rc_timestamp`),
      KEY `rc_namespace_title` (`rc_namespace`,`rc_title`),
      KEY `rc_cur_id` (`rc_cur_id`),
      KEY `new_name_timestamp` (`rc_new`,`rc_namespace`,`rc_timestamp`),
      KEY `rc_ip` (`rc_ip`),
      KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`),
      KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`)
      ) ENGINE=InnoDB AUTO_INCREMENT=52457042 DEFAULT CHARSET=binary

      – Note the charset binary, this allows any character encoding

      INSERT INTO `recentchanges` VALUES (51267205,'20151023001221','',0,'Reinheitsgebot',0,'��鯰科','添加语言链接:[[:specieswiki:Pangasiidae]]',1,1,0,566064,36471314,36471314,5,'wb',0,'',1,'',1254,1254,0,0,NULL,'','a:3:{s:20:\"wikibase-repo-change\";a:14:

      {s:2:\"id\";i:259555851;s:4:\"type\";s:20:\"wikibase-item~update\";s:4:\"time\";s:14:\"20151023001221\";s:9:\"object_id\";s:7:\"q901080\";s:7:\"user_id\";s:6:\"940976\";s:11:\"revision_id\";s:9:\"261218043\";s:11:\"entity_type\";s:4:\"item\";s:9:\"user_text\";s:14:\"Reinheitsgebot\";s:7:\"page_id\";i:851866;s:9:\"parent_id\";i:255341522;s:7:\"comment\";s:49:\"/* wbsetsitelink-add:1|specieswiki */ Pangasiidae\";s:6:\"rev_id\";i:261218043;s:3:\"bot\";i:1;s:7:\"site_id\";s:12:\"wikidatawiki\";}

      s:7:\"comment\";s:49:\"添加语言链接:[[:specieswiki:Pangasiidae]]\";s:12:\"comment-html\";s:123:\"添加语言链接:<a class=\"external\" href=\"//www.wikidata.org/wiki/Specieswiki:Pangasiidae\">specieswiki:Pangasiidae</a>\";}');

      SET GLOBAL use_stat_tables=PREFERABLY;

      – reload connection for change to take effect to your current session

      MariaDB [zhwiki]> ANALYZE TABLE recentchanges;
      --------------------------------------------------------------------------------------------------------------------------+

      Table Op Msg_type Msg_text

      --------------------------------------------------------------------------------------------------------------------------+

      zhwiki.recentchanges analyze status Engine-independent statistics collected
      zhwiki.recentchanges analyze Warning Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'min_value' at row 1
      zhwiki.recentchanges analyze Warning Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'max_value' at row 1
      zhwiki.recentchanges analyze status OK

      --------------------------------------------------------------------------------------------------------------------------+

      We can check that effectively, the tables have the wrong encoding:

      CREATE TABLE `column_stats` (
      `db_name` varchar(64) COLLATE utf8_bin NOT NULL,
      `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
      `column_name` varchar(64) COLLATE utf8_bin NOT NULL,
      `min_value` varchar(255) COLLATE utf8_bin DEFAULT NULL,
      `max_value` varchar(255) COLLATE utf8_bin DEFAULT NULL,
      `nulls_ratio` decimal(12,4) DEFAULT NULL,
      `avg_length` decimal(12,4) DEFAULT NULL,
      `avg_frequency` decimal(12,4) DEFAULT NULL,
      `hist_size` tinyint(3) unsigned DEFAULT NULL,
      `hist_type` enum('SINGLE_PREC_HB','DOUBLE_PREC_HB') COLLATE utf8_bin DEFAULT NULL,
      `histogram` varbinary(255) DEFAULT NULL,
      PRIMARY KEY (`db_name`,`table_name`,`column_name`)
      ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Columns'

      With 1.4 millons rows, a simple query like:

      FLUSH STATUS;
      SELECT /* ApiQueryRecentChanges::run SANITIZED */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,rc_user,rc_user_text,ts_tags,page_is_redirect FROM `recentchanges` LEFT JOIN `tag_summary` ON ((rc_id=ts_rc_id)) LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE rc_namespace = '0' AND rc_type = '1' AND (rc_patrolled = 0) AND (page_is_redirect = 0 OR page_is_redirect IS NULL) AND rc_new=1 ORDER BY rc_timestamp DESC,rc_id DESC LIMIT 51;
      SHOW STATUS like 'Hand%';

      goes from reading 3000 rows to read the full table (full index scan) using the timestamp index, 1.4 millon rows.

      As a workaround, disable use_stat_tables.

      3 things went wrong here:

      • most value-related columns from the stats tables should be utf8mb4, probably binary, as binary columns can be indexed too
      • Execute updates of use_stat_tables transactionally. Those are not warnings, they are errors and should rollback all updates to those tables
      • EXPLAIN should show the real query plan (is it ignoring use_stat_tables?). When stats are broken, rows: 51 is shown, when in reality 1.4 million rows are scanned.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              psergei Sergei Petrunia
              Reporter:
              jcrespo Jaime Crespo
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Git Integration