Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9097

Engine independent statistics only allows utf8 values, making query plans fail

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • 10.0.16, 10.0.22, 10.0(EOL), 10.1(EOL)
    • 10.0.23
    • Optimizer
    • 10.1.9-3

    Description

      When using use_stat_tables = PREFERABLY, and then run ANALYZE TABLE on a table with a an indexed column containing UTF-8 characters (but not mysql's 'utf8'), some query plans can be altered, sometimes with horrible performance consequences.

      What it is worse, EXPLAIN does not show the new query plan from broken statistics, but it shows the original query plan, while we can check after execution than we have a very different execution plan not only on the query latency but on the Handler statistics.

      How to reproduce:

      Error formatting macro: code: java.lang.StackOverflowError

      CREATE DATABASE zhwiki;
      use zhwiki;
      CREATE TABLE `recentchanges` (
      `rc_id` int(8) NOT NULL AUTO_INCREMENT,
      `rc_timestamp` varbinary(14) NOT NULL DEFAULT '',
      `rc_cur_time` varbinary(14) NOT NULL DEFAULT '',
      `rc_user` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_user_text` varbinary(255) NOT NULL DEFAULT '',
      `rc_namespace` int(11) NOT NULL DEFAULT '0',
      `rc_title` varbinary(255) NOT NULL DEFAULT '',
      `rc_comment` varbinary(255) NOT NULL DEFAULT '',
      `rc_minor` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_bot` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_new` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_cur_id` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_this_oldid` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_last_oldid` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_type` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_source` varbinary(16) NOT NULL DEFAULT '',
      `rc_moved_to_ns` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_moved_to_title` varbinary(255) NOT NULL DEFAULT '',
      `rc_patrolled` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `rc_ip` varbinary(40) NOT NULL DEFAULT '',
      `rc_old_len` int(10) DEFAULT NULL,
      `rc_new_len` int(10) DEFAULT NULL,
      `rc_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
      `rc_logid` int(10) unsigned NOT NULL DEFAULT '0',
      `rc_log_type` varbinary(255) DEFAULT NULL,
      `rc_log_action` varbinary(255) DEFAULT NULL,
      `rc_params` blob NOT NULL,
      PRIMARY KEY (`rc_id`),
      KEY `rc_timestamp` (`rc_timestamp`),
      KEY `rc_namespace_title` (`rc_namespace`,`rc_title`),
      KEY `rc_cur_id` (`rc_cur_id`),
      KEY `new_name_timestamp` (`rc_new`,`rc_namespace`,`rc_timestamp`),
      KEY `rc_ip` (`rc_ip`),
      KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`),
      KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`)
      ) ENGINE=InnoDB AUTO_INCREMENT=52457042 DEFAULT CHARSET=binary

      – Note the charset binary, this allows any character encoding

      INSERT INTO `recentchanges` VALUES (51267205,'20151023001221','',0,'Reinheitsgebot',0,'��鯰科','添加语言链接:[[:specieswiki:Pangasiidae]]',1,1,0,566064,36471314,36471314,5,'wb',0,'',1,'',1254,1254,0,0,NULL,'','a:3:{s:20:\"wikibase-repo-change\";a:14:

      {s:2:\"id\";i:259555851;s:4:\"type\";s:20:\"wikibase-item~update\";s:4:\"time\";s:14:\"20151023001221\";s:9:\"object_id\";s:7:\"q901080\";s:7:\"user_id\";s:6:\"940976\";s:11:\"revision_id\";s:9:\"261218043\";s:11:\"entity_type\";s:4:\"item\";s:9:\"user_text\";s:14:\"Reinheitsgebot\";s:7:\"page_id\";i:851866;s:9:\"parent_id\";i:255341522;s:7:\"comment\";s:49:\"/* wbsetsitelink-add:1|specieswiki */ Pangasiidae\";s:6:\"rev_id\";i:261218043;s:3:\"bot\";i:1;s:7:\"site_id\";s:12:\"wikidatawiki\";}

      s:7:\"comment\";s:49:\"添加语言链接:[[:specieswiki:Pangasiidae]]\";s:12:\"comment-html\";s:123:\"添加语言链接:<a class=\"external\" href=\"//www.wikidata.org/wiki/Specieswiki:Pangasiidae\">specieswiki:Pangasiidae</a>\";}');

      SET GLOBAL use_stat_tables=PREFERABLY;

      – reload connection for change to take effect to your current session

      MariaDB [zhwiki]> ANALYZE TABLE recentchanges;
      --------------------------------------------------------------------------------------------------------------------------+

      Table Op Msg_type Msg_text

      --------------------------------------------------------------------------------------------------------------------------+

      zhwiki.recentchanges analyze status Engine-independent statistics collected
      zhwiki.recentchanges analyze Warning Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'min_value' at row 1
      zhwiki.recentchanges analyze Warning Incorrect string value: '\xF0\xA9\xB7\xB6\xE9\xAF...' for column 'max_value' at row 1
      zhwiki.recentchanges analyze status OK

      --------------------------------------------------------------------------------------------------------------------------+

      We can check that effectively, the tables have the wrong encoding:

      CREATE TABLE `column_stats` (
      `db_name` varchar(64) COLLATE utf8_bin NOT NULL,
      `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
      `column_name` varchar(64) COLLATE utf8_bin NOT NULL,
      `min_value` varchar(255) COLLATE utf8_bin DEFAULT NULL,
      `max_value` varchar(255) COLLATE utf8_bin DEFAULT NULL,
      `nulls_ratio` decimal(12,4) DEFAULT NULL,
      `avg_length` decimal(12,4) DEFAULT NULL,
      `avg_frequency` decimal(12,4) DEFAULT NULL,
      `hist_size` tinyint(3) unsigned DEFAULT NULL,
      `hist_type` enum('SINGLE_PREC_HB','DOUBLE_PREC_HB') COLLATE utf8_bin DEFAULT NULL,
      `histogram` varbinary(255) DEFAULT NULL,
      PRIMARY KEY (`db_name`,`table_name`,`column_name`)
      ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Statistics on Columns'

      With 1.4 millons rows, a simple query like:

      FLUSH STATUS;
      SELECT /* ApiQueryRecentChanges::run SANITIZED */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,rc_user,rc_user_text,ts_tags,page_is_redirect FROM `recentchanges` LEFT JOIN `tag_summary` ON ((rc_id=ts_rc_id)) LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE rc_namespace = '0' AND rc_type = '1' AND (rc_patrolled = 0) AND (page_is_redirect = 0 OR page_is_redirect IS NULL) AND rc_new=1 ORDER BY rc_timestamp DESC,rc_id DESC LIMIT 51;
      SHOW STATUS like 'Hand%';

      goes from reading 3000 rows to read the full table (full index scan) using the timestamp index, 1.4 millon rows.

      As a workaround, disable use_stat_tables.

      3 things went wrong here:

      • most value-related columns from the stats tables should be utf8mb4, probably binary, as binary columns can be indexed too
      • Execute updates of use_stat_tables transactionally. Those are not warnings, they are errors and should rollback all updates to those tables
      • EXPLAIN should show the real query plan (is it ignoring use_stat_tables?). When stats are broken, rows: 51 is shown, when in reality 1.4 million rows are scanned.

      Attachments

        Issue Links

          Activity

            jcrespo

            I am hot-patching the mysql table, converting it to binary (I know this may have problems on mysql_upgrade,

            I think there won't be any problems with mysql_upgrade (the table will just stay the same).

            I take the table columns are using VARBINARY data type?

            The patch consists of 1. Modifications to mysql.column_stats table and 2. Changes in the source code. I have debugged to see what happens when one applies #1 but not #2. utf8-based charset and VARBINARY columns in user tables are ok.
            If you use a charset that requires non-trivial conversion (I tried ucs2) , then you need to BOTH apply the patch and modify the mysql.column_stats table .

            I can confirm that the EXPLAIN giving different query plans than the actual query is unrelated to the engine_independent statistics and a different issue.

            Ok. The set of such queries is getting smaller over time, but some of them are still there.

            psergei Sergei Petrunia added a comment - jcrespo I am hot-patching the mysql table, converting it to binary (I know this may have problems on mysql_upgrade, I think there won't be any problems with mysql_upgrade (the table will just stay the same). I take the table columns are using VARBINARY data type? The patch consists of 1. Modifications to mysql.column_stats table and 2. Changes in the source code. I have debugged to see what happens when one applies #1 but not #2. utf8-based charset and VARBINARY columns in user tables are ok. If you use a charset that requires non-trivial conversion (I tried ucs2) , then you need to BOTH apply the patch and modify the mysql.column_stats table . I can confirm that the EXPLAIN giving different query plans than the actual query is unrelated to the engine_independent statistics and a different issue. Ok. The set of such queries is getting smaller over time, but some of them are still there.
            jcrespo Jaime Crespo added a comment -

            I have applied the schema change, it seems to work well. We use only binary fields for text due to legacy reasons. All encoding is handled at application side.

            I will either upgrade to a version patched or patch it myself. I am now more concerned about the full index scan on the query plan, but I need more testing. I will try to solve it on application code rather than a change in the database server.

            jcrespo Jaime Crespo added a comment - I have applied the schema change, it seems to work well. We use only binary fields for text due to legacy reasons. All encoding is handled at application side. I will either upgrade to a version patched or patch it myself. I am now more concerned about the full index scan on the query plan, but I need more testing. I will try to solve it on application code rather than a change in the database server.
            danblack Daniel Black added a comment -

            I can't see what utf8 charset can't be converted to VARBINARY however I'm happy with the change for the gains of all the non-utf8 varchar columns I can't use.

            I am now more concerned about the full index scan on the query plan, but I need more testing. I will try to solve it on application code rather than a change in the database server.

            You can insert into the stats table directly without using analyze tables which means you shouldn't need to patch it manually.

            The charset aspects of https://mariadb.atlassian.net/secure/attachment/36708/mdev-7383-statistics-to-handle-as-my_charset_bin.patch relate to the storage by ANALYZE TABLE rather than any aspect to do with retrieval.

            We could re-run ANALYZE TABLE for anything, but this could be a long process.

            Eek, too long/invasive.

            My opinion is that a mention in the changelog would sufficient.

            Agree.

            .bq (I know this may have problems on mysql_upgrade...

            jcrespo, this shouldn't be the case. The alter table is idempotent.

            jcrespo agree on skipping column stats for blob/text columns in the last bit of discussion in MDEV-7383? If so happy to hear your reasons on MDEV-9118.

            So solved as per MDEV-7383 - can be closed now?

            danblack Daniel Black added a comment - I can't see what utf8 charset can't be converted to VARBINARY however I'm happy with the change for the gains of all the non-utf8 varchar columns I can't use. I am now more concerned about the full index scan on the query plan, but I need more testing. I will try to solve it on application code rather than a change in the database server. You can insert into the stats table directly without using analyze tables which means you shouldn't need to patch it manually. The charset aspects of https://mariadb.atlassian.net/secure/attachment/36708/mdev-7383-statistics-to-handle-as-my_charset_bin.patch relate to the storage by ANALYZE TABLE rather than any aspect to do with retrieval. We could re-run ANALYZE TABLE for anything, but this could be a long process. Eek, too long/invasive. My opinion is that a mention in the changelog would sufficient. Agree. .bq (I know this may have problems on mysql_upgrade... jcrespo , this shouldn't be the case. The alter table is idempotent. jcrespo agree on skipping column stats for blob/text columns in the last bit of discussion in MDEV-7383 ? If so happy to hear your reasons on MDEV-9118 . So solved as per MDEV-7383 - can be closed now?
            jcrespo Jaime Crespo added a comment -

            Yes, I will report any issue related to EXPLAIN separately, if any.

            jcrespo Jaime Crespo added a comment - Yes, I will report any issue related to EXPLAIN separately, if any.

            Yes. Closed by fix for MDEV-7383.

            psergei Sergei Petrunia added a comment - Yes. Closed by fix for MDEV-7383 .

            People

              psergei Sergei Petrunia
              jcrespo Jaime Crespo
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.