Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8872

Performance regressions with utf8mb4 vs utf8 in WordPress

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • 10.0.21-galera
    • N/A
    • Character Sets
    • None

    Description

      When switching tables from utf8 to utf8mb4, we're seeing fairly severe performance regressions - queries that used to run in microseconds, now taking 20 seconds to run.

      We've run some tests with query and index variations, or there's a summary here.

      For a direct comparison, see:
      utf8: https://travis-ci.org/tollmanz/utf8mb4-query-time/jobs/82864656
      utf8mb4: https://travis-ci.org/tollmanz/utf8mb4-query-time/jobs/82864660

      Here's the WordPress tracking ticket. Comment #2 notes that reducing the meta_key column length to 191 (to match the length of the prefix index) fixes the performance issue, but the utf8 character set doesn't have the same problem.

      The same behaviour occurs in MySQL, as well.

      Attachments

        Issue Links

          Activity

            greenman Ian Gilfillan added a comment -

            Debian defaults to utf8mb4 so perhaps this should get an increase in priority.

            greenman Ian Gilfillan added a comment - Debian defaults to utf8mb4 so perhaps this should get an increase in priority.

            Would be nice, since Nextcloud 12.0 with News plugin ask to switch DB from utf8 to utf8mb4 .

            Is any progress with this issue done?

            okias David Heidelberg added a comment - Would be nice, since Nextcloud 12.0 with News plugin ask to switch DB from utf8 to utf8mb4 . Is any progress with this issue done?

            Note that the reporter of this issue Gary Pendergast is one of the core contributors of WordPress.

            otto Otto Kekäläinen added a comment - Note that the reporter of this issue Gary Pendergast is one of the core contributors of WordPress.

            This issue is not relevant to utf8 vs utf8mb4 performance. The problem happened because after switch from utf8 to utf8mb4 the non-prefix key on meta_key become a prefix key:

            The table before the change:

            CREATE TABLE `wp_postmeta` (
              `meta_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
              `post_id` bigint(20) unsigned NOT NULL DEFAULT '0',
              `meta_key` varchar(255) DEFAULT NULL,
              `meta_value` longtext,
              PRIMARY KEY (`meta_id`),
              KEY `post_id` (`post_id`),
              KEY `meta_key` (`meta_key`)
            ) ENGINE=InnoDB AUTO_INCREMENT=212732319 DEFAULT CHARSET=utf8;
            

            The table after the change:

            CREATE TABLE `wp_postmeta` (
              `meta_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
              `post_id` bigint(20) unsigned NOT NULL DEFAULT '0',
              `meta_key` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
              `meta_value` longtext COLLATE utf8mb4_unicode_ci,
              PRIMARY KEY (`meta_id`),
              KEY `post_id` (`post_id`),
              KEY `meta_key` (`meta_key`(191))
            ) ENGINE=InnoDB AUTO_INCREMENT=212732257 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
            

            Notice, the index:

              KEY `meta_key` (`meta_key`)
            

            became:

              KEY `meta_key` (`meta_key`(191))
            

            The queries that were used in the performance tests:

            SELECT meta_key
            FROM wp_postmeta
            GROUP BY meta_key
            HAVING meta_key NOT LIKE '\_%'
            ORDER BY meta_key
            LIMIT 30;
            

            SELECT DISTINCT meta_key
            FROM wp_postmeta
            WHERE meta_key NOT BETWEEN '_' AND '_z'
            HAVING meta_key NOT LIKE '\_%'
            ORDER BY meta_key
            LIMIT 30;
            

            Prefix indexes cannot be used for GROUP BY and for ORDER BY because they are not covering. This is by design.

            This simplified test demonstrates the issue:

            CREATE OR REPLACE TABLE t1 (a VARCHAR(10), KEY(a));
            INSERT INTO t1 VALUES ('a'),('b'),('c'),('d');
            EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10;
            

            +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
            | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
            +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
            |    1 | SIMPLE      | t1    | range | NULL          | a    | 13      | NULL | 5    | Using index for group-by |
            +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
            

            CREATE OR REPLACE TABLE t1 (a VARCHAR(10), KEY(a(3)));
            INSERT INTO t1 VALUES ('a'),('b'),('c'),('d');
            EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10;
            

            +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
            +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
            |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 4    | Using temporary; Using filesort |
            +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
            

            In case of the table with a prefix key, the query cannot use indexes.

            This is expected behaviour.

            The WordPress team correctly detected the source of the problem, according to this post:

            https://core.trac.wordpress.org/ticket/33885#comment:2

            bar Alexander Barkov added a comment - This issue is not relevant to utf8 vs utf8mb4 performance. The problem happened because after switch from utf8 to utf8mb4 the non-prefix key on meta_key become a prefix key: The table before the change: CREATE TABLE `wp_postmeta` ( `meta_id` bigint (20) unsigned NOT NULL AUTO_INCREMENT, `post_id` bigint (20) unsigned NOT NULL DEFAULT '0' , `meta_key` varchar (255) DEFAULT NULL , `meta_value` longtext, PRIMARY KEY (`meta_id`), KEY `post_id` (`post_id`), KEY `meta_key` (`meta_key`) ) ENGINE=InnoDB AUTO_INCREMENT=212732319 DEFAULT CHARSET=utf8; The table after the change: CREATE TABLE `wp_postmeta` ( `meta_id` bigint (20) unsigned NOT NULL AUTO_INCREMENT, `post_id` bigint (20) unsigned NOT NULL DEFAULT '0' , `meta_key` varchar (255) COLLATE utf8mb4_unicode_ci DEFAULT NULL , `meta_value` longtext COLLATE utf8mb4_unicode_ci, PRIMARY KEY (`meta_id`), KEY `post_id` (`post_id`), KEY `meta_key` (`meta_key`(191)) ) ENGINE=InnoDB AUTO_INCREMENT=212732257 DEFAULT CHARSET=utf8mb4 COLLATE =utf8mb4_unicode_ci; Notice, the index: KEY `meta_key` (`meta_key`) became: KEY `meta_key` (`meta_key`(191)) The queries that were used in the performance tests: SELECT meta_key FROM wp_postmeta GROUP BY meta_key HAVING meta_key NOT LIKE '\_%' ORDER BY meta_key LIMIT 30; SELECT DISTINCT meta_key FROM wp_postmeta WHERE meta_key NOT BETWEEN '_' AND '_z' HAVING meta_key NOT LIKE '\_%' ORDER BY meta_key LIMIT 30; Prefix indexes cannot be used for GROUP BY and for ORDER BY because they are not covering. This is by design. This simplified test demonstrates the issue: CREATE OR REPLACE TABLE t1 (a VARCHAR (10), KEY (a)); INSERT INTO t1 VALUES ( 'a' ),( 'b' ),( 'c' ),( 'd' ); EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10; +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ | 1 | SIMPLE | t1 | range | NULL | a | 13 | NULL | 5 | Using index for group-by | +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ CREATE OR REPLACE TABLE t1 (a VARCHAR (10), KEY (a(3))); INSERT INTO t1 VALUES ( 'a' ),( 'b' ),( 'c' ),( 'd' ); EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10; +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort | +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+ In case of the table with a prefix key, the query cannot use indexes. This is expected behaviour. The WordPress team correctly detected the source of the problem, according to this post: https://core.trac.wordpress.org/ticket/33885#comment:2

            otto, thanks for your comment. This issue should have been closed as not a bug earlier. Closing now.

            bar Alexander Barkov added a comment - otto , thanks for your comment. This issue should have been closed as not a bug earlier. Closing now.

            This issue is superseded by MDEV-32113

            monty Michael Widenius added a comment - This issue is superseded by MDEV-32113

            People

              psergei Sergei Petrunia
              pento Gary Pendergast
              Votes:
              8 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.