[MDEV-8872] Performance regressions with utf8mb4 vs utf8 in WordPress Created: 2015-09-30  Updated: 2023-12-19  Resolved: 2023-09-07

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.0.21-galera
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Gary Pendergast Assignee: Sergei Petrunia
Resolution: Duplicate Votes: 8
Labels: None

Issue Links:
PartOf
is part of MDEV-32113 utf8mb3_key_col=utf8mb4_value cannot ... Closed
Relates
relates to MDEV-16413 test performance of distinct range qu... Closed
relates to MDEV-19123 Change default charset from latin1 to... Open
relates to MDEV-32113 utf8mb3_key_col=utf8mb4_value cannot ... Closed

 Description   

When switching tables from utf8 to utf8mb4, we're seeing fairly severe performance regressions - queries that used to run in microseconds, now taking 20 seconds to run.

We've run some tests with query and index variations, or there's a summary here.

For a direct comparison, see:
utf8: https://travis-ci.org/tollmanz/utf8mb4-query-time/jobs/82864656
utf8mb4: https://travis-ci.org/tollmanz/utf8mb4-query-time/jobs/82864660

Here's the WordPress tracking ticket. Comment #2 notes that reducing the meta_key column length to 191 (to match the length of the prefix index) fixes the performance issue, but the utf8 character set doesn't have the same problem.

The same behaviour occurs in MySQL, as well.



 Comments   
Comment by Ian Gilfillan [ 2016-12-19 ]

Debian defaults to utf8mb4 so perhaps this should get an increase in priority.

Comment by David Heidelberg [ 2017-06-30 ]

Would be nice, since Nextcloud 12.0 with News plugin ask to switch DB from utf8 to utf8mb4 .

Is any progress with this issue done?

Comment by Otto Kekäläinen [ 2020-06-25 ]

Note that the reporter of this issue Gary Pendergast is one of the core contributors of WordPress.

Comment by Alexander Barkov [ 2020-06-26 ]

This issue is not relevant to utf8 vs utf8mb4 performance. The problem happened because after switch from utf8 to utf8mb4 the non-prefix key on meta_key become a prefix key:

The table before the change:

CREATE TABLE `wp_postmeta` (
  `meta_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `post_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `meta_key` varchar(255) DEFAULT NULL,
  `meta_value` longtext,
  PRIMARY KEY (`meta_id`),
  KEY `post_id` (`post_id`),
  KEY `meta_key` (`meta_key`)
) ENGINE=InnoDB AUTO_INCREMENT=212732319 DEFAULT CHARSET=utf8;

The table after the change:

CREATE TABLE `wp_postmeta` (
  `meta_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `post_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `meta_key` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `meta_value` longtext COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`meta_id`),
  KEY `post_id` (`post_id`),
  KEY `meta_key` (`meta_key`(191))
) ENGINE=InnoDB AUTO_INCREMENT=212732257 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Notice, the index:

  KEY `meta_key` (`meta_key`)

became:

  KEY `meta_key` (`meta_key`(191))

The queries that were used in the performance tests:

SELECT meta_key
FROM wp_postmeta
GROUP BY meta_key
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30;

SELECT DISTINCT meta_key
FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30;

Prefix indexes cannot be used for GROUP BY and for ORDER BY because they are not covering. This is by design.

This simplified test demonstrates the issue:

CREATE OR REPLACE TABLE t1 (a VARCHAR(10), KEY(a));
INSERT INTO t1 VALUES ('a'),('b'),('c'),('d');
EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10;

+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|    1 | SIMPLE      | t1    | range | NULL          | a    | 13      | NULL | 5    | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+

CREATE OR REPLACE TABLE t1 (a VARCHAR(10), KEY(a(3)));
INSERT INTO t1 VALUES ('a'),('b'),('c'),('d');
EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10;

+------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 4    | Using temporary; Using filesort |
+------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+

In case of the table with a prefix key, the query cannot use indexes.

This is expected behaviour.

The WordPress team correctly detected the source of the problem, according to this post:

https://core.trac.wordpress.org/ticket/33885#comment:2

Comment by Alexander Barkov [ 2020-06-26 ]

otto, thanks for your comment. This issue should have been closed as not a bug earlier. Closing now.

Comment by Michael Widenius [ 2023-09-07 ]

This issue is superseded by MDEV-32113

Generated at Thu Feb 08 07:30:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.