Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8872

Performance regressions with utf8mb4 vs utf8 in WordPress

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • 10.0.21-galera
    • N/A
    • Character Sets
    • None

    Description

      When switching tables from utf8 to utf8mb4, we're seeing fairly severe performance regressions - queries that used to run in microseconds, now taking 20 seconds to run.

      We've run some tests with query and index variations, or there's a summary here.

      For a direct comparison, see:
      utf8: https://travis-ci.org/tollmanz/utf8mb4-query-time/jobs/82864656
      utf8mb4: https://travis-ci.org/tollmanz/utf8mb4-query-time/jobs/82864660

      Here's the WordPress tracking ticket. Comment #2 notes that reducing the meta_key column length to 191 (to match the length of the prefix index) fixes the performance issue, but the utf8 character set doesn't have the same problem.

      The same behaviour occurs in MySQL, as well.

      Attachments

        Issue Links

          Activity

            pento Gary Pendergast created issue -
            bar Alexander Barkov made changes -
            Field Original Value New Value
            Assignee Alexander Barkov [ bar ]
            greenman Ian Gilfillan added a comment -

            Debian defaults to utf8mb4 so perhaps this should get an increase in priority.

            greenman Ian Gilfillan added a comment - Debian defaults to utf8mb4 so perhaps this should get an increase in priority.

            Would be nice, since Nextcloud 12.0 with News plugin ask to switch DB from utf8 to utf8mb4 .

            Is any progress with this issue done?

            okias David Heidelberg added a comment - Would be nice, since Nextcloud 12.0 with News plugin ask to switch DB from utf8 to utf8mb4 . Is any progress with this issue done?
            elenst Elena Stepanova made changes -
            Fix Version/s 10.0 [ 16000 ]
            bar Alexander Barkov made changes -

            Note that the reporter of this issue Gary Pendergast is one of the core contributors of WordPress.

            otto Otto Kekäläinen added a comment - Note that the reporter of this issue Gary Pendergast is one of the core contributors of WordPress.
            otto Otto Kekäläinen made changes -

            This issue is not relevant to utf8 vs utf8mb4 performance. The problem happened because after switch from utf8 to utf8mb4 the non-prefix key on meta_key become a prefix key:

            The table before the change:

            CREATE TABLE `wp_postmeta` (
              `meta_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
              `post_id` bigint(20) unsigned NOT NULL DEFAULT '0',
              `meta_key` varchar(255) DEFAULT NULL,
              `meta_value` longtext,
              PRIMARY KEY (`meta_id`),
              KEY `post_id` (`post_id`),
              KEY `meta_key` (`meta_key`)
            ) ENGINE=InnoDB AUTO_INCREMENT=212732319 DEFAULT CHARSET=utf8;
            

            The table after the change:

            CREATE TABLE `wp_postmeta` (
              `meta_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
              `post_id` bigint(20) unsigned NOT NULL DEFAULT '0',
              `meta_key` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
              `meta_value` longtext COLLATE utf8mb4_unicode_ci,
              PRIMARY KEY (`meta_id`),
              KEY `post_id` (`post_id`),
              KEY `meta_key` (`meta_key`(191))
            ) ENGINE=InnoDB AUTO_INCREMENT=212732257 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
            

            Notice, the index:

              KEY `meta_key` (`meta_key`)
            

            became:

              KEY `meta_key` (`meta_key`(191))
            

            The queries that were used in the performance tests:

            SELECT meta_key
            FROM wp_postmeta
            GROUP BY meta_key
            HAVING meta_key NOT LIKE '\_%'
            ORDER BY meta_key
            LIMIT 30;
            

            SELECT DISTINCT meta_key
            FROM wp_postmeta
            WHERE meta_key NOT BETWEEN '_' AND '_z'
            HAVING meta_key NOT LIKE '\_%'
            ORDER BY meta_key
            LIMIT 30;
            

            Prefix indexes cannot be used for GROUP BY and for ORDER BY because they are not covering. This is by design.

            This simplified test demonstrates the issue:

            CREATE OR REPLACE TABLE t1 (a VARCHAR(10), KEY(a));
            INSERT INTO t1 VALUES ('a'),('b'),('c'),('d');
            EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10;
            

            +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
            | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
            +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
            |    1 | SIMPLE      | t1    | range | NULL          | a    | 13      | NULL | 5    | Using index for group-by |
            +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
            

            CREATE OR REPLACE TABLE t1 (a VARCHAR(10), KEY(a(3)));
            INSERT INTO t1 VALUES ('a'),('b'),('c'),('d');
            EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10;
            

            +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
            +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
            |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 4    | Using temporary; Using filesort |
            +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
            

            In case of the table with a prefix key, the query cannot use indexes.

            This is expected behaviour.

            The WordPress team correctly detected the source of the problem, according to this post:

            https://core.trac.wordpress.org/ticket/33885#comment:2

            bar Alexander Barkov added a comment - This issue is not relevant to utf8 vs utf8mb4 performance. The problem happened because after switch from utf8 to utf8mb4 the non-prefix key on meta_key become a prefix key: The table before the change: CREATE TABLE `wp_postmeta` ( `meta_id` bigint (20) unsigned NOT NULL AUTO_INCREMENT, `post_id` bigint (20) unsigned NOT NULL DEFAULT '0' , `meta_key` varchar (255) DEFAULT NULL , `meta_value` longtext, PRIMARY KEY (`meta_id`), KEY `post_id` (`post_id`), KEY `meta_key` (`meta_key`) ) ENGINE=InnoDB AUTO_INCREMENT=212732319 DEFAULT CHARSET=utf8; The table after the change: CREATE TABLE `wp_postmeta` ( `meta_id` bigint (20) unsigned NOT NULL AUTO_INCREMENT, `post_id` bigint (20) unsigned NOT NULL DEFAULT '0' , `meta_key` varchar (255) COLLATE utf8mb4_unicode_ci DEFAULT NULL , `meta_value` longtext COLLATE utf8mb4_unicode_ci, PRIMARY KEY (`meta_id`), KEY `post_id` (`post_id`), KEY `meta_key` (`meta_key`(191)) ) ENGINE=InnoDB AUTO_INCREMENT=212732257 DEFAULT CHARSET=utf8mb4 COLLATE =utf8mb4_unicode_ci; Notice, the index: KEY `meta_key` (`meta_key`) became: KEY `meta_key` (`meta_key`(191)) The queries that were used in the performance tests: SELECT meta_key FROM wp_postmeta GROUP BY meta_key HAVING meta_key NOT LIKE '\_%' ORDER BY meta_key LIMIT 30; SELECT DISTINCT meta_key FROM wp_postmeta WHERE meta_key NOT BETWEEN '_' AND '_z' HAVING meta_key NOT LIKE '\_%' ORDER BY meta_key LIMIT 30; Prefix indexes cannot be used for GROUP BY and for ORDER BY because they are not covering. This is by design. This simplified test demonstrates the issue: CREATE OR REPLACE TABLE t1 (a VARCHAR (10), KEY (a)); INSERT INTO t1 VALUES ( 'a' ),( 'b' ),( 'c' ),( 'd' ); EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10; +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ | 1 | SIMPLE | t1 | range | NULL | a | 13 | NULL | 5 | Using index for group-by | +------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ CREATE OR REPLACE TABLE t1 (a VARCHAR (10), KEY (a(3))); INSERT INTO t1 VALUES ( 'a' ),( 'b' ),( 'c' ),( 'd' ); EXPLAIN SELECT * FROM t1 GROUP BY a ORDER BY a LIMIT 10; +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort | +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+ In case of the table with a prefix key, the query cannot use indexes. This is expected behaviour. The WordPress team correctly detected the source of the problem, according to this post: https://core.trac.wordpress.org/ticket/33885#comment:2
            bar Alexander Barkov made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.0 [ 16000 ]
            Resolution Not a Bug [ 6 ]
            Status Open [ 1 ] Closed [ 6 ]

            otto, thanks for your comment. This issue should have been closed as not a bug earlier. Closing now.

            bar Alexander Barkov added a comment - otto , thanks for your comment. This issue should have been closed as not a bug earlier. Closing now.
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 71842 ] MariaDB v4 [ 149667 ]
            monty Michael Widenius made changes -
            monty Michael Widenius made changes -
            psergei Sergei Petrunia made changes -
            monty Michael Widenius made changes -
            Assignee Alexander Barkov [ bar ] Sergei Petrunia [ psergey ]
            Resolution Not a Bug [ 6 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            monty Michael Widenius made changes -
            Comment [ The usage of comparisons of indexed column with another column from a different character set is becoming more and more frequent and we should fix this in the cases it is possible.

            The current case that is causing performance issues (index can not be used) is

            WHERE utf8bm3_index=utf8mb4_column
            as utft8bm3 cannot handle all utf8mb4 characters.
            Note that WHERE utf8bm4_index=utf8mb3_column works.

            The idea is to do allow comparison of indexed column to other column with = for different character sets as long as the collation is not making it impossible (for example having an index with a case-sensitive collation comparing to a column that is not case sensitive).
            If the character set conversion from compared column to indexed_column fails because a non existing character, then we will assume that there is no matching row in the table.


            ]
            psergei Sergei Petrunia made changes -
            Comment [ Further work on the issue from the above comment is done in MDEV-32113. ]

            This issue is superseded by MDEV-32113

            monty Michael Widenius added a comment - This issue is superseded by MDEV-32113
            monty Michael Widenius made changes -
            Resolution Duplicate [ 3 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            monty Michael Widenius made changes -
            julien.fritsch Julien Fritsch made changes -

            People

              psergei Sergei Petrunia
              pento Gary Pendergast
              Votes:
              8 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.