Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: 10.1.26, 10.0.32
- Fix Version/s: None
- Environment:
  GNU/Linux Debian 8 Jessie
  GNU/Linux Debian 9 Stretch
Description
Hello,
I have just applied the PTI patch to the OS and it causes a massive difference in the execution time of a specific query. Disabling and re-enabling PTI lets me reproduce the difference at will:
PTI disabled: 7s
PTI enabled: 87s
The query:
SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24' AND NOT EXISTS (SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` AND b2.`dateFin` > b1.`dateFin`);
EXPLAIN doesn't help much:
+------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | PRIMARY            | b1    | ALL  | NULL          | NULL | NULL    | NULL | 4499 | Using where |
|    2 | DEPENDENT SUBQUERY | b2    | ALL  | NULL          | NULL | NULL    | NULL | 4499 | Using where |
+------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
I can't find any existing report about this kind of difference so far, so I'm wondering what's going on.
Attachments
- test.sql.gz (43 kB)
- test.sql.gz (6 kB)
Issue Links
- relates to: MDEV-15163 Massive performance impact after PTI fix 2 (Closed)
Activity
The EXPLAIN output is exactly the same in both cases.
You can find attached a sample of the table to reproduce the problem. It has fewer columns than the original table, but still shows the difference.
test.sql.gz
Hi Jean-Marc,
thanks for that. I cannot, however, reproduce what you are seeing. I'm using the table dump from you and this script:

use test;
source 45019_test.sql

SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24'
AND NOT EXISTS
( SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip`
AND b2.`dateFin` > b1.`dateFin`);

EXPLAIN SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24'
AND NOT EXISTS
( SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip`
AND b2.`dateFin` > b1.`dateFin`)\G
On a system with the latest Ubuntu kernel (4.4.0-111, including KPTI and Spectre 1 patch) I get identical results for MariaDB 10.0.32 and 10.1.26:
...
406 rows in set (0.07 sec)

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: b1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 468
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: b2
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 468
        Extra: Using where
I didn't even bother to run this on an unpatched system, as it already is very fast.
Do you have additional Spectre patches enabled? The microcode updates from Intel are reported to have a massive impact, but AFAIK they are not yet rolled out.
I just checked the status of my machine with the tool from here: https://github.com/speed47/spectre-meltdown-checker and it reports that I am protected against Spectre Variant 1 and against Meltdown, but not against Spectre Variant 2.
I tried that now also on a Debian machine. It's Debian Jessie, running kernel 3.16.51-3+deb8u1 (2018-01-08). It is even faster, running the query in 0.02s. The spectre-meltdown-checker reports that this kernel is only protected against Meltdown.
One more thing. For my tests I had an empty my.cnf, just setting datadir and socket location. In case you have anything unusual in your my.cnf (optimizer switch?) please attach it.
Hum, weird. The box is indeed not powerful at all and the data I sent you is incomplete, but I still observe massive differences.
Here are my parameters:
[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /var/tmp
lc-messages-dir = /usr/share/mysql
lc-messages = 'en_US'
skip-external-locking
skip-name-resolve
key_buffer_size = 128M #16M
max_allowed_packet = 16M
thread_cache_size = 16 #8
myisam_recover_options = BACKUP
open_files_limit = 25000
table_open_cache = 1500 #64
table_definition_cache = 1500 #400
join_buffer_size = 2M
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 128M
query_cache_limit = 1M
query_cache_size = 64M #16M
tmp_table_size = 128M #32M
max_heap_table_size = 128M #16M
innodb_buffer_pool_size = 128M #8M
innodb_log_buffer_size = 2M #1M
innodb_file_format = barracuda
innodb_file_per_table
innodb_flush_method = O_DIRECT
innodb_old_blocks_time = 1000
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 8
slow_query_log = OFF
expire_logs_days = 10
max_binlog_size = 100M
Jean-Marc,
I tried it with your my.cnf and the results do not change. So I doubt it's related to my.cnf.
But I had another idea. Your system seems to be rather slow, in other words: old. KPTI has to take different measures based on processor capabilities. Specifically, if your processor lacks the PCID (and, even better, INVPCID) instructions, KPTI becomes more costly.
Both features are announced in the "flags" line of /proc/cpuinfo. Could you share your cpuinfo file and also tell me exactly which kernel you are running?
If this is indeed a case of a very old CPU with a very high KPTI impact, you should see it for all kinds of operations, not only MariaDB. Did you notice anything? Programs doing networking or disk I/O would be most susceptible. Try nmap or bonnie.
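To put a rough number on raw syscall overhead (a quick sanity check, not a replacement for bonnie or nmap), one could time a tight loop of cheap system calls before and after enabling PTI. A minimal sketch, not from this report; os.stat is just a convenient operation that performs one system call per invocation:

```python
import os
import time

def avg_syscall_cost(n=100_000, path="."):
    """Time n os.stat() calls and return the average cost per call.

    Each os.stat() is one system call, so on a KPTI-patched kernel
    the per-call cost should rise noticeably, especially without PCID."""
    start = time.perf_counter()
    for _ in range(n):
        os.stat(path)
    return (time.perf_counter() - start) / n

if __name__ == "__main__":
    cost = avg_syscall_cost()
    print(f"average os.stat() cost: {cost * 1e9:.0f} ns")
```

Running this on the same machine with PTI disabled and enabled (pti=off on the kernel command line, or different kernels) would show whether the penalty is generic to syscalls rather than specific to MariaDB.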
Here's the kernel version I'm running:
Linux stretch 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux
cpuinfo:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
stepping        : 1
microcode       : 0x603
cpu MHz         : 2932.739
cache size      : 12288 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm epb dtherm ida arat
bugs            :
bogomips        : 5866.87
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
It's a VMware VM that indeed doesn't have many resources.
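Checking for those two CPU features boils down to scanning the "flags" line of /proc/cpuinfo. A small illustrative sketch (the helper name is my own; the abbreviated sample string mirrors the flags line above, which lacks both pcid and invpcid):

```python
def missing_cpu_flags(cpuinfo_text, wanted=("pcid", "invpcid")):
    """Return the flags from `wanted` that are absent from the
    'flags' line(s) of a /proc/cpuinfo dump."""
    present = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present.update(line.split(":", 1)[1].split())
    return [f for f in wanted if f not in present]

# Abbreviated flags line of the reporter's virtual CPU:
sample = "flags : fpu vme de pse tsc msr pae mce cx8 apic sep sse sse2 syscall nx lm aes hypervisor"
print(missing_cpu_flags(sample))  # both pcid and invpcid are missing
```

On a real system one would pass open("/proc/cpuinfo").read() instead of the sample string.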
Indeed, your (virtual) CPU does lack PCID support. I've seen third-party reports that this could be an issue, e.g. here: https://forums.gentoo.org/viewtopic-p-8169908.html#8169908 - for QEMU, enabling or disabling PCID support makes a big difference.
I suggest you run some performance tests for other applications besides MariaDB; I'm pretty sure this affects other programs too. As I said: use programs that do disk reads, like the bonnie disk benchmark, or programs accessing the network.
If you see this performance impact for other applications too (as I expect), then it is time to check your VMware settings and maybe contact VMware support. I'm setting this ticket to "cannot reproduce", but if you find some indication that your problem is indeed specific to MariaDB, don't hesitate to add a comment and we will re-open it.
Alright, I'll try these out!
Thanks very much for your time anyway.
Heya,
I'm back with some news.
I thought I had tried the version I sent you (only a chunk of the table), but I hadn't, and it seems that with the chunk alone there is no problem. Please find attached the full version to reproduce the problem.
I have taken a look at PCID support, but it's hard to find information on it. Besides the fact that it's supported from VMware virtual hardware version 11 and I'm running version 9 (currently impossible to upgrade, apparently), I've read that PCID support is only available in a much newer kernel version. Then again, I might have misunderstood this information.
Anyway, if you could try again with the new table I'm sending you, same query as before, that would be great!
And thanks again for your help mate =) test.sql.gz
I reproduced a regression, though not as big as reported.
non-PTI kernel (3.16.0-77): 1403 rows in set (3.74 sec)
PTI-kernel (4.4.0-111): 1403 rows in set (6.50 sec)
-> around 42% slower
Hi Jean-Marc,
I have found the reason why your case is so heavily impacted by PTI: you are using MyISAM tables, and the query reads rows from the table in a tight loop (the subquery).
The MyISAM engine has no cache for row data in the database server process, only for indexes. Instead it relies on the operating system's page cache for data reads. This works pretty well, but for the server process that cache is at least one syscall away, and it is precisely the syscall path that becomes more expensive with the PTI fix in the kernel.
I reproduced the same effect with a simple SELECT workload on MyISAM tables. There the effect wasn't very big, because there is extra overhead for sending the query to the server, parsing it, etc., and that overhead is the same on PTI and non-PTI kernels. But when the row reads happen in a tight loop, like a subquery or join, the effect becomes quite visible.
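The effect described above is easy to mimic outside the database. The sketch below (plain Python, not MariaDB code) reads the same "rows" from a file twice: once with one pread() system call per row, the way MyISAM fetches row data through the page cache, and once from an in-process copy, the way an engine with a row cache serves repeated reads. On a KPTI kernel the first loop pays the syscall tax on every single row:

```python
import os
import tempfile
import time

ROW_SIZE = 100   # bytes per fixed-size "row"
N_ROWS = 5_000

def make_table(path):
    """Write N_ROWS fixed-size rows to a file."""
    with open(path, "wb") as f:
        for i in range(N_ROWS):
            f.write(str(i).ljust(ROW_SIZE).encode())

def read_via_syscalls(path):
    """One pread() system call per row -- the MyISAM pattern."""
    rows = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for i in range(N_ROWS):
            rows.append(os.pread(fd, ROW_SIZE, i * ROW_SIZE))
    finally:
        os.close(fd)
    return rows

def read_via_cache(path):
    """One read, then slice from an in-process buffer -- a row cache."""
    with open(path, "rb") as f:
        data = f.read()
    return [data[i * ROW_SIZE:(i + 1) * ROW_SIZE] for i in range(N_ROWS)]

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        make_table(tmp.name)
        for fn in (read_via_syscalls, read_via_cache):
            start = time.perf_counter()
            fn(tmp.name)
            print(f"{fn.__name__}: {time.perf_counter() - start:.4f}s")
```

Both functions return identical rows; only the number of kernel entries differs, which is exactly the variable that PTI makes more expensive.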
There is no solution for the underlying problem - it would require a redesign of the MyISAM engine. But a simple workaround exists: switch to an engine with a row cache. If the table is switched to ARIA or InnoDB, there is no longer any difference between a PTI and a non-PTI system:
Engine | non-PTI | PTI
-------+---------+------
MyISAM | 3.77s   | 6.48s
ARIA   | 0.95s   | 0.79s
InnoDB | 1.55s   | 1.54s
For that test I used your my.cnf with innodb_buffer_pool_size = 128M. The ARIA engine uses its own buffer, whose size can be configured with aria_pagecache_buffer_size. The default is 128M. If you intend to switch many or even all of your MyISAM tables to ARIA, you might need to increase it.
Reading rows from a MyISAM table in a tight loop (subquery or join) issues a massive number of file reads, each of which is a system call. Hence this type of workload takes a significant performance hit from the KPTI workaround for the Meltdown vulnerability.
The workaround for this problem is to switch to an engine that keeps a row cache inside the server, e.g. ARIA or InnoDB.
Hi Jean-Marc,
would you agree to me using this report for a blog post? You can see a preview here:
This is an amazing in-depth analysis!
We did find a workaround in the meantime by reworking the DB, but I've tried switching to InnoDB and I go from 52s to 2s!
You can absolutely use the data I sent you for a post; it's all been anonymized.
Thanks very much for your help!
Hi Jean-Marc,
could you please check the execution plan (EXPLAIN) with and without KPTI? Such big differences normally only happen when the plan is changed.