[MDEV-15072] Massive performance impact after PTI fix Created: 2018-01-25  Updated: 2018-02-13  Resolved: 2018-02-12

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Subquery, Storage Engine - MyISAM
Affects Version/s: 10.1.26, 10.0.32
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Jean-Marc Messina Assignee: Axel Schwenke
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

GNU/Linux Debian 8 Jessie
GNU/Linux Debian 9 stretch


Attachments: File test.sql.gz     File test.sql.gz    
Issue Links:
Relates
relates to MDEV-15163 Massive performance impact after PTI ... Closed

 Description   

Hello

I have just applied the PTI patch on the OS and it is causing a massive difference in the execution time of a specific query.

Disabling and re-enabling PTI lets me reproduce the difference at will:

PTI disabled: 7s
PTI enabled: 87s

The query :

SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24' AND NOT EXISTS (SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` AND b2.`dateFin` > b1.`dateFin`);

EXPLAIN doesn't help much:

+------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | PRIMARY            | b1    | ALL  | NULL          | NULL | NULL    | NULL | 4499 | Using where |
|    2 | DEPENDENT SUBQUERY | b2    | ALL  | NULL          | NULL | NULL    | NULL | 4499 | Using where |
+------+--------------------+-------+------+---------------+------+---------+------+------+-------------+

I can't find any existing report of a difference of this magnitude, so I'm wondering what's going on.



 Comments   
Comment by Axel Schwenke [ 2018-01-25 ]

Hi Jean-Marc,

could you please check the execution plan (EXPLAIN) with and without KPTI? Such big differences normally only happen when the plan is changed.

Comment by Jean-Marc Messina [ 2018-01-26 ]

The EXPLAIN output is exactly the same in both cases.
You can find attached a sample of the table to reproduce the problem. It has fewer columns than the original table but still shows the difference.
test.sql.gz

Comment by Axel Schwenke [ 2018-01-26 ]

Hi Jean-Marc,

thanks for that. However, I cannot reproduce what you are seeing. I'm using your table dump and this script:

use test;
source 45019_test.sql
 
SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24'
AND NOT EXISTS 
( SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` 
  AND b2.`dateFin` > b1.`dateFin`);
 
EXPLAIN SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24'
AND NOT EXISTS 
( SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` 
  AND b2.`dateFin` > b1.`dateFin`)\G

On a system with the latest Ubuntu kernel (4.4.0-111, including KPTI and Spectre 1 patch) I get identical results for MariaDB 10.0.32 and 10.1.26:

...
406 rows in set (0.07 sec)
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: b1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 468
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: b2
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 468
        Extra: Using where

I didn't even bother to run this on an unpatched system, as it already is very fast.

Do you have additional Spectre patches enabled? The microcode updates from Intel are reported to have a massive impact, but AFAIK they are not yet rolled out.

I just checked the status of my machine with the tool from https://github.com/speed47/spectre-meltdown-checker and it reports that I am protected against Spectre Variant 1 and against Meltdown, but not against Spectre Variant 2.

Comment by Axel Schwenke [ 2018-01-26 ]

I tried that now also on a Debian machine. It's Debian Jessie, running kernel 3.16.51-3+deb8u1 (2018-01-08). It is even faster, running the query in 0.02s. The spectre-meltdown-checker reports that this kernel is only protected against Meltdown.

Comment by Axel Schwenke [ 2018-01-26 ]

One more thing. For my tests I had an empty my.cnf, just setting the datadir and socket location. In case you have anything unusual in your my.cnf (optimizer_switch?), please attach it.
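The settings that most often change a plan can be dumped from the client for a side-by-side diff; a minimal sketch (standard MariaDB variable names; the `\G` terminator is mysql-client syntax):

```sql
-- Dump the optimizer-related settings so both machines can be compared.
SELECT @@optimizer_switch\G
SHOW VARIABLES LIKE 'join_buffer_size';
SHOW VARIABLES LIKE 'optimizer_search_depth';
```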

Comment by Jean-Marc Messina [ 2018-01-26 ]

Hum, weird. The box is indeed not powerful at all and the data I sent you is incomplete, but I still observe massive differences.

Here are my parameters:

[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /var/tmp
lc-messages-dir = /usr/share/mysql
lc-messages     = 'en_US'
skip-external-locking
skip-name-resolve
key_buffer_size         = 128M #16M
max_allowed_packet      = 16M
thread_cache_size       = 16 #8
myisam_recover_options  = BACKUP
open_files_limit        = 25000
table_open_cache        = 1500 #64
table_definition_cache  = 1500 #400
join_buffer_size        = 2M
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 128M
query_cache_limit       = 1M
query_cache_size        = 64M #16M
tmp_table_size          = 128M #32M
max_heap_table_size     = 128M #16M
innodb_buffer_pool_size = 128M #8M
innodb_log_buffer_size  = 2M #1M
innodb_file_format      = barracuda
innodb_file_per_table
innodb_flush_method     = O_DIRECT
innodb_old_blocks_time  = 1000
slow_query_log_file     = /var/log/mysql/mysql-slow.log
long_query_time         = 8
slow_query_log          = OFF
expire_logs_days        = 10
max_binlog_size         = 100M

Comment by Axel Schwenke [ 2018-01-26 ]

Jean-Marc,

I tried it with your my.cnf and the results do not change. So I doubt it's related to my.cnf.

But I had another idea. Your system seems to be rather slow, in other words: old. KPTI has to take different measures based on processor capabilities. Specifically, if your processor lacks the PCID (and, on top of that, INVPCID) features, then KPTI becomes more costly.

Both features are announced in the "flags" line in /proc/cpuinfo. Could you share your cpuinfo file and also tell me which kernel exactly you are running?

If that is indeed a case of a very old cpu with very high KPTI impact, then you should see it for all kinds of operations, not only MariaDB. Did you notice anything? Programs doing networking or disk IO would be most susceptible. Try nmap or bonnie.
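Checking for the two flags can be scripted; a minimal sketch, using a shortened sample flags string in place of reading the file directly (on a live system you would take the line from `grep '^flags' /proc/cpuinfo`):

```shell
# Check a cpuinfo "flags" line for the PCID / INVPCID features that let
# KPTI avoid full TLB flushes on every kernel entry/exit.
# Sample string (shortened) standing in for: grep -m1 '^flags' /proc/cpuinfo
flags="fpu vme de pse tsc msr pae sse sse2 syscall nx lm ssse3 sse4_1 sse4_2 popcnt aes hypervisor"

# Succeeds iff the given flag appears as a whole word in $flags.
has_flag() { printf '%s\n' $flags | grep -qx "$1"; }

for f in pcid invpcid; do
    if has_flag "$f"; then
        echo "$f: present"
    else
        echo "$f: MISSING (KPTI will be more expensive)"
    fi
done
```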

Comment by Jean-Marc Messina [ 2018-01-26 ]

Here's the kernel version I'm running:
Linux stretch 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

CPUinfo :

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
stepping        : 1
microcode       : 0x603
cpu MHz         : 2932.739
cache size      : 12288 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm epb dtherm ida arat
bugs            :
bogomips        : 5866.87
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

It's a VMware VM that indeed doesn't have many resources.

Comment by Axel Schwenke [ 2018-01-26 ]

Indeed, your (virtual) CPU lacks PCID support. I've seen third-party reports that this could be an issue, e.g. here: https://forums.gentoo.org/viewtopic-p-8169908.html#8169908 - for QEMU, enabling or disabling PCID support makes a big difference.

I suggest you run some performance tests for other applications besides MariaDB. I'm pretty sure this affects other programs too. As I said: use programs that do disk reads, like the bonnie disk benchmark, or programs accessing the network.

If you see this performance impact for other applications too (as I expect), then it is time to check your VMware settings and maybe contact VMware support. I'm setting this ticket to "cannot reproduce", but in case you find some indication that your problem is indeed specific to MariaDB, don't hesitate to add a comment and we will re-open it.

Comment by Jean-Marc Messina [ 2018-01-26 ]

Alright, I'll try these out!

Thx very much for your time anyway.

Comment by Jean-Marc Messina [ 2018-02-01 ]

Heya

I'm back with some news.

I thought I had tried the version I sent you (only a chunk of the table), but I hadn't, and it seems that with the chunk alone there is no problem.
Please find attached the full version to reproduce the problem.

I have taken a look at PCID support, but it's hard to find information on it. Besides the fact that it's supported starting with VMware virtual hardware version 11 while I'm running version 9 (apparently impossible to upgrade at the moment), I've read that PCID support is only available on a much newer kernel version. But then again, I might have misunderstood this information.

Anyway, if you could try again with the new table I'm sending you, same query as before, that'd be great!

And thx again for your help mate =) test.sql.gz

Comment by Axel Schwenke [ 2018-02-08 ]

Reopened due to new test case.

Comment by Axel Schwenke [ 2018-02-08 ]

I reproduced a regression, though not as big as reported.

non-PTI kernel (3.16.0-77): 1403 rows in set (3.74 sec)
PTI-kernel (4.4.0-111): 1403 rows in set (6.50 sec)

-> around 42% slower

Comment by Axel Schwenke [ 2018-02-12 ]

Hi Jean-Marc,

I have found the reason why your case is so heavily impacted by PTI: you are using MyISAM tables and the query reads rows from that table in a tight loop (the subquery).

The MyISAM engine does not cache row data in the database server process, only indexes. Instead it relies on the operating system's page cache for data reads. This works pretty well; however, for the server process that cache is at least one syscall away, and the syscall is exactly what becomes more expensive with the PTI fix in the kernel.

I reproduced the same effect with a simple SELECT workload on MyISAM tables. There, however, the effect wasn't very big, because there is extra overhead for sending the query to the server, parsing it, etc., and this overhead is the same for PTI and non-PTI kernels. But when the row reads happen in a tight loop, like a subquery or join, the effect becomes quite visible.
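The difference can be illustrated with a toy sketch outside MariaDB (plain Python, a throwaway file standing in for a MyISAM data file; not MariaDB code): one syscall per row read versus a single bulk read into a process-local buffer. Under KPTI, each of the per-row syscalls pays the extra page-table switch.

```python
import os
import tempfile

ROW_SIZE, N_ROWS = 64, 100

def rows_via_syscalls(path):
    """One os.pread() syscall per row -- the MyISAM pattern: row data lives
    only in the OS page cache, so every row read crosses into the kernel."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return sum(1 for i in range(N_ROWS)
                   if os.pread(fd, ROW_SIZE, i * ROW_SIZE))
    finally:
        os.close(fd)

def rows_via_row_cache(path):
    """Read the file once into a process-local buffer (Aria/InnoDB-style
    row cache): the per-row work then never enters the kernel."""
    with open(path, "rb") as f:
        buf = f.read()
    return sum(1 for i in range(N_ROWS)
               if buf[i * ROW_SIZE:(i + 1) * ROW_SIZE])

# Build a throwaway "data file" and check both readers agree.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * ROW_SIZE * N_ROWS)
    path = f.name
assert rows_via_syscalls(path) == rows_via_row_cache(path) == N_ROWS
os.remove(path)
```

Timing the two functions under a PTI kernel on a PCID-less CPU would show the first one degrading much more, which is the mechanism described above.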

There is no solution for the underlying problem - it would require a redesign of the MyISAM engine. But a simple workaround exists: switch to an engine with a row cache. If the table is switched to ARIA or InnoDB, then there is no longer any difference between a PTI and non-PTI system:

+--------+---------+-------+
| Engine | non-PTI | PTI   |
+--------+---------+-------+
| MyISAM | 3.77s   | 6.48s |
| ARIA   | 0.95s   | 0.79s |
| InnoDB | 1.55s   | 1.54s |
+--------+---------+-------+

For that test I used your my.cnf with innodb_buffer_pool_size = 128M. The ARIA engine uses its own buffer, whose size can be configured with aria_pagecache_buffer_size; the default is 128M. If you intend to switch many or even all of your MyISAM tables to ARIA, you might need to increase it.
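The engine switch itself is a one-liner; a sketch against the `test` table from the attached dump (note that ALTER TABLE rewrites the whole table, so large tables are best converted off-peak):

```sql
-- Switch the MyISAM table to an engine with an in-process row cache.
-- Pick one of the two:
ALTER TABLE `test` ENGINE=Aria;    -- cached via aria_pagecache_buffer_size (default 128M)
ALTER TABLE `test` ENGINE=InnoDB;  -- cached via innodb_buffer_pool_size

-- Verify: the Engine column should show the new engine.
SHOW TABLE STATUS LIKE 'test';
```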

Comment by Axel Schwenke [ 2018-02-12 ]

Reading rows from a MyISAM table in a tight loop (subquery or join) issues a massive number of read syscalls (showing up as __fget calls in kernel profiles). Hence this type of workload takes a significant performance hit from the KPTI workaround for the Meltdown vulnerability.
The workaround for this problem is to switch to an engine with a row cache inside the server, e.g. ARIA or InnoDB.

Comment by Axel Schwenke [ 2018-02-12 ]

Hi Jean-Marc,

do you agree that I use this report for a blog post? You can see a preview here:

https://mariadb.org/?p=4822&preview=1&_ppp=dc3281f82d

Comment by Jean-Marc Messina [ 2018-02-13 ]

This is an amazing in-depth analysis!

We did find a workaround in the meantime by reworking the DB, but I've also tried switching to InnoDB and I go from 52 sec to 2 sec!

You can absolutely use the data I sent you for a post; it's all been anonymized.

Thx very much for your help!

Generated at Thu Feb 08 08:18:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.