Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15072

Massive performance impact after PTI fix

Details

    Description

      Hello

      I have just applied the PTI patch on the OS and it is causing a massive difference in the execution time of a specific query.

      Disabling and reenabling PTI enables me to reproduce the difference at will :

      PTI disabled : 7s
      PTI enabled : 87s

      The query :

      SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24' AND NOT EXISTS (SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` AND b2.`dateFin` > b1.`dateFin`);
      

      Explain doesn't help much :

      +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
      | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
      +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
      |    1 | PRIMARY            | b1    | ALL  | NULL          | NULL | NULL    | NULL | 4499 | Using where |
      |    2 | DEPENDENT SUBQUERY | b2    | ALL  | NULL          | NULL | NULL    | NULL | 4499 | Using where |
      +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
      

      Can't see any report matching that kind of difference so far so i'm wondering what's up

      Attachments

        Issue Links

          Activity

            falcoris Jean-Marc Messina created issue -
            serg Sergei Golubchik made changes -
            Field Original Value New Value
            Description Hello

            I have just applied the PTI patch on the OS and it is causing a massive difference in the execution time of a specific query.

            Disabling and reenabling PTI enables me to reproduce the difference at will :

            PTI disabled : 7s
            PTI enabled : 87s

            The query :
            SELECT * FROM `blacklist` as b1 WHERE b1.`dateFin` > '2018-01-24' AND NOT EXISTS (SELECT 1 FROM `blacklist` as b2 WHERE b2.`ip` = b1.`ip` AND b2.`dateFin` > b1.`dateFin`);

            Explain doesn't help much :
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | PRIMARY | b1 | ALL | NULL | NULL | NULL | NULL | 4499 | Using where |
            | 2 | DEPENDENT SUBQUERY | b2 | ALL | NULL | NULL | NULL | NULL | 4499 | Using where |
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+

            Can't see any report matching that kind of difference so far so i'm wondering what's up
            Hello

            I have just applied the PTI patch on the OS and it is causing a massive difference in the execution time of a specific query.

            Disabling and reenabling PTI enables me to reproduce the difference at will :

            PTI disabled : 7s
            PTI enabled : 87s

            The query :
            {code:sql}
            SELECT * FROM `blacklist` as b1 WHERE b1.`dateFin` > '2018-01-24' AND NOT EXISTS (SELECT 1 FROM `blacklist` as b2 WHERE b2.`ip` = b1.`ip` AND b2.`dateFin` > b1.`dateFin`);
            {code}
            Explain doesn't help much :
            {noformat}
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | PRIMARY | b1 | ALL | NULL | NULL | NULL | NULL | 4499 | Using where |
            | 2 | DEPENDENT SUBQUERY | b2 | ALL | NULL | NULL | NULL | NULL | 4499 | Using where |
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}
            Can't see any report matching that kind of difference so far so i'm wondering what's up
            falcoris Jean-Marc Messina made changes -
            Affects Version/s 10.1.26 [ 22553 ]
            Environment GNU/Linux Debian 8 Jessie GNU/Linux Debian 8 Jessie
            GNU/Linux Debian 9 stretch
            axel Axel Schwenke made changes -
            Assignee Axel Schwenke [ axel ]
            axel Axel Schwenke added a comment -

            Hi Jean-Marc,

            could you please check the execution plan (EXPLAIN) with and without KPTI? Such big differences normally only happen when the plan is changed.

            axel Axel Schwenke added a comment - Hi Jean-Marc, could you please check the execution plan (EXPLAIN) with and without KPTI? Such big differences normally only happen when the plan is changed.
            falcoris Jean-Marc Messina made changes -
            Attachment cossi-ip.sql.gz [ 45016 ]
            falcoris Jean-Marc Messina made changes -
            Comment [ exactly the same output on the execution plan with and without KPTI
            Here are the tables (attached to this ticket) if you want to reproduce the problem [^cossi-ip.sql.gz] ]
            falcoris Jean-Marc Messina made changes -
            Attachment cossi-ip.sql.gz [ 45016 ]
            falcoris Jean-Marc Messina made changes -
            Description Hello

            I have just applied the PTI patch on the OS and it is causing a massive difference in the execution time of a specific query.

            Disabling and reenabling PTI enables me to reproduce the difference at will :

            PTI disabled : 7s
            PTI enabled : 87s

            The query :
            {code:sql}
            SELECT * FROM `blacklist` as b1 WHERE b1.`dateFin` > '2018-01-24' AND NOT EXISTS (SELECT 1 FROM `blacklist` as b2 WHERE b2.`ip` = b1.`ip` AND b2.`dateFin` > b1.`dateFin`);
            {code}
            Explain doesn't help much :
            {noformat}
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | PRIMARY | b1 | ALL | NULL | NULL | NULL | NULL | 4499 | Using where |
            | 2 | DEPENDENT SUBQUERY | b2 | ALL | NULL | NULL | NULL | NULL | 4499 | Using where |
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}
            Can't see any report matching that kind of difference so far so i'm wondering what's up
            Hello

            I have just applied the PTI patch on the OS and it is causing a massive difference in the execution time of a specific query.

            Disabling and reenabling PTI enables me to reproduce the difference at will :

            PTI disabled : 7s
            PTI enabled : 87s

            The query :
            {code:sql}
            SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24' AND NOT EXISTS (SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` AND b2.`dateFin` > b1.`dateFin`);
            {code}
            Explain doesn't help much :
            {noformat}
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            | 1 | PRIMARY | b1 | ALL | NULL | NULL | NULL | NULL | 4499 | Using where |
            | 2 | DEPENDENT SUBQUERY | b2 | ALL | NULL | NULL | NULL | NULL | 4499 | Using where |
            +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
            {noformat}
            Can't see any report matching that kind of difference so far so i'm wondering what's up
            falcoris Jean-Marc Messina made changes -
            Attachment test.sql.gz [ 45019 ]

            the explain output is exactly the same in both cases.
            You can find attached a sample of the table to reproduce the problem. There are less columns than the original table but still show the difference.
            test.sql.gz

            falcoris Jean-Marc Messina added a comment - the explain output is exactly the same in both cases. You can find attached a sample of the table to reproduce the problem. There are less columns than the original table but still show the difference. test.sql.gz
            axel Axel Schwenke added a comment -

            Hi Jean-Marc,

            thanks for that. I can however not reproduce what you are seeing. I'm using the table dump from you and this script:

            use test;
            source 45019_test.sql
             
            SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24'
            AND NOT EXISTS 
            ( SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` 
              AND b2.`dateFin` > b1.`dateFin`);
             
            EXPLAIN SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24'
            AND NOT EXISTS 
            ( SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` 
              AND b2.`dateFin` > b1.`dateFin`)\G
            

            On a system with the latest Ubuntu kernel (4.4.0-111, including KPTI and Spectre 1 patch) I get identical results for MariaDB 10.0.32 and 10.1.26:

            ...
            406 rows in set (0.07 sec)
            *************************** 1. row ***************************
                       id: 1
              select_type: PRIMARY
                    table: b1
                     type: ALL
            possible_keys: NULL
                      key: NULL
                  key_len: NULL
                      ref: NULL
                     rows: 468
                    Extra: Using where
            *************************** 2. row ***************************
                       id: 2
              select_type: DEPENDENT SUBQUERY
                    table: b2
                     type: ALL
            possible_keys: NULL
                      key: NULL
                  key_len: NULL
                      ref: NULL
                     rows: 468
                    Extra: Using where
            

            I didn't even bother to run this on an unpatched system, as it already is very fast.

            Do you have additional Spectre patches enabled? The microcode updates from Intel are reported to have a massive impact, but AFAIK they are not yet rolled out.

            I just checked the status of my machine with the tool from here: https://github.com/speed47/spectre-meltdown-checker and it reports that I am protected against Spectre Variant1 and against Meltdown, but not Spectre Variant 2.

            axel Axel Schwenke added a comment - Hi Jean-Marc, thanks for that. I can however not reproduce what you are seeing. I'm using the table dump from you and this script: use test; source 45019_test.sql   SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24' AND NOT EXISTS ( SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` AND b2.`dateFin` > b1.`dateFin`);   EXPLAIN SELECT * FROM `test` as b1 WHERE b1.`dateFin` > '2018-01-24' AND NOT EXISTS ( SELECT 1 FROM `test` as b2 WHERE b2.`ip` = b1.`ip` AND b2.`dateFin` > b1.`dateFin`)\G On a system with the latest Ubuntu kernel (4.4.0-111, including KPTI and Spectre 1 patch) I get identical results for MariaDB 10.0.32 and 10.1.26: ... 406 rows in set (0.07 sec) *************************** 1. row *************************** id: 1 select_type: PRIMARY table: b1 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 468 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: b2 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 468 Extra: Using where I didn't even bother to run this on an unpatched system, as it already is very fast. Do you have additional Spectre patches enabled? The microcode updates from Intel are reported to have a massive impact, but AFAIK they are not yet rolled out. I just checked the status of my machine with the tool from here: https://github.com/speed47/spectre-meltdown-checker and it reports that I am protected against Spectre Variant1 and against Meltdown, but not Spectre Variant 2.
            axel Axel Schwenke added a comment -

            I tried that now also on a Debian machine. It's Debian Jessie, running kernel 3.16.51-3+deb8u1 (2018-01-08). It is even faster, running the query in 0.02s. The spectre-meltdown-checker reports that this kernel is only protected against Meltdown.

            axel Axel Schwenke added a comment - I tried that now also on a Debian machine. It's Debian Jessie, running kernel 3.16.51-3+deb8u1 (2018-01-08). It is even faster, running the query in 0.02s. The spectre-meltdown-checker reports that this kernel is only protected against Meltdown.
            axel Axel Schwenke added a comment -

            One more thing. For my tests I had an empty my.cnf, just setting datadir and socket location. In case you have anything unusual in your my.cnf (optimizer switch?) please attach it.

            axel Axel Schwenke added a comment - One more thing. For my tests I had an empty my.cnf, just setting datadir and socket location. In case you have anything unusual in your my.cnf (optimizer switch?) please attach it.

            hum, weird. The box is indeed not powerfull at all and the data i sent you incomplete but i still observe massive differences

            Here are my parameters :

            [mysqld]
            user            = mysql
            pid-file        = /var/run/mysqld/mysqld.pid
            socket          = /var/run/mysqld/mysqld.sock
            port            = 3306
            basedir         = /usr
            datadir         = /var/lib/mysql
            tmpdir          = /var/tmp
            lc-messages-dir = /usr/share/mysql
            lc-messages     = 'en_US'
            skip-external-locking
            skip-name-resolve
            key_buffer_size         = 128M #16M
            max_allowed_packet      = 16M
            thread_cache_size       = 16 #8
            myisam_recover_options  = BACKUP
            open_files_limit        = 25000
            table_open_cache        = 1500 #64
            table_definition_cache  = 1500 #400
            join_buffer_size        = 2M
            bulk_insert_buffer_size = 64M
            myisam_sort_buffer_size = 128M
            query_cache_limit       = 1M
            query_cache_size        = 64M #16M
            tmp_table_size          = 128M #32M
            max_heap_table_size     = 128M #16M
            innodb_buffer_pool_size = 128M #8M
            innodb_log_buffer_size  = 2M #1M
            innodb_file_format      = barracuda
            innodb_file_per_table
            innodb_flush_method     = O_DIRECT
            innodb_old_blocks_time  = 1000
            slow_query_log_file     = /var/log/mysql/mysql-slow.log
            long_query_time         = 8
            slow_query_log          = OFF
            expire_logs_days        = 10
            max_binlog_size         = 100M
            

            falcoris Jean-Marc Messina added a comment - hum, weird. The box is indeed not powerfull at all and the data i sent you incomplete but i still observe massive differences Here are my parameters : [mysqld] user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /var/tmp lc-messages-dir = /usr/share/mysql lc-messages = 'en_US' skip-external-locking skip-name-resolve key_buffer_size = 128M #16M max_allowed_packet = 16M thread_cache_size = 16 #8 myisam_recover_options = BACKUP open_files_limit = 25000 table_open_cache = 1500 #64 table_definition_cache = 1500 #400 join_buffer_size = 2M bulk_insert_buffer_size = 64M myisam_sort_buffer_size = 128M query_cache_limit = 1M query_cache_size = 64M #16M tmp_table_size = 128M #32M max_heap_table_size = 128M #16M innodb_buffer_pool_size = 128M #8M innodb_log_buffer_size = 2M #1M innodb_file_format = barracuda innodb_file_per_table innodb_flush_method = O_DIRECT innodb_old_blocks_time = 1000 slow_query_log_file = /var/log/mysql/mysql-slow.log long_query_time = 8 slow_query_log = OFF expire_logs_days = 10 max_binlog_size = 100M
            axel Axel Schwenke added a comment -

            Jean-Marc,

            I tried it with your my.cnf and the results do not change. So I doubt it's related to my.cnf.

            But I had another idea. Your system seems to be rather slow, in other words: old. KPTI has to take different measures, based on processor capabilities. Specifically if your processor lacks the PCID (and more: INVPCID) instructions, then KPTI becomes more costly.

            Both features are announced in the "flags" line in /proc/cpuinfo. Could you share your cpuinfo file and also tell me which kernel exactly you are running?

            If that is indeed a case of a very old cpu with very high KPTI impact, then you should see it for all kinds of operations, not only MariaDB. Did you notice anything? Programs doing networking or disk IO would be most susceptible. Try nmap or bonnie.

            axel Axel Schwenke added a comment - Jean-Marc, I tried it with your my.cnf and the results do not change. So I doubt it's related to my.cnf. But I had another idea. Your system seems to be rather slow, in other words: old. KPTI has to take different measures, based on processor capabilities. Specifically if your processor lacks the PCID (and more: INVPCID) instructions, then KPTI becomes more costly. Both features are announced in the "flags" line in /proc/cpuinfo. Could you share your cpuinfo file and also tell me which kernel exactly you are running? If that is indeed a case of a very old cpu with very high KPTI impact, then you should see it for all kinds of operations, not only MariaDB. Did you notice anything? Programs doing networking or disk IO would be most susceptible. Try nmap or bonnie.

            Here's the version i'm running :
            Linux stretch 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

            CPUinfo :

            processor       : 0
            vendor_id       : GenuineIntel
            cpu family      : 6
            model           : 44
            model name      : Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
            stepping        : 1
            microcode       : 0x603
            cpu MHz         : 2932.739
            cache size      : 12288 KB
            physical id     : 0
            siblings        : 1
            core id         : 0
            cpu cores       : 1
            apicid          : 0
            initial apicid  : 0
            fpu             : yes
            fpu_exception   : yes
            cpuid level     : 11
            wp              : yes
            flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm epb dtherm ida arat
            bugs            :
            bogomips        : 5866.87
            clflush size    : 64
            cache_alignment : 64
            address sizes   : 40 bits physical, 48 bits virtual
            power management:
            
            

            It's a VMware VM that doesn't have much resource indeed.

            falcoris Jean-Marc Messina added a comment - Here's the version i'm running : Linux stretch 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux CPUinfo : processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz stepping : 1 microcode : 0x603 cpu MHz : 2932.739 cache size : 12288 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm epb dtherm ida arat bugs : bogomips : 5866.87 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: It's a VMware VM that doesn't have much resource indeed.
            axel Axel Schwenke added a comment -

            Indeed your (virtual) cpu does lack PCID support. I've seen third party reports that this could be an issue. I.e. here: https://forums.gentoo.org/viewtopic-p-8169908.html#8169908 - for QEMU, enabling or disabling PCID support makes a big difference.

            I suggest you do some performance tests for other applications besides MariaDB. But I'm pretty sure this affects other programs too. As I said: use programs that do disk reads like the bonnie disk benchmark. Or programs accessing the network.

            If you see this performance impact for other apllications too (as I expect) then it is time to check your VMware settings and maybe contact VMware support. I'm setting this ticket to "cannot reproduce", but in case you find some indicator that your problem is indeed specific for MariaDB, don't hesitate to add a comment and we will re-open it.

            axel Axel Schwenke added a comment - Indeed your (virtual) cpu does lack PCID support. I've seen third party reports that this could be an issue. I.e. here: https://forums.gentoo.org/viewtopic-p-8169908.html#8169908 - for QEMU, enabling or disabling PCID support makes a big difference. I suggest you do some performance tests for other applications besides MariaDB. But I'm pretty sure this affects other programs too. As I said: use programs that do disk reads like the bonnie disk benchmark. Or programs accessing the network. If you see this performance impact for other apllications too (as I expect) then it is time to check your VMware settings and maybe contact VMware support. I'm setting this ticket to "cannot reproduce", but in case you find some indicator that your problem is indeed specific for MariaDB, don't hesitate to add a comment and we will re-open it.
            axel Axel Schwenke made changes -
            issue.field.resolutiondate 2018-01-26 16:20:27.0 2018-01-26 16:20:27.824
            axel Axel Schwenke made changes -
            Fix Version/s N/A [ 14700 ]
            Resolution Cannot Reproduce [ 5 ]
            Status Open [ 1 ] Closed [ 6 ]
            falcoris Jean-Marc Messina added a comment - - edited

            Alright, i'll try these out !

            Thx very much for your time anyways.

            falcoris Jean-Marc Messina added a comment - - edited Alright, i'll try these out ! Thx very much for your time anyways.
            falcoris Jean-Marc Messina made changes -
            Attachment test.sql.gz [ 45129 ]

            Heya

            I'm back with some news.

            I thought i had tried the version i sent you (only a chunk of the table) but i didn't and it seems that on the chunk only, there is no problem.
            Please find attached the full version to reproduce the problem.

            I have taken a look at the PCID support but it's hard to find information on it. Beside the fact that it's supported on vmware vmachine 11 and i'm running 9 (currently impossible to upgrade apparently), i've read that PCID support is only available on a much newer kernel version. But then again, i might have misunderstood this information.

            Anyways, if you can take try again with that new table i'm sending you, same request as before, that'd be great !

            And thx again for your help mate =) test.sql.gz

            falcoris Jean-Marc Messina added a comment - Heya I'm back with some news. I thought i had tried the version i sent you (only a chunk of the table) but i didn't and it seems that on the chunk only, there is no problem. Please find attached the full version to reproduce the problem. I have taken a look at the PCID support but it's hard to find information on it. Beside the fact that it's supported on vmware vmachine 11 and i'm running 9 (currently impossible to upgrade apparently), i've read that PCID support is only available on a much newer kernel version. But then again, i might have misunderstood this information. Anyways, if you can take try again with that new table i'm sending you, same request as before, that'd be great ! And thx again for your help mate =) test.sql.gz
            falcoris Jean-Marc Messina made changes -
            axel Axel Schwenke added a comment -

            Reopened due to new test case.

            axel Axel Schwenke added a comment - Reopened due to new test case.
            axel Axel Schwenke made changes -
            Resolution Cannot Reproduce [ 5 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            axel Axel Schwenke made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            axel Axel Schwenke added a comment -

            I reproduced a regression, though not as big as reported.

            non-PTI kernel (3.16.0-77): 1403 rows in set (3.74 sec)
            PTI-kernel (4.4.0-111): 1403 rows in set (6.50 sec)

            -> around 42% slower

            axel Axel Schwenke added a comment - I reproduced a regression, though not as big as reported. non-PTI kernel (3.16.0-77): 1403 rows in set (3.74 sec) PTI-kernel (4.4.0-111): 1403 rows in set (6.50 sec) -> around 42% slower
            axel Axel Schwenke added a comment - - edited

            Hi Jean-Marc,

            I have found the reason why your case is so heavily impacted by PTI: you are using MyISAM tables and the query reads rows from that table in a tight loop (the subquery).

            The MyISAM engine does not have a cache for row data in the database server process, only for indexes. Instead it relies on the operating systems page cache for data reads. This works pretty well, however for the server process this cache is at least one syscall away. And the syscall operation is, what becomes more expensive with the PTI fix in the kernel.

            I reproduced the same effect with simple SELECT workload on MyISAM tables. However then the effect wasn't very big, because there is extra overhead for sending the query to the server, parsing it, etc. And this overhead is the same for PTI and non-PTI kernels. But when the row-reads happen in a tight loop like a subquery or join, then the effect becomes quite visible.

            There is no solution for the underlying problem - it would require a redesign of the MyISAM engine. But a simple workaround exists: switch to an engine with a row cache. If the table is switched to ARIA or InnoDB, then there is no longer any difference between a PTI and non-PTI system:

            Engine non-PTI PTI
            MyISAM 3.77s 6.48s
            ARIA 0.95s 0.79s
            InnoDB 1.55s 1.54s

            For that test I used your my.cnf with innodb_buffer_pool_size = 128M. The ARIA engine uses it's own buffer, which size can be configured with aria_pagecache_buffer_size. The default is 128M. If you intend to switch many or even all your MyISAM tables to ARIA, you might need to increase it.

            axel Axel Schwenke added a comment - - edited Hi Jean-Marc, I have found the reason why your case is so heavily impacted by PTI: you are using MyISAM tables and the query reads rows from that table in a tight loop (the subquery). The MyISAM engine does not have a cache for row data in the database server process, only for indexes. Instead it relies on the operating systems page cache for data reads. This works pretty well, however for the server process this cache is at least one syscall away. And the syscall operation is, what becomes more expensive with the PTI fix in the kernel. I reproduced the same effect with simple SELECT workload on MyISAM tables. However then the effect wasn't very big, because there is extra overhead for sending the query to the server, parsing it, etc. And this overhead is the same for PTI and non-PTI kernels. But when the row-reads happen in a tight loop like a subquery or join, then the effect becomes quite visible. There is no solution for the underlying problem - it would require a redesign of the MyISAM engine. But a simple workaround exists: switch to an engine with a row cache. If the table is switched to ARIA or InnoDB, then there is no longer any difference between a PTI and non-PTI system: Engine non-PTI PTI MyISAM 3.77s 6.48s ARIA 0.95s 0.79s InnoDB 1.55s 1.54s For that test I used your my.cnf with innodb_buffer_pool_size = 128M. The ARIA engine uses it's own buffer, which size can be configured with aria_pagecache_buffer_size. The default is 128M. If you intend to switch many or even all your MyISAM tables to ARIA, you might need to increase it.
            axel Axel Schwenke added a comment -

            Reading rows from a MyISAM table in a tight loop (subquery or join) emits a massive number of fget() calls, resulting in __fget syscalls. Hence this type of workload gets a significant performance hit from the KPTI workaround for the Meltdown vulnerability.
            The workaround for this problem is to switch to an engine with a row cache inside the server, i.e. ARIA or InnoDB.

            axel Axel Schwenke added a comment - Reading rows from a MyISAM table in a tight loop (subquery or join) emits a massive number of fget() calls, resulting in __fget syscalls. Hence this type of workload gets a significant performance hit from the KPTI workaround for the Meltdown vulnerability. The workaround for this problem is to switch to an engine with a row cache inside the server, i.e. ARIA or InnoDB.
            axel Axel Schwenke made changes -
            Component/s Storage Engine - MyISAM [ 10600 ]
            Resolution Won't Fix [ 2 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            axel Axel Schwenke added a comment -

            Hi Jean-Marc,

            do you agree that I use this report for a blog post? You can see a preview here:

            https://mariadb.org/?p=4822&preview=1&_ppp=dc3281f82d

            axel Axel Schwenke added a comment - Hi Jean-Marc, do you agree that I use this report for a blog post? You can see a preview here: https://mariadb.org/?p=4822&preview=1&_ppp=dc3281f82d

            This is an amazing in-depth analysis !

            We did find a workaround in the meantime by reworking the db but i've tried to switch to INNODB and i go from 52sec to 2sec !

            You can absolutely use the data i sent you for a post, it's all been anonymized.

            Thx ver much for your help !

            falcoris Jean-Marc Messina added a comment - This is an amazing in-depth analysis ! We did find a workaround in the meantime by reworking the db but i've tried to switch to INNODB and i go from 52sec to 2sec ! You can absolutely use the data i sent you for a post, it's all been anonymized. Thx ver much for your help !
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 85139 ] MariaDB v4 [ 153651 ]

            People

              axel Axel Schwenke
              falcoris Jean-Marc Messina
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.