MariaDB Server
MDEV-23634

Select query hangs the server and leads to OOM in MariaDB 10.4.14

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.4.14, 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
    • Fix Version/s: 10.4.19, 10.5.10
    • Component/s: Optimizer
    • Labels: None

    Description

      Hi Team,

      A SELECT query hangs the server with high memory usage and leads to OOM in MariaDB 10.4.14.

      [root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock
      Enter password: 
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 10
      Server version: 10.4.14-MariaDB MariaDB Server
      ...
      MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage_client` (
       ...
          -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
      Query OK, 0 rows affected (0.037 sec)
       
      MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage` (
      ...
          -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
      Query OK, 0 rows affected (0.039 sec)
       
      MariaDB [test]> quit
      Bye
      [root@localhost ~]# 
      

      The query is something like:

       explain
       SELECT u_flexera_cloud_usage_client0.`sys_id` 
          FROM (u_flexera_cloud_usage_client u_flexera_cloud_usage_client0  
              LEFT JOIN u_flexera_cloud_usage u_flexera_cloud_usage1 ON u_flexera_cloud_usage_client0.`u_usage` = u_flexera_cloud_usage1.`sys_id` )  
          WHERE u_flexera_cloud_usage1.`u_feature_name` = 'ToscaCommander'
            AND u_flexera_cloud_usage1.`u_server` = 'b319fdf2376a7a0ca30fa16043990e72'
            AND u_flexera_cloud_usage_client0.`u_host_id` IN ('6a426ad6cef5e0830cb64bfcbff4e0b3')
            AND u_flexera_cloud_usage1.`u_collected` > '2020-08-21 08:00:25'
            AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbb3'
            AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbba' ....
      

      There is a long list of `sys_id != ...` AND conditions.
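To see why a long chain of non-equalities is hard on a range optimizer, here is a toy model (my own Python sketch, not MariaDB's actual SEL_ARG code): each `x != c` over an ordered key is the union of two open intervals, and AND-ing k such predicates leaves k + 1 intervals (the whole domain minus the k excluded points) while a naive pairwise intersection does quadratic work over the lists. With thousands of constants, per-range bookkeeping adds up, which is consistent with the `QUICK_RANGE_SELECT::alloc` figures shown further down.

```python
NEG_INF, POS_INF = float("-inf"), float("inf")

def not_equal(c):
    # `x != c` over an ordered key: everything below c, everything above c.
    return [(NEG_INF, c), (c, POS_INF)]

def intersect(a, b):
    # Naive pairwise intersection of two sorted lists of open intervals:
    # O(len(a) * len(b)) work per AND-ed predicate.
    out = []
    for lo1, hi1 in a:
        for lo2, hi2 in b:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo < hi:
                out.append((lo, hi))
    return out

constants = [10, 20, 30, 40]  # stand-ins for the sys_id literals
ranges = not_equal(constants[0])
for c in constants[1:]:
    ranges = intersect(ranges, not_equal(c))

# k distinct constants leave k + 1 intervals.
print(len(ranges))  # 5
```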

      [root@localhost ~]# 
      [root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock test < lq1.sql 
      Enter password: 
      ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
      [root@localhost ~]# 
      ...
      ..
      [root@localhost data]# cat /var/log/messages | grep mysqld
      ....
      Aug  7 05:38:45 localhost kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
      Aug  7 05:38:45 localhost kernel: mysqld cpuset=/ mems_allowed=0
      Aug  7 05:38:45 localhost kernel: CPU: 0 PID: 18039 Comm: mysqld Kdump: loaded Not tainted 3.10.0-1127.el7.x86_64 #1
      Aug  7 05:38:45 localhost kernel: [17906]     0 17906    28355        1      13       87             0 mysqld_safe
      Aug  7 05:38:45 localhost kernel: [18004]     0 18004  1133133   431832    1713   400514             0 mysqld
      Aug  7 05:38:45 localhost kernel: Out of memory: Kill process 18004 (mysqld) score 890 or sacrifice child
      Aug  7 05:38:45 localhost kernel: Killed process 18004 (mysqld), UID 0, total-vm:4532532kB, anon-rss:1727328kB, file-rss:0kB, shmem-rss:0kB
       
      [root@localhost ~]# top | grep mysqld
      18103 root      20   0 4399336   1.6g      0 S 46.0 90.6   2:35.61 mysqld
      18103 root      20   0 4464872   1.6g      0 S 88.8 89.9   2:38.38 mysqld
      18103 root      20   0 4530408   1.6g      0 S 85.7 91.3   2:41.01 mysqld
      18103 root      20   0 4530408   1.6g      0 S 19.8 91.9   2:42.33 mysqld
      18167 root      20   0 1253308  83844  11188 S  2.6  4.5   0:00.08 mysqld      -- OOM happened and the server restarted
      17906 root      20   0  113424    892    612 S  0.3  0.0   0:00.02 mysqld_safe
      18167 root      20   0 1253308  83844  11188 S  0.3  4.5   0:00.09 mysqld
      

      The required table structures and the query are attached to this bug.

      Attachments

        1. lq1.sql
          3.67 MB
        2. lq1.tables
          1 kB

        Issue Links

          Activity

            Reproducible as described – create the tables from lq1.tables and then source the query file lq1.sql.

            To avoid the OOM, it may make sense to set max-session-mem-used, introduced in MDEV-9764, which alice linked to this report.
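For reference, a session cap along these lines could be used (the value is an arbitrary example; `max_session_mem_used` is the variable MDEV-9764 introduced, and it can also be set as a server option):

```sql
-- Abort a statement once the session exceeds ~1 GiB of allocated memory
-- (example value; pick a limit appropriate for the server).
SET SESSION max_session_mem_used = 1073741824;

-- Or server-wide, e.g. in my.cnf:
--   max_session_mem_used = 1073741824
```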

            However, the problem itself still needs to be looked at. MySQL 5.7 and 8.0 return the plan in seconds, while all of MariaDB 10.1-10.5 keep processing this EXPLAIN endlessly (I didn't wait for the OOM and interrupted each run at around 5 GB and several minutes of execution).

            A couple of consecutive selects from performance_schema on 10.5.5:

            +-----------------------------------------------------------------------------+-------------+---------------------------+
            | event_name                                                                  | count_alloc | sum_number_of_bytes_alloc |
            +-----------------------------------------------------------------------------+-------------+---------------------------+
            | memory/sql/QUICK_RANGE_SELECT::alloc                                        |        2527 |                3242690880 |
            | memory/innodb/buf_buf_pool                                                  |           1 |                 134217728 |
            | memory/sql/thd::main_mem_root                                               |         264 |                  92721160 |
            | memory/sql/XID                                                              |           1 |                  19922944 |
            | memory/performance_schema/events_statements_summary_by_thread_by_event_name |           1 |                  10457088 |
            +-----------------------------------------------------------------------------+-------------+---------------------------+
            +-----------------------------------------------------------------------------+-------------+---------------------------+
            | event_name                                                                  | count_alloc | sum_number_of_bytes_alloc |
            +-----------------------------------------------------------------------------+-------------+---------------------------+
            | memory/sql/QUICK_RANGE_SELECT::alloc                                        |        2893 |                4249179168 |
            | memory/innodb/buf_buf_pool                                                  |           1 |                 134217728 |
            | memory/sql/thd::main_mem_root                                               |         264 |                  92721160 |
            | memory/sql/XID                                                              |           1 |                  19922944 |
            | memory/performance_schema/events_statements_summary_by_thread_by_event_name |           1 |                  10457088 |
            +-----------------------------------------------------------------------------+-------------+---------------------------+
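Output in this shape could come from a query along these lines against performance_schema (my reconstruction; the exact statement used is not shown in the comment):

```sql
SELECT event_name, count_alloc, sum_number_of_bytes_alloc
  FROM performance_schema.memory_summary_global_by_event_name
 ORDER BY sum_number_of_bytes_alloc DESC
 LIMIT 5;
```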
            

            elenst Elena Stepanova added a comment

            One doesn't need most of the query; this part with a lot of non-equalities is sufficient:

            explain
            SELECT u_flexera_cloud_usage_client0.`sys_id`
            FROM
            u_flexera_cloud_usage_client u_flexera_cloud_usage_client0
            WHERE
            u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbb3' AND 
            u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbba' AND 
            u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbc2' AND
            ...
            

            psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - edited

            The tree I'm trying this on has this fix, but it doesn't help (because it's about NOT IN, not individual non-equalities):

            commit 066212d16cc2b3995e2c53de9e2f799fdab557bd
            Author: Sergei Petrunia <psergey@askmonty.org>
            Date:   Tue Dec 15 14:38:30 2020 +0300
             
                MDEV-21958: Query having many NOT-IN clauses running forever
                
                Basic variant of the fix: do not consider conditions in form
                
                  unique_key NOT IN (c1,c2...)
                
                to be sargable. If there are only a few constants, the condition
                is not selective. If there are a lot constants, the overhead of
                processing such a huge range list is not worth it.
                
                (Backport to 10.2)
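The heuristic described in the commit message can be sketched as follows (a hypothetical Python model of the decision; MariaDB's real check operates on the optimizer's Item tree, not on strings like these):

```python
def is_sargable(pred_kind, key_is_unique):
    """Sketch of the MDEV-21958 heuristic: `unique_key NOT IN (...)`
    is not considered for range analysis.

    Rationale from the commit message: with few constants the predicate
    filters almost nothing; with many constants, building and processing
    the huge range list costs more than it saves.
    """
    if pred_kind == "NOT IN" and key_is_unique:
        return False
    return True

print(is_sargable("NOT IN", key_is_unique=True))   # False
print(is_sargable("IN", key_is_unique=True))       # True
```

As the next comment notes, this covers NOT IN lists but not a long AND chain of individual non-equalities, which is why an analogous fix was still needed for this report.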
            


            Developed an analog of the fix for MDEV-21958 that handles NOT INs. With it, the query memory usage becomes:

            MAX_MEMORY_USED: 77 141 656
            

            psergei Sergei Petrunia added a comment
            psergei Sergei Petrunia added a comment - Patch: http://lists.askmonty.org/pipermail/commits/2021-April/014538.html

            bb-10.5-mdev23634

            psergei Sergei Petrunia added a comment

            bb-10.4-mdev23634

            psergei Sergei Petrunia added a comment

            People

              Assignee: psergei Sergei Petrunia
              Reporter: niljoshi Nilnandan Joshi
              Votes: 2
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved:
