Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23634

Select query hanged the server and leads to OOM in MariaDB 10.4.14

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.4.14, 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
    • 10.4.19, 10.5.10
    • Optimizer
    • None

    Description

      Hi Team,

      Select query hanged the server with high memory usage and leads to OOM in MariaDB 10.4.14.

      [root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock
      Enter password: 
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 10
      Server version: 10.4.14-MariaDB MariaDB Server
      ...
      MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage_client` (
       ...
          -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
      Query OK, 0 rows affected (0.037 sec)
       
      MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage` (
      ...
          -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
      Query OK, 0 rows affected (0.039 sec)
       
      MariaDB [test]> quit
      Bye
      [root@localhost ~]# 
      

      Query is something like

       explain
       SELECT u_flexera_cloud_usage_client0.`sys_id` 
          FROM (u_flexera_cloud_usage_client u_flexera_cloud_usage_client0  
              LEFT JOIN u_flexera_cloud_usage u_flexera_cloud_usage1 ON u_flexera_cloud_usage_client0.`u_usage` = u_flexera_cloud_usage1.`sys_id` )  
          WHERE u_flexera_cloud_usage1.`u_feature_name` = 'ToscaCommander' AND u_flexera_cloud_usage1.`u_server` = 'b319fdf2376a7a0ca30fa16043990e72' AND u_flexera_cloud_usage_client0.`u_host_id` IN ('6a426ad6cef5e0830cb64bfcbff4e0b3ोरोबकहसोी') AND u_flexera_cloud_usage1.`u_collected` > '2020-08-21 08:00:25' AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbb3' AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbba' ....
      

      There is a long list of sys_id != and condition.

      [root@localhost ~]# 
      [root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock test < lq1.sql 
      Enter password: 
      ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
      [root@localhost ~]# 
      ...
      ..
      [root@localhost data]# cat /var/log/messages | grep mysqld
      ....
      Aug  7 05:38:45 localhost kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
      Aug  7 05:38:45 localhost kernel: mysqld cpuset=/ mems_allowed=0
      Aug  7 05:38:45 localhost kernel: CPU: 0 PID: 18039 Comm: mysqld Kdump: loaded Not tainted 3.10.0-1127.el7.x86_64 #1
      Aug  7 05:38:45 localhost kernel: [17906]     0 17906    28355        1      13       87             0 mysqld_safe
      Aug  7 05:38:45 localhost kernel: [18004]     0 18004  1133133   431832    1713   400514             0 mysqld
      Aug  7 05:38:45 localhost kernel: Out of memory: Kill process 18004 (mysqld) score 890 or sacrifice child
      Aug  7 05:38:45 localhost kernel: Killed process 18004 (mysqld), UID 0, total-vm:4532532kB, anon-rss:1727328kB, file-rss:0kB, shmem-rss:0kB
       
      [root@localhost ~]# top | grep mysqld
      18103 root      20   0 4399336   1.6g      0 S 46.0 90.6   2:35.61 mysqld                                                                                                               
      18103 root      20   0 4464872   1.6g      0 S 88.8 89.9   2:38.38 mysqld                                                                                                               
      18103 root      20   0 4530408   1.6g      0 S 85.7 91.3   2:41.01 mysqld                                                                                                               
      18103 root      20   0 4530408   1.6g      0 S 19.8 91.9   2:42.33 mysqld                                                                                                               
      18167 root      20   0 1253308  83844  11188 S  2.6  4.5   0:00.08 mysqld             -- OOM happened and server restarted.                                                                                                   
      17906 root      20   0  113424    892    612 S  0.3  0.0   0:00.02 mysqld_safe                                                                                                          
      18167 root      20   0 1253308  83844  11188 S  0.3  4.5   0:00.09 mysqld                                                                                                               
      

      Required table structures and query is attached with this bug.

      Attachments

        1. lq1.sql
          3.67 MB
        2. lq1.tables
          1 kB

        Issue Links

          Activity

            niljoshi Nilnandan Joshi created issue -
            niljoshi Nilnandan Joshi made changes -
            Field Original Value New Value
            Attachment lq1.sql [ 53706 ]
            niljoshi Nilnandan Joshi made changes -
            Attachment lq1.tables [ 53707 ]
            niljoshi Nilnandan Joshi made changes -
            Affects Version/s 10.3.24 [ 24306 ]
            niljoshi Nilnandan Joshi made changes -
            Affects Version/s 10.3.24 [ 24306 ]
            niljoshi Nilnandan Joshi made changes -
            Description Hi Team,

            Select query hanged the server with high memory usage and leads to OOM in MariaDB 10.4.14.

            {code}
            [root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock
            Enter password:
            Welcome to the MariaDB monitor. Commands end with ; or \g.
            Your MariaDB connection id is 10
            Server version: 10.4.14-MariaDB MariaDB Server
            ...
            MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage_client` (
             ...
                -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
            Query OK, 0 rows affected (0.037 sec)

            MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage` (
            ...
                -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
            Query OK, 0 rows affected (0.039 sec)

            MariaDB [test]> quit
            Bye
            [root@localhost ~]#
            [root@localhost ~]#
            [root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock test < lq1.sql
            Enter password:
            ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
            [root@localhost ~]#
            ...
            ..
            [root@localhost data]# cat /var/log/messages | grep mysqld
            ....
            Aug 7 05:38:45 localhost kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
            Aug 7 05:38:45 localhost kernel: mysqld cpuset=/ mems_allowed=0
            Aug 7 05:38:45 localhost kernel: CPU: 0 PID: 18039 Comm: mysqld Kdump: loaded Not tainted 3.10.0-1127.el7.x86_64 #1
            Aug 7 05:38:45 localhost kernel: [17906] 0 17906 28355 1 13 87 0 mysqld_safe
            Aug 7 05:38:45 localhost kernel: [18004] 0 18004 1133133 431832 1713 400514 0 mysqld
            Aug 7 05:38:45 localhost kernel: Out of memory: Kill process 18004 (mysqld) score 890 or sacrifice child
            Aug 7 05:38:45 localhost kernel: Killed process 18004 (mysqld), UID 0, total-vm:4532532kB, anon-rss:1727328kB, file-rss:0kB, shmem-rss:0kB

            [root@localhost ~]# top | grep mysqld
            18103 root 20 0 4399336 1.6g 0 S 46.0 90.6 2:35.61 mysqld
            18103 root 20 0 4464872 1.6g 0 S 88.8 89.9 2:38.38 mysqld
            18103 root 20 0 4530408 1.6g 0 S 85.7 91.3 2:41.01 mysqld
            18103 root 20 0 4530408 1.6g 0 S 19.8 91.9 2:42.33 mysqld
            18167 root 20 0 1253308 83844 11188 S 2.6 4.5 0:00.08 mysqld -- OOM happened and server restarted.
            17906 root 20 0 113424 892 612 S 0.3 0.0 0:00.02 mysqld_safe
            18167 root 20 0 1253308 83844 11188 S 0.3 4.5 0:00.09 mysqld
            {code}

            Required table structures and query is attached with this bug.

            Hi Team,

            Select query hanged the server with high memory usage and leads to OOM in MariaDB 10.4.14.

            {code}
            [root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock
            Enter password:
            Welcome to the MariaDB monitor. Commands end with ; or \g.
            Your MariaDB connection id is 10
            Server version: 10.4.14-MariaDB MariaDB Server
            ...
            MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage_client` (
             ...
                -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
            Query OK, 0 rows affected (0.037 sec)

            MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage` (
            ...
                -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
            Query OK, 0 rows affected (0.039 sec)

            MariaDB [test]> quit
            Bye
            [root@localhost ~]#
            {code}

            Query is something like

            {code}
             explain
             SELECT u_flexera_cloud_usage_client0.`sys_id`
                FROM (u_flexera_cloud_usage_client u_flexera_cloud_usage_client0
                    LEFT JOIN u_flexera_cloud_usage u_flexera_cloud_usage1 ON u_flexera_cloud_usage_client0.`u_usage` = u_flexera_cloud_usage1.`sys_id` )
                WHERE u_flexera_cloud_usage1.`u_feature_name` = 'ToscaCommander' AND u_flexera_cloud_usage1.`u_server` = 'b319fdf2376a7a0ca30fa16043990e72' AND u_flexera_cloud_usage_client0.`u_host_id` IN ('6a426ad6cef5e0830cb64bfcbff4e0b3ोरोबकहसोी') AND u_flexera_cloud_usage1.`u_collected` > '2020-08-21 08:00:25' AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbb3' AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbba' ....
            {code}

            There is a long list of sys_id != and condition.

            {code}
            [root@localhost ~]#
            [root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock test < lq1.sql
            Enter password:
            ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
            [root@localhost ~]#
            ...
            ..
            [root@localhost data]# cat /var/log/messages | grep mysqld
            ....
            Aug 7 05:38:45 localhost kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
            Aug 7 05:38:45 localhost kernel: mysqld cpuset=/ mems_allowed=0
            Aug 7 05:38:45 localhost kernel: CPU: 0 PID: 18039 Comm: mysqld Kdump: loaded Not tainted 3.10.0-1127.el7.x86_64 #1
            Aug 7 05:38:45 localhost kernel: [17906] 0 17906 28355 1 13 87 0 mysqld_safe
            Aug 7 05:38:45 localhost kernel: [18004] 0 18004 1133133 431832 1713 400514 0 mysqld
            Aug 7 05:38:45 localhost kernel: Out of memory: Kill process 18004 (mysqld) score 890 or sacrifice child
            Aug 7 05:38:45 localhost kernel: Killed process 18004 (mysqld), UID 0, total-vm:4532532kB, anon-rss:1727328kB, file-rss:0kB, shmem-rss:0kB

            [root@localhost ~]# top | grep mysqld
            18103 root 20 0 4399336 1.6g 0 S 46.0 90.6 2:35.61 mysqld
            18103 root 20 0 4464872 1.6g 0 S 88.8 89.9 2:38.38 mysqld
            18103 root 20 0 4530408 1.6g 0 S 85.7 91.3 2:41.01 mysqld
            18103 root 20 0 4530408 1.6g 0 S 19.8 91.9 2:42.33 mysqld
            18167 root 20 0 1253308 83844 11188 S 2.6 4.5 0:00.08 mysqld -- OOM happened and server restarted.
            17906 root 20 0 113424 892 612 S 0.3 0.0 0:00.02 mysqld_safe
            18167 root 20 0 1253308 83844 11188 S 0.3 4.5 0:00.09 mysqld
            {code}


            Required table structures and query is attached with this bug.

            alice Alice Sherepa made changes -

            Reproducible as described – create the tables from lq1.tables and then source the query file lq1.sql.

            To avoid OOM, it may make sense to set max-session-mem-used introduced in MDEV-9764 which alice linked to this report.

            However, the problem itself still needs to be looked at. MySQL 5.7 and 8.0 return the plan in seconds, while all of MariaDB 10.1-10.5 keep processing this EXPLAIN endlessly (I didn't wait till OOM and interrupted each time around 5G and several minutes of execution).

            A couple of consequent selects from performance_schema on 10.5.5:

            +-----------------------------------------------------------------------------+-------------+---------------------------+
            | event_name                                                                  | count_alloc | sum_number_of_bytes_alloc |
            +-----------------------------------------------------------------------------+-------------+---------------------------+
            | memory/sql/QUICK_RANGE_SELECT::alloc                                        |        2527 |                3242690880 |
            | memory/innodb/buf_buf_pool                                                  |           1 |                 134217728 |
            | memory/sql/thd::main_mem_root                                               |         264 |                  92721160 |
            | memory/sql/XID                                                              |           1 |                  19922944 |
            | memory/performance_schema/events_statements_summary_by_thread_by_event_name |           1 |                  10457088 |
            +-----------------------------------------------------------------------------+-------------+---------------------------+
            +-----------------------------------------------------------------------------+-------------+---------------------------+
            | event_name                                                                  | count_alloc | sum_number_of_bytes_alloc |
            +-----------------------------------------------------------------------------+-------------+---------------------------+
            | memory/sql/QUICK_RANGE_SELECT::alloc                                        |        2893 |                4249179168 |
            | memory/innodb/buf_buf_pool                                                  |           1 |                 134217728 |
            | memory/sql/thd::main_mem_root                                               |         264 |                  92721160 |
            | memory/sql/XID                                                              |           1 |                  19922944 |
            | memory/performance_schema/events_statements_summary_by_thread_by_event_name |           1 |                  10457088 |
            +-----------------------------------------------------------------------------+-------------+---------------------------+
            

            elenst Elena Stepanova added a comment - Reproducible as described – create the tables from lq1.tables and then source the query file lq1.sql . To avoid OOM, it may make sense to set max-session-mem-used introduced in MDEV-9764 which alice linked to this report. However, the problem itself still needs to be looked at. MySQL 5.7 and 8.0 return the plan in seconds, while all of MariaDB 10.1-10.5 keep processing this EXPLAIN endlessly (I didn't wait till OOM and interrupted each time around 5G and several minutes of execution). A couple of consequent selects from performance_schema on 10.5.5: +-----------------------------------------------------------------------------+-------------+---------------------------+ | event_name | count_alloc | sum_number_of_bytes_alloc | +-----------------------------------------------------------------------------+-------------+---------------------------+ | memory/sql/QUICK_RANGE_SELECT::alloc | 2527 | 3242690880 | | memory/innodb/buf_buf_pool | 1 | 134217728 | | memory/sql/thd::main_mem_root | 264 | 92721160 | | memory/sql/XID | 1 | 19922944 | | memory/performance_schema/events_statements_summary_by_thread_by_event_name | 1 | 10457088 | +-----------------------------------------------------------------------------+-------------+---------------------------+ +-----------------------------------------------------------------------------+-------------+---------------------------+ | event_name | count_alloc | sum_number_of_bytes_alloc | +-----------------------------------------------------------------------------+-------------+---------------------------+ | memory/sql/QUICK_RANGE_SELECT::alloc | 2893 | 4249179168 | | memory/innodb/buf_buf_pool | 1 | 134217728 | | memory/sql/thd::main_mem_root | 264 | 92721160 | | memory/sql/XID | 1 | 19922944 | | memory/performance_schema/events_statements_summary_by_thread_by_event_name | 1 | 10457088 | +-----------------------------------------------------------------------------+-------------+---------------------------+
            elenst Elena Stepanova made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Affects Version/s 10.1 [ 16100 ]
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5 [ 23123 ]
            Assignee Sergei Petrunia [ psergey ]
            elenst Elena Stepanova made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            psergei Sergei Petrunia made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]

            One doesn't need most of the query, this part with a lot of non-equalities is sufficient:

            explain
            SELECT u_flexera_cloud_usage_client0.`sys_id`
            FROM
            u_flexera_cloud_usage_client u_flexera_cloud_usage_client0
            WHERE
            u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbb3' AND 
            u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbba' AND 
            u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbc2' AND
            ...
            

            psergei Sergei Petrunia added a comment - One doesn't need most of the query, this part with a lot of non-equalities is sufficient: explain SELECT u_flexera_cloud_usage_client0.`sys_id` FROM u_flexera_cloud_usage_client u_flexera_cloud_usage_client0 WHERE u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbb3' AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbba' AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbc2' AND ...
            psergei Sergei Petrunia added a comment - - edited

            The tree I'm trying this on has this fix, but it doesn't help (because it's about NOT IN, not individual non-equalities):

            commit 066212d16cc2b3995e2c53de9e2f799fdab557bd
            Author: Sergei Petrunia <psergey@askmonty.org>
            Date:   Tue Dec 15 14:38:30 2020 +0300
             
                MDEV-21958: Query having many NOT-IN clauses running forever
                
                Basic variant of the fix: do not consider conditions in form
                
                  unique_key NOT IN (c1,c2...)
                
                to be sargable. If there are only a few constants, the condition
                is not selective. If there are a lot constants, the overhead of
                processing such a huge range list is not worth it.
                
                (Backport to 10.2)
            

            psergei Sergei Petrunia added a comment - - edited The tree I'm trying this on has this fix, but it doesn't help (because it's about NOT IN, not individual non-equalities): commit 066212d16cc2b3995e2c53de9e2f799fdab557bd Author: Sergei Petrunia <psergey@askmonty.org> Date: Tue Dec 15 14:38:30 2020 +0300   MDEV-21958: Query having many NOT-IN clauses running forever Basic variant of the fix: do not consider conditions in form unique_key NOT IN (c1,c2...) to be sargable. If there are only a few constants, the condition is not selective. If there are a lot constants, the overhead of processing such a huge range list is not worth it. (Backport to 10.2)
            psergei Sergei Petrunia made changes -

            Developed an analog of the fix for MDEV-21958 that handles NOT INs. With it, the query memory usage becomes:

            MAX_MEMORY_USED: 77 141 656
            

            psergei Sergei Petrunia added a comment - Developed an analog of the fix for MDEV-21958 that handles NOT INs. With it, the query memory usage becomes: MAX_MEMORY_USED: 77 141 656
            psergei Sergei Petrunia added a comment - Patch: http://lists.askmonty.org/pipermail/commits/2021-April/014538.html

            bb-10.5-mdev23634

            psergei Sergei Petrunia added a comment - bb-10.5-mdev23634
            julien.fritsch Julien Fritsch made changes -
            Assignee Sergei Petrunia [ psergey ] Julien Fritsch [ julien.fritsch ]
            julien.fritsch Julien Fritsch made changes -
            Assignee Julien Fritsch [ julien.fritsch ] Michael Widenius [ monty ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            psergei Sergei Petrunia made changes -
            Assignee Michael Widenius [ monty ] Sergei Petrunia [ psergey ]
            psergei Sergei Petrunia made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.4.19 [ 25205 ]
            Fix Version/s 10.5.10 [ 25204 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            bb-10.4-mdev23634

            psergei Sergei Petrunia added a comment - bb-10.4-mdev23634
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Labels 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 113119 ] MariaDB v4 [ 158326 ]
            Johnston Rex Johnston made changes -
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658 197818
            Zendesk active tickets 201658

            People

              psergei Sergei Petrunia
              niljoshi Nilnandan Joshi
              Votes:
              2 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.