[MDEV-23634] Select query hanged the server and leads to OOM in MariaDB 10.4.14 Created: 2020-09-01  Updated: 2023-04-18  Resolved: 2021-04-08

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1, 10.4.14, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.4.19, 10.5.10

Type: Bug Priority: Critical
Reporter: Nilnandan Joshi Assignee: Sergei Petrunia
Resolution: Fixed Votes: 2
Labels: None

Attachments: File lq1.sql     File lq1.tables    
Issue Links:
Duplicate
duplicates MDEV-26059 Query in the range optimizer doesn't ... Closed
Problem/Incident
causes MDEV-30605 Wrong result while using index for gr... Closed
Relates
relates to MDEV-9764 MariaDB does not limit memory used fo... Closed
relates to MDEV-21958 Query having many NOT-IN clauses runn... Closed

 Description   

Hi Team,

Select query hanged the server with high memory usage and leads to OOM in MariaDB 10.4.14.

[root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 10.4.14-MariaDB MariaDB Server
...
MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage_client` (
 ...
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
Query OK, 0 rows affected (0.037 sec)
 
MariaDB [test]> CREATE TABLE `u_flexera_cloud_usage` (
...
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.039 sec)
 
MariaDB [test]> quit
Bye
[root@localhost ~]# 

Query is something like

 explain
 SELECT u_flexera_cloud_usage_client0.`sys_id` 
    FROM (u_flexera_cloud_usage_client u_flexera_cloud_usage_client0  
        LEFT JOIN u_flexera_cloud_usage u_flexera_cloud_usage1 ON u_flexera_cloud_usage_client0.`u_usage` = u_flexera_cloud_usage1.`sys_id` )  
    WHERE u_flexera_cloud_usage1.`u_feature_name` = 'ToscaCommander' AND u_flexera_cloud_usage1.`u_server` = 'b319fdf2376a7a0ca30fa16043990e72' AND u_flexera_cloud_usage_client0.`u_host_id` IN ('6a426ad6cef5e0830cb64bfcbff4e0b3ोरोबकहसोी') AND u_flexera_cloud_usage1.`u_collected` > '2020-08-21 08:00:25' AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbb3' AND u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbba' ....

There is a long list of sys_id != and condition.

[root@localhost ~]# 
[root@localhost ~]# mysql -uroot -p --socket=/tmp/mysql_sandbox10414.sock test < lq1.sql 
Enter password: 
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
[root@localhost ~]# 
...
..
[root@localhost data]# cat /var/log/messages | grep mysqld
....
Aug  7 05:38:45 localhost kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Aug  7 05:38:45 localhost kernel: mysqld cpuset=/ mems_allowed=0
Aug  7 05:38:45 localhost kernel: CPU: 0 PID: 18039 Comm: mysqld Kdump: loaded Not tainted 3.10.0-1127.el7.x86_64 #1
Aug  7 05:38:45 localhost kernel: [17906]     0 17906    28355        1      13       87             0 mysqld_safe
Aug  7 05:38:45 localhost kernel: [18004]     0 18004  1133133   431832    1713   400514             0 mysqld
Aug  7 05:38:45 localhost kernel: Out of memory: Kill process 18004 (mysqld) score 890 or sacrifice child
Aug  7 05:38:45 localhost kernel: Killed process 18004 (mysqld), UID 0, total-vm:4532532kB, anon-rss:1727328kB, file-rss:0kB, shmem-rss:0kB
 
[root@localhost ~]# top | grep mysqld
18103 root      20   0 4399336   1.6g      0 S 46.0 90.6   2:35.61 mysqld                                                                                                               
18103 root      20   0 4464872   1.6g      0 S 88.8 89.9   2:38.38 mysqld                                                                                                               
18103 root      20   0 4530408   1.6g      0 S 85.7 91.3   2:41.01 mysqld                                                                                                               
18103 root      20   0 4530408   1.6g      0 S 19.8 91.9   2:42.33 mysqld                                                                                                               
18167 root      20   0 1253308  83844  11188 S  2.6  4.5   0:00.08 mysqld             -- OOM happened and server restarted.                                                                                                   
17906 root      20   0  113424    892    612 S  0.3  0.0   0:00.02 mysqld_safe                                                                                                          
18167 root      20   0 1253308  83844  11188 S  0.3  4.5   0:00.09 mysqld                                                                                                               

Required table structures and query is attached with this bug.



 Comments   
Comment by Elena Stepanova [ 2020-09-21 ]

Reproducible as described – create the tables from lq1.tables and then source the query file lq1.sql.

To avoid OOM, it may make sense to set max-session-mem-used introduced in MDEV-9764 which alice linked to this report.

However, the problem itself still needs to be looked at. MySQL 5.7 and 8.0 return the plan in seconds, while all of MariaDB 10.1-10.5 keep processing this EXPLAIN endlessly (I didn't wait till OOM and interrupted each time around 5G and several minutes of execution).

A couple of consequent selects from performance_schema on 10.5.5:

+-----------------------------------------------------------------------------+-------------+---------------------------+
| event_name                                                                  | count_alloc | sum_number_of_bytes_alloc |
+-----------------------------------------------------------------------------+-------------+---------------------------+
| memory/sql/QUICK_RANGE_SELECT::alloc                                        |        2527 |                3242690880 |
| memory/innodb/buf_buf_pool                                                  |           1 |                 134217728 |
| memory/sql/thd::main_mem_root                                               |         264 |                  92721160 |
| memory/sql/XID                                                              |           1 |                  19922944 |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name |           1 |                  10457088 |
+-----------------------------------------------------------------------------+-------------+---------------------------+
+-----------------------------------------------------------------------------+-------------+---------------------------+
| event_name                                                                  | count_alloc | sum_number_of_bytes_alloc |
+-----------------------------------------------------------------------------+-------------+---------------------------+
| memory/sql/QUICK_RANGE_SELECT::alloc                                        |        2893 |                4249179168 |
| memory/innodb/buf_buf_pool                                                  |           1 |                 134217728 |
| memory/sql/thd::main_mem_root                                               |         264 |                  92721160 |
| memory/sql/XID                                                              |           1 |                  19922944 |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name |           1 |                  10457088 |
+-----------------------------------------------------------------------------+-------------+---------------------------+

Comment by Sergei Petrunia [ 2021-03-19 ]

One doesn't need most of the query, this part with a lot of non-equalities is sufficient:

explain
SELECT u_flexera_cloud_usage_client0.`sys_id`
FROM
u_flexera_cloud_usage_client u_flexera_cloud_usage_client0
WHERE
u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbb3' AND 
u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbba' AND 
u_flexera_cloud_usage_client0.`sys_id` != '2c6ac48f1b7298d0f518ea0e6e4bcbc2' AND
...

Comment by Sergei Petrunia [ 2021-03-19 ]

The tree I'm trying this on has this fix, but it doesn't help (because it's about NOT IN, not individual non-equalities):

commit 066212d16cc2b3995e2c53de9e2f799fdab557bd
Author: Sergei Petrunia <psergey@askmonty.org>
Date:   Tue Dec 15 14:38:30 2020 +0300
 
    MDEV-21958: Query having many NOT-IN clauses running forever
    
    Basic variant of the fix: do not consider conditions in form
    
      unique_key NOT IN (c1,c2...)
    
    to be sargable. If there are only a few constants, the condition
    is not selective. If there are a lot constants, the overhead of
    processing such a huge range list is not worth it.
    
    (Backport to 10.2)

Comment by Sergei Petrunia [ 2021-04-06 ]

Developed an analog of the fix for MDEV-21958 that handles NOT INs. With it, the query memory usage becomes:

MAX_MEMORY_USED: 77 141 656

Comment by Sergei Petrunia [ 2021-04-07 ]

Patch: http://lists.askmonty.org/pipermail/commits/2021-April/014538.html

Comment by Sergei Petrunia [ 2021-04-07 ]

bb-10.5-mdev23634

Comment by Sergei Petrunia [ 2021-04-08 ]

bb-10.4-mdev23634

Generated at Thu Feb 08 09:23:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.