[MDEV-9764] MariaDB does not limit memory used for range optimization Created: 2016-03-19  Updated: 2021-09-30  Resolved: 2017-01-15

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.2
Fix Version/s: 10.1.21, 10.2.4

Type: Bug Priority: Critical
Reporter: Valerii Kravchuk Assignee: Sergei Golubchik
Resolution: Fixed Votes: 2
Labels: upstream-fixed

Attachments: PNG File mariadb-vs-mysql-memory-usage.png    
Issue Links:
Relates
relates to MDEV-11574 Queries with IN clauses that are push... Closed
relates to MDEV-9750 Quick memory exhaustion with 'extende... Closed
relates to MDEV-10046 InnoDB Range Optimizer Regression Open
relates to MDEV-10175 range optimizer calls records_in_rang... Closed
relates to MDEV-11309 MariaDB server restarts every night (... Closed
relates to MDEV-23634 Select query hanged the server and le... Closed
Sprint: 10.2.1-1, 10.2.1-2, 5.5.50, 10.1.15, 10.2.4-1, 10.1.21

 Description   

`Unlike MySQL 5.6 (that provides eq_range_index_dive_limit server variable) and MySQL 5.7 (that additionally provides range_optimizer_max_mem_size server variable) MariaDB 10.x does NOT limit memory used for range optimization in any way (neither directly nor indirectly).

As a result, for some queries with huge list of value in the IN () list in the WHERE clause (having number of items comparable to the number of rows in the table) MariaDB 10.x does not only execute the query longer than MySQL 5.6 and 5.7 with default settings (as it spends more time on query optimization), but also uses a lot of memory in the process (see https://bugs.mysql.com/bug.php?id=78973 for some estimations and links). With many concurrent connections running this kind of queries we may easily end up with OOM condition for the mysqld process.

Consider a table like this:

create table t (id bigint not null auto_increment, ts datetime, primary key(id,ts)) engine=InnoDB;
insert into t(ts) values(now());
insert into t(ts) values(now());
replace into t(ts) select now() from t t1, t t2, t t3, t t4, t t5, t t6, t t7, t t8, t t9, t t10, t t11, t t12, t t13, t t14, t t15,t t16, t t17, t t18;
analyze table t;

and a query like this:

SELECT * FROM t WHERE id IN (10000,10001,10002, ... 54106,54107);

Run the query on different versions of MySQL and MariaDB, compare execution times and memory used in the process. It may be easier to see the memory impact on versions that do not allow to monitor memory used per session or "state" by running many copies of the same query concurrently using mysqlslap, for example.



 Comments   
Comment by Valerii Kravchuk [ 2016-03-21 ]

I've checked today with recent MariaDB 10.1.13 built from current sources on my QuadCore box. I've got the following results:

Execution of the query:

...
| 54107 | 2016-03-21 11:43:03 |
+-------+---------------------+
43238 rows in set (0.45 sec)

Then I've executed mysqlslap like this:

mysqlslap -uroot --concurrency=100 --iterations=100 --create-schema=test --query=/tmp/query.sql --delimiter=";" --socket=/tmp/mysql.sock

and in another session got the following on memory usage:

MariaDB [test]> select sum(memory_used)/1024/1024, state, count(*), max(Time), avg(Time) FROM information_schema.processlist group by state with rollup;
+----------------------------+----------------------+----------+-----------+-----------+
| sum(memory_used)/1024/1024 | state                | count(*) | max(Time) | avg(Time) |
+----------------------------+----------------------+----------+-----------+-----------+
|                 0.06581116 |                      |        1 |         5 |    5.0000 |
|                 0.08060455 | Filling schema table |        1 |         0 |    0.0000 |
|              2235.60366058 | statistics           |      100 |         5 |    5.0000 |
|              2235.75007629 | NULL                 |      102 |         5 |    4.9510 |
+----------------------------+----------------------+----------+-----------+-----------+
4 rows in set (2.09 sec)
 
MariaDB [test]> select sum(memory_used)/1024/1024, state, count(*), max(Time), avg(Time) FROM information_schema.processlist group by state with rollup;
+----------------------------+----------------------+----------+-----------+-----------+
| sum(memory_used)/1024/1024 | state                | count(*) | max(Time) | avg(Time) |
+----------------------------+----------------------+----------+-----------+-----------+
|                 0.06581116 |                      |        1 |        25 |   25.0000 |
|                 0.08060455 | Filling schema table |        1 |         0 |    0.0000 |
|                24.48434448 | preparing            |        2 |        25 |   24.5000 |
|               673.31947327 | Sending data         |       55 |        25 |   24.6909 |
|              1968.40044403 | statistics           |       31 |        25 |   24.2258 |
|              2666.35067749 | NULL                 |       90 |        25 |   24.2556 |
+----------------------------+----------------------+----------+-----------+-----------+
6 rows in set (0.08 sec)

Then mysqld process was killed by OOM killer:

2016-03-21 11:42:12 140198044989504 [Note] /home/openxs/dbs/maria10.1/bin/mysqld: ready for connections.
Version: '10.1.13-MariaDB'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
160321 11:48:51 mysqld_safe Number of processes running now: 0
160321 11:48:51 mysqld_safe mysqld restarted
2016-03-21 11:48:51 140211571005504 [Note] /home/openxs/dbs/maria10.1/bin/mysqld (mysqld 10.1.13-MariaDB) starting as process 11422 ...
 
[openxs@fc23 maria10.1]$ journalctl -b | grep mysqld
Mar 21 11:48:46 fc23 kernel: mysqld invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
Mar 21 11:48:47 fc23 kernel: mysqld cpuset=/ mems_allowed=0
Mar 21 11:48:47 fc23 kernel: CPU: 3 PID: 11330 Comm: mysqld Not tainted 4.4.4-301.fc23.x86_64 #1
Mar 21 11:48:49 fc23 kernel: [10505]  1000 10505    29781      111      11       3        0             0 mysqld_safe
Mar 21 11:48:49 fc23 kernel: [10567]  1000 10567  2122745  1704741    3440      12        0             0 mysqld
Mar 21 11:48:49 fc23 kernel: Out of memory: Kill process 10567 (mysqld) score 839 or sacrifice child
Mar 21 11:48:49 fc23 kernel: Killed process 10567 (mysqld) total-vm:8490980kB, anon-rss:6818988kB, file-rss:0kB

With Percona Server 5.6.28-76.1 built from source on the same hardware I've got the following execution time:

...
| 54107 | 2016-03-21 11:55:08 |
+-------+---------------------+
43238 rows in set (0.24 sec)

and memory usage while the same mysqlsap was running the query in 100 connections based on top was like this:

11793 openxs    20   0 5409252 4.062g  15772 S 378.4 52.3   0:40.17 mysqld
...
11793 openxs    20   0 5634456 4.509g  15772 S 384.1 58.1   2:12.86 mysqld
...
11793 openxs    20   0 5739204 4.518g  15772 S 379.4 58.2  22:50.10 mysqld
...
11793 openxs    20   0 5853164 4.671g  15772 S 380.7 60.2  53:07.90 mysqld

The command completed successfully:

[openxs@fc23 p5.6]$ bin/mysqlslap -uroot --concurrency=100 --iterations=100 --create-schema=test --query=/tmp/query.sql --delimiter=";" --socket=/tmp/mysql.sock
Benchmark
        Average number of seconds to run all queries: 8.560 seconds
        Minimum number of seconds to run all queries: 8.344 seconds
        Maximum number of seconds to run all queries: 16.744 seconds
        Number of clients running queries: 100
        Average number of queries per client: 1

The system used for testing was like this:

[openxs@fc23 percona-toolkit]$ bin/pt-summary
# Percona Toolkit System Summary Report ######################
        Date | 2016-03-21 10:08:35 UTC (local TZ: EET +0200)
    Hostname | fc23
      Uptime |  1:38,  3 users,  load average: 61.94, 58.26, 36.55
    Platform | Linux
     Release | Fedora release 23 (Twenty Three)
      Kernel | 4.4.4-301.fc23.x86_64
Architecture | CPU = 64-bit, OS = 64-bit
   Threading | NPTL 2.22
     SELinux | Enforcing
 Virtualized | No virtualization detected
# Processor ##################################################
  Processors | physical = 1, cores = 4, virtual = 4, hyperthreading = no
      Speeds | 4x2499.000
      Models | 4xIntel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz
      Caches | 4x2048 KB
# Memory #####################################################
       Total | 7.8G
        Free | 1.7G
        Used | physical = 5.3G, swap allocated = 0.0, swap used = 0.0, virtual = 5.3G
     Buffers | 795.5M
      Caches | 2.2G
       Dirty | 49416 kB
     UsedRSS | 5.3G
  Swappiness | 60
 DirtyPolicy | 20, 10
 DirtyStatus | 0, 0
...

Comment by Sergei Petrunia [ 2016-04-30 ]

Some observations (these are not the direct cause of the problem but related).

One reason for high memory consuption is:

(gdb) p sizeof(SEL_TREE)
  $32 = 600
(gdb) p sizeof(SEL_TREE::keys)
  $36 = 512

MySQL-5.6 also has this big array. I didn't check other versions.

Comment by Sergei Petrunia [ 2016-04-30 ]

opt_range.cc also has some other ways to prevent high memory usage:

  • PARAM::alloced_sel_args (introduced by me in 2007, present in both MySQL and MariaDB)
  • NOT_IN_IGNORE_THRESHOLD(introduced in 2006, present in both MySQL and MariaDB)

these are not directly related to this bug

Comment by Sergei Petrunia [ 2016-04-30 ]

The patch that introduces @@eq_range_index_dive_limit is here:
https://github.com/mysql/mysql-server/commit/7a9d83117f039337a948f918c2c4c5d7eb3f5ccc

it replaces records_in_range() calls with use of index statistics (or some adjusted value?). This will remove [one cause of] the slowdown, but won't help with high memory consumption.

Comment by Valerii Kravchuk [ 2016-04-30 ]

See http://bugs.mysql.com/bug.php?id=79450 also.

Comment by Sergei Petrunia [ 2016-05-01 ]

The patch that limits memory consumption is here:
https://github.com/mysql/mysql-server/commit/a7571080513ed2107919ebb1f5633566d3605189

It is done on a fairly low level, by limiting memory consumption of a MEM_ROOT pool

(the capability to limit memory use of a MEM_ROOT is also used to limit the amount of memory used by the parser:
https://github.com/mysql/mysql-server/commit/d18c716d3e6e25c0444b332b83bed358c3e4b137

Comment by Sergei Petrunia [ 2016-05-10 ]

A somewhat similar case: MDEV-10046

Comment by Sergei Petrunia [ 2016-05-23 ]

Looking at mysql-5.7, found this: if one has a table with extended key

CREATE TABLE t1 (
  pk int(11) NOT NULL,
  a int(11) DEFAULT NULL,
  ...
  PRIMARY KEY (pk),
  KEY a (a)
);

and a query that has equalities on a secondary index as well as PK:

explain select * from t1 
where a in  (1,2,3,4,5,6,7,8,9,10) and 
      pk in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15);

Then it enumerates a combinatorial explosion of ranges:

    "1 <= a <= 1 AND 1 <= pk <= 1",
    "1 <= a <= 1 AND 2 <= pk <= 2",
    "1 <= a <= 1 AND 3 <= pk <= 3",
    "1 <= a <= 1 AND 4 <= pk <= 4",
    ...
    "10 <= a <= 10 AND 13 <= pk <= 13",
    "10 <= a <= 10 AND 14 <= pk <= 14",
    "10 <= a <= 10 AND 15 <= pk <= 15"

but also it calls ha_innobase::records_in_range() for each of them.

This goes against the idea of not calling records_in_range() for "equality ranges" (full key ranges on unique indexes).

Comment by Sergei Petrunia [ 2016-05-25 ]

Looking at the testcase in the bug report again:

== 10.1 ==
explain: 2.63 sec
query: 3.58 sec
memory usage:

  • MariaDB's memory tracking
  • information_schema.processlist.memory_used: 67,746,520
  • checking increments in debugger one can see that memory usage is about 9.5M when we enter get_mm_tree.
  • That is, inside get_mm_tree, 58,260,384 bytes of mem are consumed.

== mysql-5.7 ==
explain: 1.05 sec
query: 2.12 sec
range optimizer is terminated early because it runs out of memory.

== mysql-5.7, no memory limit for range optimizer ==
set range_optimizer_max_mem_size=1024*1024*1024;
explain: 1.21 sec
query: 2.43 sec (still faster than 10.1)
memory usage: MySQL 5.7's way:
update performance_schema.setup_instruments set enabled='YES'
where name='memory/sql/test_quick_select';
select * from performance_schema.memory_summary_global_by_event_name
where event_name like '%test_quick_select%';
HIGH_NUMBER_OF_BYTES_USED: 10,241,280

Debugging, one can see that

  • 44,107 SEL_ARG objects are produced
  • eq_range_index_dive_limit limit is hit, and records_in_range() calls are not made.

== mysql-5.7, no eq_range_index_dive_limit ==
set eq_range_index_dive_limit=1000*1000;
explain: 4.0 sec
query: 5.16 sec
HIGH_NUMBER_OF_BYTES_USED: 10,241,280

Comment by Sergei Petrunia [ 2016-05-31 ]

Pushed
https://github.com/MariaDB/server/commit/016790403a4bb6182094870870ce1a1c3e2756dc
into 10.1.

The patch doesn't put any limits on memory usage, but it makes memory usage much lower.
Re-running the test from the previous comments:

== 10.1-patch1 ==
explain: 2.54 sec (was 2.63 sec)
query: 3.48 sec (was: 3.58 sec)

information_schema.processlist.memory_used: 26,091,264 (was: 67,746,520)
That is, inside get_mm_tree, 16,605,264 bytes of memory are consumed (was: 58,260,384)

Comment by Sergei Petrunia [ 2016-05-31 ]

Note that the above is still higher than the HIGH_NUMBER_OF_BYTES_USED: 10,241,280 number we've got for mysql-5.7, no eq_range_index_dive_limit above.

Comment by Sergei Petrunia [ 2016-05-31 ]

MariaDB uses more memory because it allocates more SEL_TREE objects.
It processes t.key IN (c1,c2,c3 ...) in the same way as t.key=c1 OR t.key=c2 OR t.key=c3 OR ... would be processed:

  • constructs SEL_TREE for each t.key=c_i
  • calls tree_or(sel_tree1, sel_tree2)
    tree_or() always allocates a separate SEL_TREE for its result.

This causes MariaDB to allocate 2x more SEL_TREE objects than the number of elements in (c1,c2, ... cN) list. For the example, I see 88213 SEL_TREEs allocated.

MySQL 5.7 with range_optimizer_max_mem_size=1024*1024*1024 setting allocates only 44108 SEL_TREE objects.
It allocates fewer objects, because tree_or(sel_tree1, sel_tree2) call reuses one of SEL_TREE parameters for the result
...

Looking why MariaDB doesn't reuse... it comes from this patch:

commit 3019d3972932931d9c4a14bec4183b232355cdcf
Author:	Igor Babaev <igor@askmonty.org>  Mon Oct 12 08:59:34 2009
The main patch for WL#24: 
"index_merge: fair choice between index_merge union and range access"

Comment by Sergei Petrunia [ 2016-05-31 ]

If we made tree_or not to allocate a SEL_TREE in this case, we would gain (sizeof(SEL_TREE)=120) * 44108=5,292,960 bytes.

get_mm_tree currently uses 16,605,264 bytes so it would one third.

Comment by Sergei Petrunia [ 2016-06-01 ]

Conclusions from yesterday scrum discussion:
it is ok to backport @@eq_range_index_dive_limit.
( https://github.com/mysql/mysql-server/commit/7a9d83117f039337a948f918c2c4c5d7eb3f5ccc )

Comment by Sergei Petrunia [ 2016-06-02 ]

Pushed patch2, 7d3d75895d9d29d52c34dd3559cec59731d8d267.
It reduces memory usage further:

== 10.1-patch2 ==
explain: 2.56 sec (no improvement)
query: 3.50 sec (no improvement)
information_schema.processlist.memory_used: 20,437,336 (was: 26,091,264)
Memory used inside get_mm_tree: 10,951,200 (was: 16,605,264)

The amount of memory used by the range optimizer is now roughly the same what 5.7 would use (see HIGH_NUMBER_OF_BYTES_USED: 10,241,280 above) when one sets the settings to make it apples-to-apples comparison).

Comment by Sergei Petrunia [ 2016-06-03 ]

Pushed a fix for MDEV-10175. It doesn't have a lot of effect for in-memory workloads, but may make a difference when records_in_range() calls have to hit the disk.

Comment by Sergei Golubchik [ 2016-10-10 ]

Still no hard limit on memory used, so still possible to trigger OOM.

Comment by Sergei Petrunia [ 2016-12-08 ]

confirms the earlier findings - recent MariaDB versions take about as much RAM as MySQL does.
MySQL has a size cap in form of the @@range_optimizer_max_mem_size limit.

Without the cap.. we have test_quick_select code use about 10M memory for a 300K query. That is, about 30x the size of the query. You can get the multiplier to be bigger if there are multiple potential indexes.

Comment by Sergei Petrunia [ 2016-12-08 ]

Debugging the query explain-mdev9764-12.sql (query text takes 39K)

explain
SELECT * FROM t WHERE id IN (10000, 10001, 10002, ....15479, 15480)

The IN list has 5480 elements.

I can see 5481 SEL_TREE objects alloced. sizeof(SEL_TREE)=120. This is responsible for 120*5480 = 657,600 = 657K of mem usage

Each SEL_TREE object has

  • one SEL_ARG object taking 104 bytes
  • one key value, which takes 9 bytes
    This adds up to 5481 * (9 + 104)= 619,353 bytes.

619+658=~1.2M, which agrees with the data on the chart.

Conclusions:
In this bug's example

  • memory usage is O(query_size), albeit with a big constant (30x)
  • all consumed memory is used by the final SEL_ARG/SEL_TREE data structure (this means, there is currently no memory waste. There used to be but I've fixed that)
Comment by Valerii Kravchuk [ 2016-12-08 ]

Queries of this kind, with long IN () lists, are usually dynamically generated by software that may be out of DBA's control. Here hard limits (using any of the way implemented in upstream MySQL (eq_range_index_dive_limit and/or range_optimizer_max_mem_size) help DBA to keep server more safe.

Also, what about that part of the original report related to time spent on optimization (for no real gain, when number of items in the IN list is approaching number of rows in the table)? Hard limits solve this problem as well.

Comment by Sergei Petrunia [ 2016-12-10 ]

Ok, finally it became clear that A) other linked MDEV issues are different from this one, and B) although some solutions are possible for this particular issue (will file an MDEV), all of them will be too intrusive to be put into a 10.1 release.

So, thinking of backporting the @@range_optimizer_max_mem_size fix from MySQL.

The patch in MySQL is:
https://github.com/mysql/mysql-server/commit/a7571080513ed2107919ebb1f5633566d3605189

Possible concerns when applying this on MariaDB 10.1:

  • Adding members into MEM_ROOT structure causes the ABI check to fail (I am not sure how much of an issue that actually is)

Warning	3170	Memory capacity of 1536000 bytes for
'range_optimizer_max_mem_size' exceeded. Range optimization was not done for this query.

The wording is not entirely correct, because range optimization may be still done for other tables. Do we want the warning to be printed at all?

  • I suppose we want to preserve the current behavior, so the default should be "no limits" (unlike with MySQL 5.7 that has 8M as the default).
Comment by Sergei Golubchik [ 2017-01-15 ]

Added a new server option/variable --max-session-mem-used that implements a
soft limit for the server session status variable mem_used.

Comment by Simone [ 2018-02-21 ]

Is it possible for this bug to affect version 10.0.30?

I'm having some random OOM issues, and I've seend queries with megabytes (literaly) of "IN"/"NOT IN" clause.

Comment by Sergei Golubchik [ 2018-08-15 ]

Yes, it affects all versions below what is listed in FixVersion/s field. So, all 5.5 and 10.0 versions.

Generated at Thu Feb 08 07:37:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.