[MDEV-24117] Memory management problem in statistics state for queries that have a large field IN (values) part Created: 2020-11-04  Updated: 2021-12-29  Resolved: 2020-11-11

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.2.35, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.36, 10.3.27, 10.4.17, 10.5.8

Type: Bug Priority: Blocker
Reporter: Geza Lucz Assignee: Sergei Petrunia
Resolution: Fixed Votes: 11
Labels: regression
Environment:

Centos 7


Attachments: File 1_my.test     File mariadbbug.sql    
Issue Links:
Blocks
is blocked by MDEV-27380 After minor upgrade to 10.5.13 querie... Closed
Duplicate
is duplicated by MDEV-24163 Mariadb is restarting after sending t... Closed
is duplicated by MDEV-24173 After minor upgrade to 10.2.35 querie... Closed
is duplicated by MDEV-24195 Huge memory usage after upgrade Closed
Problem/Incident
is caused by MDEV-23811 With large number of indexes optimize... Closed

 Description   

A query with a large IN () section will be stuck in statistics and use up all RAM. Multiple such queries will crash the server regardless of memory limitations specified in my.cnf

I noticed this in 10.2.35. Downgrading to 10.2.34 fixes the problem

An example is the WordPress "select post_id,meta_value from wp_postmeta where meta_id in (1,2,3 etc ...)" query.



 Comments   
Comment by Niels Hendriks [ 2020-11-04 ]

Same issue here. Server that was previously fine with 12G RAM had to be upgraded to 40G RAM to be "stable" with 12.2.35 on Debian Stretch.
RAM wildly fluctuates.

Has anyone checked by any chance if 10.3/10.4/10.5 are also affected? Considering whether we should downgrade or upgrade to the next major release to fix it.

EDIT: Can confirm memory is fine again in 10.2.34.

Comment by Pavel [ 2020-11-05 ]

I ran into this issue on 10.5. I tested both on Debian and Windows.

10.5.6 is fine, 10.5.7 has the bug. I attached `mariadbbug.sql` with an example table and query.

Comment by Frank Sagurna [ 2020-11-07 ]

I can confirm this issue. I wondered why I see queries in the process list with "Status" "statistics". Never saw this state before. Then I had problems with memory usage up to the server was killed by the kernel OOM killer.
MariaDB 10.2.35 / Ubuntu repositories

Comment by Alice Sherepa [ 2020-11-09 ]

Thanks a lot! The regression appeared after 291be49474 commit (MDEV-23811). (I used 1_my.test while reproducing, the same as mariadbbug.sql, but slightly shorter, for debug server)

Comment by Sergei Petrunia [ 2020-11-09 ]

Debugging 1_my.test.

Counting SEL_ARG::SEL_ARG allocations. Indeed, before the patch for MDEV-23811, it had: total_sel_args=3,294 (this is the number of elements in the IN-list). And after, it has total_sel_args=5,433,451. Too high.

Another odd thing in key_or(): key2_shared==true. This doesn't seem correct - the testcase has just one index and we use only 1 key part from it. How can a SEL_ARG graph be shared in this case?

Comment by Sergei Petrunia [ 2020-11-09 ]

The reference counters are increased here in tree_or:

      if (no_merges1)
        rt1= new SEL_TREE(tree1, TRUE, param);
      if (no_merges2)
        rt2= new SEL_TREE(tree2, TRUE, param);

(denote this as LOC1)

The only place where rt1 and t2 are used is this one:

  if (!(imerge_from_ranges= new SEL_IMERGE()))
    result= NULL;
  else if (!no_ranges1 && !no_ranges2 && !no_imerge_from_ranges)
  {
    /* Build the imerge part of the tree for the formula (1) */
    if (!rt1 || !rt2 ||
    ...
  }

Denote the above as LOC2.

Note that LOC2 has if (...&& !no_imerge_from_ranges)

which gives an idea - if rt1 and and rt2 is only used when no_imerge_from_ranges==FALSE,
maube we can create them only in that case?

@@ -8877,12 +8877,15 @@ tree_or(RANGE_OPT_PARAM *param,SEL_TREE *tree1,SEL_TREE *tree2)
   /* Build the range part of the tree for the formula (1) */ 
   if (sel_trees_can_be_ored(param, tree1, tree2, &ored_keys))
   {
-    if (no_merges1)
-      rt1= new SEL_TREE(tree1, TRUE, param);
-    if (no_merges2)
-      rt2= new SEL_TREE(tree2, TRUE, param);
     bool must_be_ored= sel_trees_must_be_ored(param, tree1, tree2, ored_keys);
     no_imerge_from_ranges= must_be_ored;
+    if (!no_imerge_from_ranges)
+    {
+      if (no_merges1)
+        rt1= new SEL_TREE(tree1, TRUE, param);
+      if (no_merges2)
+        rt2= new SEL_TREE(tree2, TRUE, param);
+    }
 
     if (no_imerge_from_ranges && no_merges1 && no_merges2)
     {

Unfortunately, this doesn't work, it creates some a test failure in main.range_vs_index_merge. Not sure for which reason.

Comment by Sergei Petrunia [ 2020-11-09 ]

Ok, the issue is caused by the wrong patch for MDEV-23811.

The patch that was pushed was this:
https://github.com/MariaDB/server/commit/291be494744abe90f4bdf6b5a35c4c26ee8ddda5

But it is a preliminary version.

During the review, I had requested to remove the problematic part of the patch :
https://jira.mariadb.org/browse/MDEV-23811?focusedCommentId=167592&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-167592

which was accepted and done:
http://lists.askmonty.org/pipermail/commits/2020-October/014342.html

But then due to some error, wrong patch was pushed.

Comment by Sergei Petrunia [ 2020-11-09 ]

A testcase for this bug (thanks serg's for the idea) :
https://github.com/MariaDB/server/commit/1404f3bea796c8479cf401cb36d518658600ddca

Comment by Roope Pääkkönen (Inactive) [ 2020-11-10 ]

we were hit by this as well, on testing fortunately - is a hotfix release for this possible ?

Comment by Javier Carvajal [ 2020-11-11 ]

I have same issue. Server was had to be upgraded to 32G RAM with 10.2.35 on Debian 9 (Stretch). RAM wildly fluctuates. The server was killed by the kernel OOM killer.

Any quick fix?

Comment by Ferdi van der Werf [ 2020-11-11 ]

I'd recommend to downgrade to 10.2.34 and apt-mark hold it until there is a fix available. We've seen servers downgraded to 10.2.34 to return to normal RAM usage.

Comment by Sergei Petrunia [ 2020-11-11 ]

Hello,

A set of release is about to be made (10.2.36, 10.3.27, 10.4.17, 10.5.8) which will contain the fix for this.
(EDIT: fix version#)

Comment by Sergei Petrunia [ 2020-11-11 ]

The commits that fix this are (all 3 were made in a row):

https://github.com/MariaDB/server/commit/f81eef62e7742806e5e74b5f37f35b7cd2f82291
https://github.com/MariaDB/server/commit/1404f3bea796c8479cf401cb36d518658600ddca
https://github.com/MariaDB/server/commit/bea84aefb0563a10a310ea81d46c372919345c10

Comment by Daniel Black [ 2020-11-11 ]

For all those watching patiently, there have been packages released https://mariadb.org/mariadb-10-5-8-10-4-17-10-3-27-and-10-2-36-now-available/

Generated at Thu Feb 08 09:27:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.