Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24117

Memory management problem in statistics state for queries that have a large field IN (values) part

Details

    Description

      A query with a large IN () section will be stuck in statistics and use up all RAM. Multiple such queries will crash the server regardless of memory limitations specified in my.cnf

      I noticed this in 10.2.35. Downgrading to 10.2.34 fixes the problem

      An example is the WordPress "select post_id,meta_value from wp_postmeta where meta_id in (1,2,3 etc ...)" query.

      Attachments

        Issue Links

          Activity

            nielsh Niels Hendriks added a comment - - edited

            Same issue here. Server that was previously fine with 12G RAM had to be upgraded to 40G RAM to be "stable" with 12.2.35 on Debian Stretch.
            RAM wildly fluctuates.

            Has anyone checked by any chance if 10.3/10.4/10.5 are also affected? Considering whether we should downgrade or upgrade to the next major release to fix it.

            EDIT: Can confirm memory is fine again in 10.2.34.

            nielsh Niels Hendriks added a comment - - edited Same issue here. Server that was previously fine with 12G RAM had to be upgraded to 40G RAM to be "stable" with 12.2.35 on Debian Stretch. RAM wildly fluctuates. Has anyone checked by any chance if 10.3/10.4/10.5 are also affected? Considering whether we should downgrade or upgrade to the next major release to fix it. EDIT: Can confirm memory is fine again in 10.2.34.
            xPaw Pavel added a comment - - edited

            I ran into this issue on 10.5. I tested both on Debian and Windows.

            10.5.6 is fine, 10.5.7 has the bug. I attached `mariadbbug.sql` with an example table and query.

            xPaw Pavel added a comment - - edited I ran into this issue on 10.5. I tested both on Debian and Windows. 10.5.6 is fine, 10.5.7 has the bug. I attached `mariadbbug.sql` with an example table and query.
            gunni Frank Sagurna added a comment -

            I can confirm this issue. I wondered why I see queries in the process list with "Status" "statistics". Never saw this state before. Then I had problems with memory usage up to the server was killed by the kernel OOM killer.
            MariaDB 10.2.35 / Ubuntu repositories

            gunni Frank Sagurna added a comment - I can confirm this issue. I wondered why I see queries in the process list with "Status" "statistics". Never saw this state before. Then I had problems with memory usage up to the server was killed by the kernel OOM killer. MariaDB 10.2.35 / Ubuntu repositories
            alice Alice Sherepa added a comment - - edited

            Thanks a lot! The regression appeared after 291be49474 commit (MDEV-23811). (I used 1_my.test while reproducing, the same as mariadbbug.sql, but slightly shorter, for debug server)

            alice Alice Sherepa added a comment - - edited Thanks a lot! The regression appeared after 291be49474 commit ( MDEV-23811 ). (I used 1_my.test while reproducing, the same as mariadbbug.sql, but slightly shorter, for debug server)

            Debugging 1_my.test.

            Counting SEL_ARG::SEL_ARG allocations. Indeed, before the patch for MDEV-23811, it had: total_sel_args=3,294 (this is the number of elements in the IN-list). And after, it has total_sel_args=5,433,451. Too high.

            Another odd thing in key_or(): key2_shared==true. This doesn't seem correct - the testcase has just one index and we use only 1 key part from it. How can a SEL_ARG graph be shared in this case?

            psergei Sergei Petrunia added a comment - Debugging 1_my.test. Counting SEL_ARG::SEL_ARG allocations. Indeed, before the patch for MDEV-23811 , it had: total_sel_args=3,294 (this is the number of elements in the IN-list). And after, it has total_sel_args=5,433,451 . Too high. Another odd thing in key_or(): key2_shared==true . This doesn't seem correct - the testcase has just one index and we use only 1 key part from it. How can a SEL_ARG graph be shared in this case?

            The reference counters are increased here in tree_or:

                  if (no_merges1)
                    rt1= new SEL_TREE(tree1, TRUE, param);
                  if (no_merges2)
                    rt2= new SEL_TREE(tree2, TRUE, param);
            

            (denote this as LOC1)

            The only place where rt1 and t2 are used is this one:

              if (!(imerge_from_ranges= new SEL_IMERGE()))
                result= NULL;
              else if (!no_ranges1 && !no_ranges2 && !no_imerge_from_ranges)
              {
                /* Build the imerge part of the tree for the formula (1) */
                if (!rt1 || !rt2 ||
                ...
              }
            

            Denote the above as LOC2.

            Note that LOC2 has if (...&& !no_imerge_from_ranges)

            which gives an idea - if rt1 and and rt2 is only used when no_imerge_from_ranges==FALSE,
            maube we can create them only in that case?

            @@ -8877,12 +8877,15 @@ tree_or(RANGE_OPT_PARAM *param,SEL_TREE *tree1,SEL_TREE *tree2)
               /* Build the range part of the tree for the formula (1) */ 
               if (sel_trees_can_be_ored(param, tree1, tree2, &ored_keys))
               {
            -    if (no_merges1)
            -      rt1= new SEL_TREE(tree1, TRUE, param);
            -    if (no_merges2)
            -      rt2= new SEL_TREE(tree2, TRUE, param);
                 bool must_be_ored= sel_trees_must_be_ored(param, tree1, tree2, ored_keys);
                 no_imerge_from_ranges= must_be_ored;
            +    if (!no_imerge_from_ranges)
            +    {
            +      if (no_merges1)
            +        rt1= new SEL_TREE(tree1, TRUE, param);
            +      if (no_merges2)
            +        rt2= new SEL_TREE(tree2, TRUE, param);
            +    }
             
                 if (no_imerge_from_ranges && no_merges1 && no_merges2)
                 {
            

            Unfortunately, this doesn't work, it creates some a test failure in main.range_vs_index_merge. Not sure for which reason.

            psergei Sergei Petrunia added a comment - The reference counters are increased here in tree_or: if (no_merges1) rt1= new SEL_TREE(tree1, TRUE, param); if (no_merges2) rt2= new SEL_TREE(tree2, TRUE, param); (denote this as LOC1) The only place where rt1 and t2 are used is this one: if (!(imerge_from_ranges= new SEL_IMERGE())) result= NULL; else if (!no_ranges1 && !no_ranges2 && !no_imerge_from_ranges) { /* Build the imerge part of the tree for the formula (1) */ if (!rt1 || !rt2 || ... } Denote the above as LOC2. Note that LOC2 has if (...&& !no_imerge_from_ranges) which gives an idea - if rt1 and and rt2 is only used when no_imerge_from_ranges==FALSE, maube we can create them only in that case? @@ -8877,12 +8877,15 @@ tree_or(RANGE_OPT_PARAM *param,SEL_TREE *tree1,SEL_TREE *tree2) /* Build the range part of the tree for the formula (1) */ if (sel_trees_can_be_ored(param, tree1, tree2, &ored_keys)) { - if (no_merges1) - rt1= new SEL_TREE(tree1, TRUE, param); - if (no_merges2) - rt2= new SEL_TREE(tree2, TRUE, param); bool must_be_ored= sel_trees_must_be_ored(param, tree1, tree2, ored_keys); no_imerge_from_ranges= must_be_ored; + if (!no_imerge_from_ranges) + { + if (no_merges1) + rt1= new SEL_TREE(tree1, TRUE, param); + if (no_merges2) + rt2= new SEL_TREE(tree2, TRUE, param); + } if (no_imerge_from_ranges && no_merges1 && no_merges2) { Unfortunately, this doesn't work, it creates some a test failure in main.range_vs_index_merge. Not sure for which reason.
            psergei Sergei Petrunia added a comment - - edited

            Ok, the issue is caused by the wrong patch for MDEV-23811.

            The patch that was pushed was this:
            https://github.com/MariaDB/server/commit/291be494744abe90f4bdf6b5a35c4c26ee8ddda5

            But it is a preliminary version.

            During the review, I had requested to remove the problematic part of the patch :
            https://jira.mariadb.org/browse/MDEV-23811?focusedCommentId=167592&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-167592

            which was accepted and done:
            http://lists.askmonty.org/pipermail/commits/2020-October/014342.html

            But then due to some error, wrong patch was pushed.

            psergei Sergei Petrunia added a comment - - edited Ok, the issue is caused by the wrong patch for MDEV-23811 . The patch that was pushed was this: https://github.com/MariaDB/server/commit/291be494744abe90f4bdf6b5a35c4c26ee8ddda5 But it is a preliminary version. During the review, I had requested to remove the problematic part of the patch : https://jira.mariadb.org/browse/MDEV-23811?focusedCommentId=167592&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-167592 which was accepted and done: http://lists.askmonty.org/pipermail/commits/2020-October/014342.html But then due to some error, wrong patch was pushed.
            psergei Sergei Petrunia added a comment - A testcase for this bug (thanks serg 's for the idea) : https://github.com/MariaDB/server/commit/1404f3bea796c8479cf401cb36d518658600ddca

            we were hit by this as well, on testing fortunately - is a hotfix release for this possible ?

            rlam Roope Pääkkönen (Inactive) added a comment - we were hit by this as well, on testing fortunately - is a hotfix release for this possible ?
            jaedcar74 Javier Carvajal added a comment - - edited

            I have same issue. Server was had to be upgraded to 32G RAM with 10.2.35 on Debian 9 (Stretch). RAM wildly fluctuates. The server was killed by the kernel OOM killer.

            Any quick fix?

            jaedcar74 Javier Carvajal added a comment - - edited I have same issue. Server was had to be upgraded to 32G RAM with 10.2.35 on Debian 9 (Stretch). RAM wildly fluctuates. The server was killed by the kernel OOM killer. Any quick fix?

            I'd recommend to downgrade to 10.2.34 and apt-mark hold it until there is a fix available. We've seen servers downgraded to 10.2.34 to return to normal RAM usage.

            fuegas Ferdi van der Werf added a comment - I'd recommend to downgrade to 10.2.34 and apt-mark hold it until there is a fix available. We've seen servers downgraded to 10.2.34 to return to normal RAM usage.
            psergei Sergei Petrunia added a comment - - edited

            Hello,

            A set of release is about to be made (10.2.36, 10.3.27, 10.4.17, 10.5.8) which will contain the fix for this.
            (EDIT: fix version#)

            psergei Sergei Petrunia added a comment - - edited Hello, A set of release is about to be made (10.2.36, 10.3.27, 10.4.17, 10.5.8) which will contain the fix for this. (EDIT: fix version#)
            psergei Sergei Petrunia added a comment - The commits that fix this are (all 3 were made in a row): https://github.com/MariaDB/server/commit/f81eef62e7742806e5e74b5f37f35b7cd2f82291 https://github.com/MariaDB/server/commit/1404f3bea796c8479cf401cb36d518658600ddca https://github.com/MariaDB/server/commit/bea84aefb0563a10a310ea81d46c372919345c10
            danblack Daniel Black added a comment -

            For all those watching patiently, there have been packages released https://mariadb.org/mariadb-10-5-8-10-4-17-10-3-27-and-10-2-36-now-available/

            danblack Daniel Black added a comment - For all those watching patiently, there have been packages released https://mariadb.org/mariadb-10-5-8-10-4-17-10-3-27-and-10-2-36-now-available/

            People

              psergei Sergei Petrunia
              glucz Geza Lucz
              Votes:
              11 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.