Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21958

Query having many NOT-IN clauses running forever and causing available free memory to use completely

Details

    Description

      One of the customer reported that running a huge query (25KB) having a NOT-IN clause with many values running forever, causing the server to use available free memory and invoking the oom-killer to kill the mysqld process. Same behavior when tried to get the `EXPLAIN PLAN`.

      Initially, checking the existing bugs, found following two:

      MDEV-9764 - MariaDB does not limit memory used for range optimization
      MDEV-9750 - extended_keys=on causing the memory exhaustion on queries having multiple IN/NOT IN using INNODB

      As per those two bugs, we have told customer to turn off the "extended_keys" and check, but it's not helping out.

      Further, locally on the MariaDB Server versions 10.2.21, 10.3.14 and 10.4.12(ES), I can reproduce the same behavior on an empty table. Here as well, neither setting "extended_keys=off" helped nor the "max_session_mem_used" system variable worked to avoid the exhaustion of available free memory.

      Repro steps, I will share it separately.

      Maybe, we need to understand how can we handle this case?

      Thank You.

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia added a comment - - edited

            So, let's consider what the optimizer does for

            key NOT IN (c11, c12, ..., c1N) AND
            key NOT IN (c21, c22, ..., c2M) AND
            ...
            key NOT IN (cM1, cM2, ..., cMN)
            

            The first line

            key NOT IN (c11, c12, ..., c1N)
            

            produces N+1 ranges:

            (-inf; c11) U (c11, c12) U (c12, ...) ... U (..., c1N) U (c1N; +inf)
            

            Let's assume it's N ranges to simplify computations.

            The same goes for every line.

            Now, let's try to computing the AND, assuming all constants unequal.

            AND-ing two interval lists with X and Y intervals in the worst case produces X+Y-1 intervals.

            This means we will finally produce an interval list with

            N + N + N + ... (M times) =  N*M 
            

            intervals.

            This is already bad enough, because it's 790 *900=711000 intervals.

            Another possible is intermediate results. They are allocated on MEM_ROOT. If the space is not reused (and I think it is not), this means range optimizer will consume the memory for N *M/2 * M intervals

            psergei Sergei Petrunia added a comment - - edited So, let's consider what the optimizer does for key NOT IN (c11, c12, ..., c1N) AND key NOT IN (c21, c22, ..., c2M) AND ... key NOT IN (cM1, cM2, ..., cMN) The first line key NOT IN (c11, c12, ..., c1N) produces N+1 ranges: (-inf; c11) U (c11, c12) U (c12, ...) ... U (..., c1N) U (c1N; +inf) Let's assume it's N ranges to simplify computations. The same goes for every line. Now, let's try to computing the AND, assuming all constants unequal. AND-ing two interval lists with X and Y intervals in the worst case produces X+Y-1 intervals. This means we will finally produce an interval list with N + N + N + ... (M times) = N*M intervals. This is already bad enough, because it's 790 *900=711000 intervals. Another possible is intermediate results. They are allocated on MEM_ROOT. If the space is not reused (and I think it is not), this means range optimizer will consume the memory for N *M/2 * M intervals

            An experiment with counting # of SEL_AGS allocated:

            The queries have form:

            explain select * from t1 where 1 and a not in (0,10000,20000,30000,40000,50000,60000,70000,80000,90000)
            and a not in (2,10002,20002,30002,40002,50002,60002,70002,80002,90002)
            and a not in (4,10004,20004,30004,40004,50004,60004,70004,80004,90004)
            ...
            

            n_ranges	SEL_ARGS	SEL_ARGs per range
            11	11	1.00
            21	43	2.05
            51	199	3.90
            101	659	6.52
            201	2329	11.59
            401	8669	21.62
            601	19009	31.63
            801	33349	41.63
            1001	51689	51.64
            2001	203389	101.64
            4001	806789	201.65
            

            The dependency is clearly non-linear. This agrees with the no-reuse hypothesis.

            Another observation: PARAM::alloced_sel_args=0 for all queries. Apparently it was not counting all the places where SEL_ARGs are allocated?

            psergei Sergei Petrunia added a comment - An experiment with counting # of SEL_AGS allocated: The queries have form: explain select * from t1 where 1 and a not in (0,10000,20000,30000,40000,50000,60000,70000,80000,90000) and a not in (2,10002,20002,30002,40002,50002,60002,70002,80002,90002) and a not in (4,10004,20004,30004,40004,50004,60004,70004,80004,90004) ... n_ranges SEL_ARGS SEL_ARGs per range 11 11 1.00 21 43 2.05 51 199 3.90 101 659 6.52 201 2329 11.59 401 8669 21.62 601 19009 31.63 801 33349 41.63 1001 51689 51.64 2001 203389 101.64 4001 806789 201.65 The dependency is clearly non-linear. This agrees with the no-reuse hypothesis. Another observation: PARAM::alloced_sel_args=0 for all queries. Apparently it was not counting all the places where SEL_ARGs are allocated?

            As a side observation, what merit there is in trying to consider conditions in form

            unique_col NOT IN ( ... )
            

            sargable? They are obviously either non selective, or put a great strain on the range optimizer...

            psergei Sergei Petrunia added a comment - As a side observation, what merit there is in trying to consider conditions in form unique_col NOT IN ( ... ) sargable? They are obviously either non selective, or put a great strain on the range optimizer...
            psergei Sergei Petrunia added a comment - - edited

            A question one might have is why does the range optimizer allocate so many SEL_ARG objects? There are PARAM::alloced_sel_args and the SEL_ARG::MAX_SEL_ARGS=16000 limit, shouldn't this code have hit them?

            Looking where SEL_ARG::MAX_SEL_ARGS was introduced...

            commit 9639eb3dda5347ecc342de94da3cc0e025e5c058
            Author:	unknown <sergefp@mysql.com>  Wed Mar 28 20:16:01 2007
            Committer:	unknown <sergefp@mysql.com>  Wed Mar 28 20:16:01 2007
             
            BUG#26624: high mem usage (crash) in range optimizer
            - Added PARAM::alloced_sel_args where we count the # of SEL_ARGs
              created by SEL_ARG tree cloning operations.
            - Made the range analyzer to shortcut and not do any more cloning 
              if we've already created MAX_SEL_ARGS SEL_ARG objects in cloning.
            - Added comments about space complexity of SEL_ARG-graph 
              representation
            

            https://github.com/mariadb/server/commit/9639eb3dda5347ecc342de94da3cc0e025e5c058

            and there it says:

            • Added PARAM::alloced_sel_args where we count the # of SEL_ARGs
              created by SEL_ARG tree cloning operations.

            ANDing intervals is technically not cloning (as the new interval is not the same as the old ones).

            psergei Sergei Petrunia added a comment - - edited A question one might have is why does the range optimizer allocate so many SEL_ARG objects? There are PARAM::alloced_sel_args and the SEL_ARG::MAX_SEL_ARGS=16000 limit, shouldn't this code have hit them? Looking where SEL_ARG::MAX_SEL_ARGS was introduced... commit 9639eb3dda5347ecc342de94da3cc0e025e5c058 Author: unknown <sergefp@mysql.com> Wed Mar 28 20:16:01 2007 Committer: unknown <sergefp@mysql.com> Wed Mar 28 20:16:01 2007   BUG#26624: high mem usage (crash) in range optimizer - Added PARAM::alloced_sel_args where we count the # of SEL_ARGs created by SEL_ARG tree cloning operations. - Made the range analyzer to shortcut and not do any more cloning if we've already created MAX_SEL_ARGS SEL_ARG objects in cloning. - Added comments about space complexity of SEL_ARG-graph representation https://github.com/mariadb/server/commit/9639eb3dda5347ecc342de94da3cc0e025e5c058 and there it says: Added PARAM::alloced_sel_args where we count the # of SEL_ARGs created by SEL_ARG tree cloning operations. ANDing intervals is technically not cloning (as the new interval is not the same as the old ones).

            The non-linear memory usage comes from this code in key_and:

              SEL_ARG *e1=key1->first(), *e2=key2->first(), *new_tree=0;
            

            ^ Start from the first range in both range lists

              uint max_part_no= MY_MAX(key1->max_part_no, key2->max_part_no);
              uint new_weight= 0;
             
              while (e1 && e2)
              {
                int cmp=e1->cmp_min_to_min(e2);
            

            ...

                if (!next || next->type != SEL_ARG::IMPOSSIBLE)
                {
                  SEL_ARG *new_arg= e1->clone_and(param->thd, e2);
            

            Each clone_and() call allocates a SEL_ARG.
            In the worst case, this will get called key1->elements + key2->elements times.

            psergei Sergei Petrunia added a comment - The non-linear memory usage comes from this code in key_and: SEL_ARG *e1=key1->first(), *e2=key2->first(), *new_tree=0; ^ Start from the first range in both range lists uint max_part_no= MY_MAX(key1->max_part_no, key2->max_part_no); uint new_weight= 0;   while (e1 && e2) { int cmp=e1->cmp_min_to_min(e2); ... if (!next || next->type != SEL_ARG::IMPOSSIBLE) { SEL_ARG *new_arg= e1->clone_and(param->thd, e2); Each clone_and() call allocates a SEL_ARG. In the worst case, this will get called key1->elements + key2->elements times.
            psergei Sergei Petrunia added a comment - - edited

            Coming back to the example of doing range analysis for

            key NOT IN (c11, c12, ..., c1N) AND
            key NOT IN (c21, c22, ..., c2M) AND
            ...
            key NOT IN (cM1, cM2, ..., cMN)
            

            Suppose we've processed K lines.
            When comping key_and() with the next line, we will allocate K*(M+1) SEL_ARG objects.

            Summing it up for all lines, one gets this many SEL_ARG objects:

            SUM (from 1 to K) (K * (M+1)) = (M+1) * K^2/2
            

            this is where degree-two dependency comes from.

            psergei Sergei Petrunia added a comment - - edited Coming back to the example of doing range analysis for key NOT IN (c11, c12, ..., c1N) AND key NOT IN (c21, c22, ..., c2M) AND ... key NOT IN (cM1, cM2, ..., cMN) Suppose we've processed K lines. When comping key_and() with the next line, we will allocate K*(M+1) SEL_ARG objects. Summing it up for all lines, one gets this many SEL_ARG objects: SUM (from 1 to K) (K * (M+1)) = (M+1) * K^2/2 this is where degree-two dependency comes from.
            psergei Sergei Petrunia added a comment - - edited

            Ideas for fix:

            Reuse SEL_ARG objects from AND-ed SEL_ARG trees

            Re-use the SEL_ARG objects from one or both source SEL_ARG trees. This is possible if the trees are not used anywhere.
            This might be a bit complex to do if we want to keep the R-B tree structure.
            We could of course ignore the R-B structure and just do a linear merge: traverse both range lists starting from keyX->first() and going through keyX->next chain but this might be inefficient.

            Reuse the SEL_ARG objects (garbage heap)

            A bit simpler approach: there are calls to SEL_ARG::free_tree() for SEL_ARG trees that are no longer used. We could collect those (e.g. in a linked list) and then get the new SEL_ARG objects from that linked list, instead.

            Rely on SEL_ARG::MAX_SEL_ARGs

            Make SEL_ARG::clone_and() increment RANGE_OPT_PARAM::alloced_sel_args. When too may SEL_ARG objects are allocated, range optimization will be terminated.
            This has a downside that no range plan will be produced AT ALL.

            (Very basic) Make NOT-IN non-sargable

            As mentioned above: there is no much point in making conditions in form

            unique_col NOT IN ( ... )
            

            sargable. We can just make them non-sargable. This is only a partial solution, as one can still observe the problem with
            non_unique_col NOT IN (...)

            ... but it will solve this case.

            psergei Sergei Petrunia added a comment - - edited Ideas for fix: Reuse SEL_ARG objects from AND-ed SEL_ARG trees Re-use the SEL_ARG objects from one or both source SEL_ARG trees. This is possible if the trees are not used anywhere. This might be a bit complex to do if we want to keep the R-B tree structure. We could of course ignore the R-B structure and just do a linear merge: traverse both range lists starting from keyX->first() and going through keyX->next chain but this might be inefficient. Reuse the SEL_ARG objects (garbage heap) A bit simpler approach: there are calls to SEL_ARG::free_tree() for SEL_ARG trees that are no longer used. We could collect those (e.g. in a linked list) and then get the new SEL_ARG objects from that linked list, instead. Rely on SEL_ARG::MAX_SEL_ARGs Make SEL_ARG::clone_and() increment RANGE_OPT_PARAM::alloced_sel_args. When too may SEL_ARG objects are allocated, range optimization will be terminated. This has a downside that no range plan will be produced AT ALL. (Very basic) Make NOT-IN non-sargable As mentioned above: there is no much point in making conditions in form unique_col NOT IN ( ... ) sargable. We can just make them non-sargable. This is only a partial solution, as one can still observe the problem with non_unique_col NOT IN (...) ... but it will solve this case.
            psergei Sergei Petrunia added a comment - A patch implementing the "basic" solution. http://lists.askmonty.org/pipermail/commits/2020-November/014350.html

            igor, please review

            psergei Sergei Petrunia added a comment - igor , please review

            Patch reviewed, ok to push.
            I plan to work with Igor on creating a upper level of SEL_ARG arguments created by the range optimizer

            monty Michael Widenius added a comment - Patch reviewed, ok to push. I plan to work with Igor on creating a upper level of SEL_ARG arguments created by the range optimizer

            Pushed the "conservative" fix.

            psergei Sergei Petrunia added a comment - Pushed the "conservative" fix.

            ccalender "conservative" means this fix:
            https://jira.mariadb.org/browse/MDEV-21958?focusedCommentId=171309&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-171309
            It implements the "(Very basic) Make NOT-IN non-sargable" suggestion. It solves the issue when the column col in the "col NOT IN (...)" is the primary (or the unique secondary) key. This was the case in the customer's complaint.

            psergei Sergei Petrunia added a comment - ccalender "conservative" means this fix: https://jira.mariadb.org/browse/MDEV-21958?focusedCommentId=171309&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-171309 It implements the "(Very basic) Make NOT-IN non-sargable" suggestion. It solves the issue when the column col in the "col NOT IN (...)" is the primary (or the unique secondary) key. This was the case in the customer's complaint.

            A question from yesterday's optimizer call: Will the fix for MDEV-9750 help with this?
            The answer is: "not much", unless one sets optimizer_max_sel_arg_weight really low.

            The problem in this MDEV is N^2 memory usage. Let's look the example data from here:

            https://jira.mariadb.org/browse/MDEV-21958?focusedCommentId=170462&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-170462

            n_ranges	SEL_ARGS	SEL_ARGs per range
            11	11	1.00
            21	43	2.05
            51	199	3.90
            101	659	6.52
            201	2329	11.59
            401	8669	21.62
            601	19009	31.63
            801	33349	41.63
            1001	51689	51.64
            2001	203389	101.64
            4001	806789	201.65
            

            here n_ranges=4000 , SEL_ARGs=800K.

            The default max_sel_arg_weight is 32K. Before we construct an OR-list of 32K elements (and hit the limit),we may allocate up to

            32K ^ 2 = 1024 * KK = 1024M= 1G
            

            SEL_ARG objects. Each is ~100 bytes, so we'll use up to ~100G memory.

            psergei Sergei Petrunia added a comment - A question from yesterday's optimizer call: Will the fix for MDEV-9750 help with this? The answer is: "not much", unless one sets optimizer_max_sel_arg_weight really low . The problem in this MDEV is N^2 memory usage. Let's look the example data from here: https://jira.mariadb.org/browse/MDEV-21958?focusedCommentId=170462&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-170462 n_ranges SEL_ARGS SEL_ARGs per range 11 11 1.00 21 43 2.05 51 199 3.90 101 659 6.52 201 2329 11.59 401 8669 21.62 601 19009 31.63 801 33349 41.63 1001 51689 51.64 2001 203389 101.64 4001 806789 201.65 here n_ranges=4000 , SEL_ARGs=800K. The default max_sel_arg_weight is 32K. Before we construct an OR-list of 32K elements (and hit the limit),we may allocate up to 32K ^ 2 = 1024 * KK = 1024M= 1G SEL_ARG objects. Each is ~100 bytes, so we'll use up to ~100G memory.

            Ok, the "conservative" fix that was pushed now has its own MDEV: MDEV-24711

            psergei Sergei Petrunia added a comment - Ok, the "conservative" fix that was pushed now has its own MDEV: MDEV-24711

            Closing this one.
            Followup task: MDEV-26856.

            psergei Sergei Petrunia added a comment - Closing this one. Followup task: MDEV-26856 .

            People

              psergei Sergei Petrunia
              suresh.ramagiri@mariadb.com suresh ramagiri
              Votes:
              1 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.