[MDEV-21958] Query having many NOT-IN clauses running forever and causing available free memory to use completely Created: 2020-03-17 Updated: 2023-08-24 Resolved: 2021-10-19 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 10.2.21, 10.4.12 |
| Fix Version/s: | 10.2.37, 10.3.28, 10.4.18, 10.5.10, 10.6.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | suresh ramagiri | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | range-optimizer | ||
| Environment: |
CentOS/RHEL 7 |
||
| Attachments: |
|
||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Description |
|
One of the customer reported that running a huge query (25KB) having a NOT-IN clause with many values running forever, causing the server to use available free memory and invoking the oom-killer to kill the mysqld process. Same behavior when tried to get the `EXPLAIN PLAN`. Initially, checking the existing bugs, found following two:
As per those two bugs, we have told customer to turn off the "extended_keys" and check, but it's not helping out. Further, locally on the MariaDB Server versions 10.2.21, 10.3.14 and 10.4.12(ES), I can reproduce the same behavior on an empty table. Here as well, neither setting "extended_keys=off" helped nor the "max_session_mem_used" system variable worked to avoid the exhaustion of available free memory. Repro steps, I will share it separately. Maybe, we need to understand how can we handle this case? Thank You. |
| Comments |
| Comment by Sergei Petrunia [ 2020-10-28 ] | ||||||||||||||||
|
So, let's consider what the optimizer does for
The first line
produces N+1 ranges:
Let's assume it's N ranges to simplify computations. The same goes for every line. Now, let's try to computing the AND, assuming all constants unequal. AND-ing two interval lists with X and Y intervals in the worst case produces X+Y-1 intervals. This means we will finally produce an interval list with
intervals. This is already bad enough, because it's 790 *900=711000 intervals. Another possible is intermediate results. They are allocated on MEM_ROOT. If the space is not reused (and I think it is not), this means range optimizer will consume the memory for N *M/2 * M intervals | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-29 ] | ||||||||||||||||
|
An experiment with counting # of SEL_AGS allocated: The queries have form:
The dependency is clearly non-linear. This agrees with the no-reuse hypothesis. Another observation: PARAM::alloced_sel_args=0 for all queries. Apparently it was not counting all the places where SEL_ARGs are allocated? | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-29 ] | ||||||||||||||||
|
As a side observation, what merit there is in trying to consider conditions in form
sargable? They are obviously either non selective, or put a great strain on the range optimizer... | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-29 ] | ||||||||||||||||
|
A question one might have is why does the range optimizer allocate so many SEL_ARG objects? There are PARAM::alloced_sel_args and the SEL_ARG::MAX_SEL_ARGS=16000 limit, shouldn't this code have hit them? Looking where SEL_ARG::MAX_SEL_ARGS was introduced...
https://github.com/mariadb/server/commit/9639eb3dda5347ecc342de94da3cc0e025e5c058 and there it says:
ANDing intervals is technically not cloning (as the new interval is not the same as the old ones). | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-03 ] | ||||||||||||||||
|
The non-linear memory usage comes from this code in key_and:
^ Start from the first range in both range lists
...
Each clone_and() call allocates a SEL_ARG. | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-03 ] | ||||||||||||||||
|
Coming back to the example of doing range analysis for
Suppose we've processed K lines. Summing it up for all lines, one gets this many SEL_ARG objects:
this is where degree-two dependency comes from. | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-03 ] | ||||||||||||||||
|
Ideas for fix: Reuse SEL_ARG objects from AND-ed SEL_ARG treesRe-use the SEL_ARG objects from one or both source SEL_ARG trees. This is possible if the trees are not used anywhere. Reuse the SEL_ARG objects (garbage heap)A bit simpler approach: there are calls to SEL_ARG::free_tree() for SEL_ARG trees that are no longer used. We could collect those (e.g. in a linked list) and then get the new SEL_ARG objects from that linked list, instead. Rely on SEL_ARG::MAX_SEL_ARGsMake SEL_ARG::clone_and() increment RANGE_OPT_PARAM::alloced_sel_args. When too may SEL_ARG objects are allocated, range optimization will be terminated. (Very basic) Make NOT-IN non-sargableAs mentioned above: there is no much point in making conditions in form
sargable. We can just make them non-sargable. This is only a partial solution, as one can still observe the problem with ... but it will solve this case. | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-05 ] | ||||||||||||||||
|
A patch implementing the "basic" solution. http://lists.askmonty.org/pipermail/commits/2020-November/014350.html | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-05 ] | ||||||||||||||||
|
igor, please review | ||||||||||||||||
| Comment by Michael Widenius [ 2020-12-01 ] | ||||||||||||||||
|
Patch reviewed, ok to push. | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-12-15 ] | ||||||||||||||||
|
Pushed the "conservative" fix. | ||||||||||||||||
| Comment by Sergei Petrunia [ 2021-01-27 ] | ||||||||||||||||
|
ccalender "conservative" means this fix: | ||||||||||||||||
| Comment by Sergei Petrunia [ 2021-01-27 ] | ||||||||||||||||
|
A question from yesterday's optimizer call: Will the fix for The problem in this MDEV is N^2 memory usage. Let's look the example data from here:
here n_ranges=4000 , SEL_ARGs=800K. The default max_sel_arg_weight is 32K. Before we construct an OR-list of 32K elements (and hit the limit),we may allocate up to
SEL_ARG objects. Each is ~100 bytes, so we'll use up to ~100G memory. | ||||||||||||||||
| Comment by Sergei Petrunia [ 2021-01-27 ] | ||||||||||||||||
|
Ok, the "conservative" fix that was pushed now has its own MDEV: | ||||||||||||||||
| Comment by Sergei Petrunia [ 2021-10-19 ] | ||||||||||||||||
|
Closing this one. |