[MDEV-28852] Improve optimization of joins with many tables, including eq_ref tables Created: 2022-06-15 Updated: 2023-07-25 Resolved: 2023-06-01 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Fix Version/s: | 10.10.1, 11.0.1 |
| Type: | Task | Priority: | Major |
| Reporter: | Sergei Petrunia | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | Preview_10.10, optimizer-feature | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||
| Description |
|
This is a followup to Problem descriptionJoin optimizer can be very slow when the join has many tables and join prefix pruning feature doesn't manage to prune join prefixes. An important special case is when a subset of join tables are joined with an equi-join condition on a primary key, and so can be joined using eq_ref in any order. Then, query plans that use different permutations of these tables have the same cost. The optimizer will consider all possible permutations. This can take a lot of CPU but provide little benefit. The fix for SolutionTODO description. Copy from: Join prefix pruning is now controlled by @@optimizer_prune_level variable. The new pruning is done when @@optimizer_prune_level=2, which is the new default. One can disable certain aspects (TODO elaborate) of the new behavior by setting @@optimizer_prune_level=1 (the default before 10.10). The treeIt is preview-10.10- |
| Comments |
| Comment by Sergei Golubchik [ 2022-06-18 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
The branch is preview-10.10-optimizer | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2022-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
The final version of the patch to the last known bug related to the feature is here: bb-10.10-mdev28929-v4 | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2022-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
I have no arguments against pushing the feature as of bb-10.10-mdev28929-v4 ac3b3807cc into 10.10 main and releasing with 10.10.1. Minority report As discussed on different occasions before, I disagree with the line our optimizer takes for distinguishing use cases of interest from ignorable ones, bug-wise. There is a lot of confusion in regard to what is a "realistic" use case, and it becomes especially extreme when performance is concerned. We do run lots of truly unrealistic queries in our tests, like when a bit column would be compared to a timestamp, or a meaningless chain of functions would be used, and so on. But I'm not talking about those. What optimizer tends to demand for "realistic", when the data is nearly normally distributed, all tables are populated with a suitable number of rows, all needed indexes exist, and all join conditions are fulfilled, is not a realistic use case, it is an ideal use case. They do exist of course, but in the real world, for every ideal setup there will probably be hundreds of non-ideal ones. There are plenty of empty tables out there, the data is distributed in all possible ways, indexes are missing or misused, statistics aren't up-to-date, and filtering conditions miss the mark. It doesn't make them unrealistic or corner cases. While it is totally fine to limit new improvements and fine-tuning to as narrow of a scope as we want, and use the ideal setup for demonstrating our superiority, it cannot be applied the same way to regression considerations. Small regressions can probably be tolerated, or at least it can be reasonably argued that if an application is so sensitive to performance fluctuations, it should be first of all optimized itself. But if an application had been doing the same things the same way for years and was performing reasonably well, and suddenly the latency of its query jumps from milliseconds to tens of minutes, there is no justification which can convince the affected users that it's not a bug but their fault. As we would consider a new crash or a wrong result on such a query to be a real regression and wouldn't claim that the user needs to rewrite the query, severe performance degradations should be treated the same way. So, while officially testing the feature for wrong results and stability regressions, I was also taking sideline notes in regard to the performance (latency) comparison between the feature tree and the baseline. Unlike official benchmark and other "good" test configurations, mine is in no way tuned to highlight the strengths of the optimizer, if anything it's the contrary. I expect this approach to backfire when/if similar problems start affecting real users, but since that's the reality – the cases are not going to be fixed at least until real users complain – and the new feature seems to be doing more good than harm on average, I see no valid reasons to object against releasing it. Performance testing is not my specialty anyway, I cannot claim any expertise and have a decisive opinion on the subject. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Muhammad Irfan [ 2022-10-21 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
one customer, impacted by same issue when there are good no. of tables to join then queries are stuck in "statistics" state and take long to generate EXPLAIN output. Lowering optimizer_search_depth to small value worked well and setting optimizer_search_depth=0 result as good workaround. https://mariadb.com/kb/en/server-system-variables/#optimizer_search_depth | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by suresh ramagiri [ 2023-04-21 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
One of our customers, also having this problem at 10.6.10 release. Locally, I can repro this case, with their shared data and query. For now, asked to set optimizer_search_depth=0 as a workaround, which made the query to execute faster though. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2023-06-01 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
ralf.gebhardt, I've checked. There were multiple commits pushed that were marked with The fix for
Then, there were a series of commits pushed into 10.10.1+:
... a few more related commits follow, until this one:
Then, there is this commit (also in 10.10.1+) which is a fix for patch for this MDEV and is the only one mentioning
The commits that fix this MDEV are b3c74bdc1f09b2b8babd7f2bd9d52df2749ddcc3..515b9ad05a6de9dac3871ef2769dde1b5834c6e3 , endpoints inclusive. |