[MDEV-8289] Semijoin inflates number of rows in query result Created: 2015-06-09 Updated: 2015-08-18 Resolved: 2015-08-18 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 5.3.12, 5.5, 10.0, 10.1 |
| Fix Version/s: | 10.0.22 |
| Type: | Bug | Priority: | Major |
| Reporter: | Mateusz Michalowski | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
mysql Ver 15.1 Distrib 10.0.18-MariaDB, for osx10.10 (x86_64) using readline 5.1 |
||
| Attachments: |
|
| Sprint: | 5.5.45, 10.1.7-1 |
| Description |
|
Turning semijoin optimization on/off can affect number of rows returned by a query. Enabling semojoin optimisation can cause a query like that:
to return more rows than:
Please, find a full test case setup attached. Just to sum up the test case:
Then the query returns:
|
| Comments |
| Comment by Elena Stepanova [ 2015-06-10 ] | |||||||||||
|
Thanks for the report and the test case. | |||||||||||
| Comment by Sergei Petrunia [ 2015-08-04 ] | |||||||||||
|
EXPLAIN:
if I use optimizer_switch='loosescan=off' , I get the correct result. It seems, Loose Scan strategy is the cause of the problem. | |||||||||||
| Comment by Sergei Petrunia [ 2015-08-04 ] | |||||||||||
|
Indeed, why does it use range over owners_idx with LooseScan. The subquery is
and the query doesn't have an equality between manufacturers_id and owners_idx. The query plan seems to be incorrect. | |||||||||||
| Comment by Sergei Petrunia [ 2015-08-05 ] | |||||||||||
|
Debugging, I see the following to happen: 1. Join optimization is run. It chooses to use LooseScan for manufacturers table, the index is manufacturers_idx. Correct. 2. We end up here:
Here, the code choses to switch to a quick select. It totally ignores the fact that LooseScan doesn't work for the index that the quick select is using. | |||||||||||
| Comment by Sergei Petrunia [ 2015-08-05 ] | |||||||||||
|
The first reaction is to make the code in make_join_select to not invoke SQL_SELECT::test_quick_select when the table in question is used for LooseScan. It is a bit difficult to do: At this point in execution, information about whether Loose Scan is used is only stored in JOIN::best_positions. (It is moved into JOIN_TAB by setup_semijoin_dups_elimination which is called by make_join_readinfo which is called by JOIN::optimize_inner after the call to make_join_select. | |||||||||||
| Comment by Sergei Petrunia [ 2015-08-05 ] | |||||||||||
|
Having fixed that, I get this (incorrect) plan (and a wrong query result):
| |||||||||||
| Comment by Sergei Petrunia [ 2015-08-05 ] | |||||||||||
|
http://lists.askmonty.org/pipermail/commits/2015-August/008227.html. elenst, I would need testing for this commit. The test should check semi-join subqueries for wrong query results and crashes. | |||||||||||
| Comment by Elena Stepanova [ 2015-08-17 ] | |||||||||||
|
I have not got any regressions from 10.0 to 10.0 + your patch. Please push. |