[MDEV-8440] Left Outer Join creates Cartesian join Created: 2015-07-10  Updated: 2015-07-16  Resolved: 2015-07-16

Status: Closed
Project: MariaDB Server
Component/s: Parser
Affects Version/s: 10.0.16
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Peter McLarty Assignee: Sergei Golubchik
Resolution: Not a Bug Votes: 0
Labels: cartesian, join, left
Environment:

Linux Redhat 2.6.32-504.12.2.el6.x86_64 IBM Server 96GB RAM



 Description   

I created an SQL using a Left Join with resulted in a cartesian join as a result.
The format is

select 
a.col1, a.col2, b.col1, b.col1, b.col3, b.col4 
from tablea a 
left outer join tableb b on b.col4 and b.col1 
where b.col5 < current_date()

The erroneously placed and (should be an = ) creates a cartesian join. Some other database technology calls a parser error on this.
In the case of the system, it happened in on column in the join condition is a primary key and the other is the lead column of a multi-column index of a primary key



 Comments   
Comment by Elena Stepanova [ 2015-07-11 ]

Meerkat63,

So, what would be the expected result from your point of view, a syntax error?
Because if we accept the syntax, the cartesian product seems to be the only reasonable result set, do you agree?

Regarding the syntax, I suppose that technically the complaint is valid. I'm not very fluent at reading the SQL standard, but if I understand the description of search conditions correctly, col1 AND col2 is not one, because colX is not a predicate.

However, it has been accepted in MySQL for very long time, so disabling it now would be a disaster for existing applications.

Do you suggest to make the change anyway?

Comment by Peter McLarty [ 2015-07-13 ]

I would think that in this case changing this to block the cartesian join is preferable.

The below is what SQL Server responds with and I sort of get that in this case. I am not saying SQL Server is right, just an alternate point of view

An expression of non-boolean type specified in a context where a condition is expected, near 'and'.

I hope that no one actually uses the erroneous left join syntax. Maybe I am not understanding some SQL standards information either but clearly a Left Join states that all from the left table (table 1) and the matching rows on the right table(table 2) to me doesn't leave much scope to turn this into a cartesian join due to this method of handling a boolean situation.

Comment by Elena Stepanova [ 2015-07-13 ]

FWIW, while the corner case in the description is indeed questionable, there are perfectly legal ways to get a cartesian product from the left join, fully compliant with the SQL standard, e.g. (assuming col4 is not nullable)

select 
a.col1, a.col2, b.col1, b.col1, b.col3, b.col4 
from tablea a 
left outer join tableb b on b.col4 is not null

I don't expect SQL Server refuses it either, or does it?

And yes, the experience shows, whatever weird syntax is available, a number of people use it.

Comment by Peter McLarty [ 2015-07-13 ]

I defer to those greater than me on making this change. Personally I think the syntax as I originally presented is out of
Thanks for your feedback

using AdventureWorks2014 database

use AdventureWorks2014;
 
select [Name],[ProductNumber], [ReviewerName], [ReviewDate], [Comments]
from [Production].[Product]
 left join [Production].[ProductReview] pr on pr.[ProductID] is not null;
 

does, in fact, get accepted by SQL Server and does create a cartesian join. Your point is taken. Is mine an edge case I am not such an expert. If teh community agrees my case is an acceptable form then it might be something to go into odd examples of things which can go wrong. I had someone mention to me about a MySQL assumption of the col1 is true and col2 is true and therefore the SQL validates, that is a stretch to me on interpretation but if true and deemed acceptable we live with it.

Comment by Elena Stepanova [ 2015-07-13 ]

Assigning to serg for further consideration.

Comment by Sergei Golubchik [ 2015-07-16 ]

MariaDB works as expected, it seems. AND is a boolean operator. In MariaDB (and in MySQL and in Percona Server) it implicitly casts its argument to the boolean type. So, your query should be equivalent to

select  a.col1, a.col2, b.col1, b.col1, b.col3, b.col4  from tablea a  left outer join tableb b on
   (b.col4<>0) and (b.col1<>0)
where b.col5 < current_date()

(for the sake of the example, I assumed here that col1 and col4 are integers).
This can result in a cartesian join, indeed. But MariaDB (and MySQL before it) always did this kind of implicit casting, changing that behavior will break many thousands of applications.

Generated at Thu Feb 08 07:27:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.