[MDEV-12491] JOIN: solve field name ambiguity when possible Created: 2017-04-12  Updated: 2017-05-24

Status: Open
Project: MariaDB Server
Component/s: Parser
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Federico Razzoli Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: join, semi-join


 Description   

With JOINs, sometimes we get this error:

ERROR 1052 (23000): Column 'ENGINE' in where clause is ambiguous

It happens because multiple tables share the same column name, and we did not reference it in the form `table`.`column`. But sometimes the user didn't because he doesn't think it's necessary, and from a purely logical point of view he's right. I can think at least 2 cases.

1) Think this ON:

ON t1.id = t2.id

If we are only joining t1 and t2, the values are identical.

2) Think about a semi-join:

ON t1.id = t2.id
WHERE t2.name IS NULL
ORDER BY name

Unless we are joining more tables, we mean ORDER BY t1.name. I know that the language allows us to order by a constant NULL value. But in a real case semi-join it seems to me perfectly safe to assume that we are referencing a column from t1.



 Comments   
Comment by Daniel Black [ 2017-04-12 ]

Case 1) you can use USING

Case 2)

When SQL becomes a compiled language perhaps this can be optimised out, but until then CPU cycles are required to replace your programmers brain making stuff slower. Until then lets stick to Postel's Law

Comment by Federico Razzoli [ 2017-04-12 ]

Honestly, I don't understand why this is a problem (the error message is clear). But it is, for a lot of developers. I won't insist on this, but before just saying "no", please take a look at StackOverflow or similar sites.

Comment by Elena Stepanova [ 2017-04-12 ]

No matter how it works, there will always be complaints on resources like StackOverflow, because people only care about their scenario and ignore any other possibilities.

Say you introduce the "optimization" for the first case. The first complaint will immediately be something like "why does the query work with t1.id = t2.id, but fails with t1.id = t2.id or t1.id is NULL?!" , especially if the query is generated by an application and can take various forms. Same with the second example.
There are so many possibilities and variations, "what if this", "what if that", some of which are fairly subjective, it will be a mess if the server starts trying to make such decisions.

Comment by Sergei Golubchik [ 2017-05-24 ]

The first example, with ON is also ambiguous:

create table t1 (a int, b int);
insert t1 values (1,1),(2,2),(3,3);
create table t2 (a char(10), c int);
insert t2 values ('1 hundred', 100), ('2 hundreds', 200);
select * from t1 join t2 on (t1.a=t2.a);

But even USING doesn't solve it:

select * from t1 join t2 using(a);
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    1 |  100 |
|    2 |    2 |  200 |
+------+------+------+
select * from t2 join t1 using(a);
+------------+------+------+
| a          | c    | b    |
+------------+------+------+
| 1 hundred  |  100 |    1 |
| 2 hundreds |  200 |    2 |
+------------+------+------+

So, I'd agree that ON can do the same as USING when resolving column names.

Generated at Thu Feb 08 07:58:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.