[MDEV-13648] Add FULL OUTER JOIN to MariaDB Created: 2017-08-25  Updated: 2023-11-30

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: None

Type: New Feature Priority: Critical
Reporter: Juan Telleria Assignee: Igor Babaev
Resolution: Unresolved Votes: 14
Labels: Compatibility, gsoc19, gsoc20

Attachments: PNG File dplyr Combine Tables.PNG    
Issue Links:
Blocks
blocks MCOL-3303 Add FULL OUTER JOIN to MariaDB Column... Closed
blocks MDEV-15041 Implement MERGE statement Open
blocks MDEV-20018 sql_mode="oracle" does not support FU... Open
PartOf
is part of MDEV-10872 Providing compatibility Oracle database Open
Relates
relates to MCOL-3303 Add FULL OUTER JOIN to MariaDB Column... Closed

 Description   

Add support for FULL OUTER JOIN

https://www.w3schools.com/sql/sql_join_full.asp

One of the way how to implement is to re-write the query

select t1.*, t2.* from t1 full outer join t2 on P(t1,t2)

into the following union all:

select t1.*, t2.* from t1 left outer join t2 on P(t1,t2)
union all
select t1.*,t2.* from t2 left outer join t1 on P(t1,t2) where t1.a is null

Here t1.a is some non-nullable column of t1 (e.g. the column of single column primary key).



 Comments   
Comment by Juan Telleria [ 2017-09-26 ]

PostgreSQL indeed has this feature:

PostgreSQL FULL OUTER JOIN

I am a little bit disappointed that MariaDB does not have this basic feature, as it forces users to write more verbose scripts.

Comment by Alexander Barkov [ 2018-12-06 ]

Hi ralf.gebhardt@mariadb.com. Sorry, my knowledge in this area is minimal. I suggest to ask sanja.

Comment by Oleksandr Byelkin [ 2018-12-06 ]

We can discuss it on next optimiser call (Tuesday).

Comment by Sergei Petrunia [ 2018-12-06 ]

Full outer join is used by two queries in TPC-DS:

https://jira.mariadb.org/browse/MDEV-17802?focusedCommentId=119693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-119693

I would like to know what is the typical use case for FULL OUTER JOIN? (For LEFT JOIN, it is clear: optional attributes. e.g. SELECT * FROM people LEFT JOIN cars ON... - this returns people, some of them have cars, some don't).

Comment by Daniel Black [ 2018-12-06 ]

A case could be where the direction of the object-subject relation is bidirectional. Like searching for cities without airport and airports without cities in the same query (Stack overflow question). Anything that displays two sets of information where there is an occasional link.

Comment by Juan Telleria [ 2018-12-10 ]

It is mostly useful for analytical workloads, using Aria or Columnstore Engines:

Stackoverflow: When to use FULL OUTER JOIN?

An example, which does not limit `Type` Column to `t_type_datasetA` values:

SELECT
     `Type`,
     CASE
          WHEN
               A.`Value` IS NOT NULL
          THEN
               A.`Value`
          WHEN
               B.`Value` IS NOT NULL
          THEN
               B.`Value`
          ELSE
               NULL
          END AS `Updated_Value`
FROM
     t_type_datasetA AS A
     FULL OUTER JOIN t_type_datasetB AS B
          USING(`Type`)

In general, for analytical worloads (Aria / Columnstore), I always find useful:

  • FULL JOIN.
  • SEMI JOIN.
  • ANTI JOIN.

(See attached dplyr R Package Cheatsheet)

Comment by Krishna Maheshwari [ 2020-03-02 ]

Hello! I learned about MariaDB through GSoC. I believe that I understand this issue stated here and that is why I want to solve this as my GSoC goal. Please guide me so as to where I can start contributing to this and related issue and be of some help.

Comment by Igor Babaev [ 2020-03-02 ]

Hi Krishna,

For the beginning try to implement this transformation:

select t1.*, t2.*,t3.* from t1 left outer join t2 on P(t1,t2) join t3 on P1(t1,t2,t3) 
=> (transformed into)
select t.*,t3.* from (select t1.*,t2.* from t1 left outer join t2 on P(t1,t2)) as t join on P1(t,t3)

This transformation is called nesting of a join operation.

Each full outer join expression must be first nested. Nesting does not depend of the type of the join to be nested.
We already do nesting for set expressions like (SELECT... INTERSECT SELECT ...). You can look how it's done.

Comment by Jeeukrishnan Kayshyap (Inactive) [ 2020-03-03 ]

Hi ,I want to solve this issue as GSoC project 2020.Please guide me about where and how to start so that I can contibute to this organisation.Looking forward to get the guidance as soon as possible and work with this organization.

Comment by Igor Babaev [ 2020-03-03 ]

Hi Jeeukrishnan,
First of all you need to install git on your computer and create local git tree for git@github.com:MariaDB/server.git.
After this try to build the branch 10.5.

Comment by Parth Arora [ 2020-03-25 ]

Hello, Everyone! I got to know about MariaDB through GSoC.
I have good knowledge of SQL.
I would really love to contribute to the code base.
Any advice/suggestions/directions to help me understand this issue clearly?
I am interested in 1)Add full outer join to MariaDB.

Comment by Desikan Sundararajan [ 2020-03-27 ]

Hi,
I'd like an opportunity to contribute to this organization. I have experience working in C/C++ and am currently already working on a C++ project that implements a relational database system. I would like to know if this project is still available for taking up and if yes, would like to discuss more about this task with a potential mentor. I'm applying to this organization as a part of GSoC 2020.

Comment by Igor Babaev [ 2020-03-27 ]

Hi Desikan,

How do you prefer to discuss this task? By Skype or in Zoom?

Regards,

Igor.

Comment by Desikan Sundararajan [ 2020-03-27 ]

Hi Igor,

Thank you for the quick response. I would like to discuss it over Zoom.
Please let me know what time works for you. I'm available post 6 PM ET
today (27th March) and anytime tomorrow.

Regards,

On Fri, Mar 27, 2020 at 12:56 AM Igor Babaev (Jira) <jira@mariadb.org>


S.Desikan

Comment by Igor Babaev [ 2020-03-27 ]

Ok, 6:00 PM (march 27). You have to use this number 617-623-132.

Comment by Yitan Ze [ 2020-03-27 ]

Hello,

I'm applying for GSoP, and this project caught my eyes. I've done a similar project in a big data search/analytics platform too. On that platform, I implemented 40+ search commands including join, lookup, stats, top etc. I've heard about MariaDB for a long time and I hope to contribute to this community. Are there still any available spots for this position? If yes, I would like to know more about this task. My email is yitanze@gmail.com.

Comment by Desikan Sundararajan [ 2020-03-27 ]

Hi Igor,

I have joined the meeting on Zoom. Waiting for you to join.

Regards,


S.Desikan

Comment by Igor Babaev [ 2020-03-27 ]

Please join again, I'm there.

Comment by Igor Babaev [ 2020-03-28 ]

Hi Yitan,

Do you have zoom or skype ?

Regards,

Igor.

Comment by Yitan Ze [ 2020-03-28 ]

Yes, I have zoom

Comment by Igor Babaev [ 2020-03-28 ]

Let's setup a zoom session.

Comment by Igor Babaev [ 2020-03-30 ]

For those of you who are preparing proposals for this project:
In your proposal you are expected to cover the following issues:
1. How to build the tree for the result of re-writing for a join expression with FULL OUTER JOIN.
(It's not trivial as the parser provides you only with one tree that can be easily transformed into a
tree either for LEFT OUTER JOIN or for RIGHT OUTER JOIN, but not for both and you need the
trees for two SELECTs, not just for one. You can get an idea how you could build the
tree for the second SELECT in sql_view.cc).
2. At what stage to do the re-writing: at the parser stage, at the beginning of the context analysis
stage (see JOIN::prepare or at the beginning of the optimization stage (see JOIN::optimizer).
3. How to handle embedded FULL OUTER JOINs.
4. How and when to check possible conversion to LEFT OUTER JOIN or RIGHT OUTER JOIN.
5. What adjustments must be done for different references to the tables used in the original join
expression with FULL OUTER JOIN that may occur outside of this expression.

Comment by Jens Schauder [ 2023-03-21 ]

If it makes a difference: I'm working on a very interesting feature for Spring Data JDBC which will require full outer joins: https://github.com/spring-projects/spring-data-relational/issues/1445

Comment by Weijun Huang [ 2023-04-06 ]

If no one else is currently assigned to this task, I would be happy to take ownership of this ticket and work on it.

Comment by Igor Babaev [ 2023-04-06 ]

Weijun Huang: This is a pretty complex task. Are you sure you can handle it?

Comment by Weijun Huang [ 2023-04-06 ]

Hi Igor Babaev, this issue is interesting. And I am glad to pick this up and follow the advice above. I will implement one native join first, then optimize it.

Comment by Weijun Huang [ 2023-04-10 ]

Hi Igor Babaev,
I went through `sql_view.cc` to figure out how to create a new select tree. In the view clause, it maintains a select list. At the beginning, it needs to implement a select query, so it invokes `parse_sql` and `lex->first_select_lex()`, and processes the table. The View clause does this in the `mysql_make_view` function.

So my idea is to push table A and table B in `sql_yacc.yy`, then do a left join first to create a temporary table followed by a right join to create another temporary table. After that, push two tables in the table list. When we get the two tables, we could union them and get the result. WDYT?

Generated at Thu Feb 08 08:07:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.