Details

    Description

      Currently MariaDB does not allow to use lateral derived tables in queries though it
      uses builds them internally for when employing split optimization.
      Consider the following query

      SELECT t.b FROM t1, LATERAL (SELECT * FROM t2 WHERE t1.a=t2.a) AS t 
      

      Here the reference t1.a in the WHERE condition of derived table t refers to the field a
      of the table t1 used in the same FROM list as t.

      SQL standard allows lateral derived tables and MySQL 8.0 supports them.

      In MySQL the table can depend only on other tables in the same SELECT. If record in dependent table changes it invalidate table.

      My idea is use Item_cache to store external references values on which current content of the table was generated.
      Also maintain table map of dependence (including OUTER_REF_TABLE_BIT) and put check outer values/invalidate of the tables as we get and approved value executing the step in JOIN_TAB array processing (all dependent tables defined and got values in the records )

      In case of only outer dependence do not forget "join->outer_ref_cond->val_int()" in do_select()

      Attachments

        Issue Links

          Activity

            nunop Nuno added a comment - - edited

            Any example where I need to do a subquery with an "ORDER BY" that needs to return 2 or more columns, used by other JOINs in the main query.

            Example (just randomly from top of my mind):

            SELECT
              users.user_id
            , users.user_name
            , some_extra_data.whatever
            , most_popular_post.post_id
            , most_popular_post.title
            , most_popular_post.time
            , most_popular_post.community_id
            , most_popular_post.community_name
            FROM users
            JOIN some_extra_data USING(user_id)
            LATERAL (
              SELECT 
                posts.post_id
              , posts.title
              , posts.time
              , communities.community_id
              , communities.community_name
              FROM posts
              JOIN communities USING(community_id)
              WHERE posts.from=user_id
              ORDER BY posts.total_replies DESC LIMIT 1
            ) AS most_popular_post
            ORDER BY most_popular_post.time DESC
            

            There are plenty of examples like these, and many times I'm forced to create multiple subqueries in the SELECT, for each field I need, or subqueries in subqueries in a JOIN, etc, just because LATERAL isn't available.

            You might suggest the above could be converted by starting the query with posts/communities, instead of users, but please note that in this case, I want ALL users to appear, even if "most_popular_post" is NULL

            nunop Nuno added a comment - - edited Any example where I need to do a subquery with an "ORDER BY" that needs to return 2 or more columns, used by other JOINs in the main query. Example (just randomly from top of my mind): SELECT users.user_id , users.user_name , some_extra_data.whatever , most_popular_post.post_id , most_popular_post.title , most_popular_post.time , most_popular_post.community_id , most_popular_post.community_name FROM users JOIN some_extra_data USING(user_id) LATERAL ( SELECT posts.post_id , posts.title , posts.time , communities.community_id , communities.community_name FROM posts JOIN communities USING(community_id) WHERE posts.from=user_id ORDER BY posts.total_replies DESC LIMIT 1 ) AS most_popular_post ORDER BY most_popular_post.time DESC There are plenty of examples like these, and many times I'm forced to create multiple subqueries in the SELECT, for each field I need, or subqueries in subqueries in a JOIN, etc, just because LATERAL isn't available. You might suggest the above could be converted by starting the query with posts/communities, instead of users, but please note that in this case, I want ALL users to appear, even if "most_popular_post" is NULL
            rjasdfiii Rick James added a comment -

            @Nuno - Thanks. I'll research how LATERAL can be used to solve the "groupwise-max" and "Top-N per group" problems. There are many inefficient algorithms out there and some return extra rows when the max value (eg, total_replies) occurs more than once.

            rjasdfiii Rick James added a comment - @Nuno - Thanks. I'll research how LATERAL can be used to solve the "groupwise-max" and "Top-N per group" problems. There are many inefficient algorithms out there and some return extra rows when the max value (eg, total_replies) occurs more than once.
            kuno@projectpartner.ee Kuno Vaik added a comment -

            I have also optimized many queries using LATERAL JOIN in postgresql and I really miss it in mariadb.

            kuno@projectpartner.ee Kuno Vaik added a comment - I have also optimized many queries using LATERAL JOIN in postgresql and I really miss it in mariadb.
            GavinRay Gavin Ray added a comment -

            Just to add some extra context, `LATERAL` is essentially a window function under the hood.

            The transformation for the following `LATERAL` query. "Fetch the first 5 Artists, and for each Artist, fetch their first 3 Albums":

            select art.*, alb.*
            from (select * from artists limit 5) art
                   cross join lateral (select *
                                       from    albums
                                       where  albums.artist_id = art.artist_id
                                       limit 3) alb
            

            Is:

            SELECT art.*, alb.*
            FROM (
              SELECT *, row_number() OVER (PARTITION BY artist_id ORDER BY album_id) as rn
              FROM albums
            ) alb
            LEFT JOIN (
              SELECT *
              FROM artists
              LIMIT 5
            ) art ON alb.artist_id = art.artist_id
            WHERE alb.rn <= 3;
            

            With the results:

            artist_id	name	album_id	title	artist_id
            1	Artist 1	1	Album 1	1
            1	Artist 1	2	Album 2	1
            1	Artist 1	3	Album 3	1
            2	Artist 2	4	Album 4	2
            2	Artist 2	5	Album 5	2
            2	Artist 2	6	Album 6	2
            3	Artist 3	7	Album 7	3
            3	Artist 3	8	Album 8	3
            3	Artist 3	9	Album 9	3
            

            In fact, in CockroachDB this is exactly how the transformation/rewriting is implemented:

            $> explain select * from abc, lateral (select * from xyz where x = a limit 2);
                                                       info
            -------------------------------------------------------------------------------------------
              distribution: local
              vectorized: true
             
              • filter
              │ estimated row count: 1
              │ filter: row_num <= 2
              │
              └── • window
                  │ estimated row count: 2
                  │
                  └── • hash join
                      │ estimated row count: 2
                      │ equality: (x) = (a)
                      │
                      ├── • scan
                      │     estimated row count: 6 (100% of the table; stats collected 2 minutes ago)
                      │     table: xyz@xyz_pkey
                      │     spans: FULL SCAN
                      │
                      └── • scan
                            estimated row count: 1 (100% of the table; stats collected 3 minutes ago)
                            table: abc@abc_pkey
                            spans: FULL SCAN
            

            GavinRay Gavin Ray added a comment - Just to add some extra context, `LATERAL` is essentially a window function under the hood. The transformation for the following `LATERAL` query. "Fetch the first 5 Artists, and for each Artist, fetch their first 3 Albums" : select art.*, alb.* from ( select * from artists limit 5) art cross join lateral ( select * from albums where albums.artist_id = art.artist_id limit 3) alb Is: SELECT art.*, alb.* FROM ( SELECT *, row_number() OVER (PARTITION BY artist_id ORDER BY album_id) as rn FROM albums ) alb LEFT JOIN ( SELECT * FROM artists LIMIT 5 ) art ON alb.artist_id = art.artist_id WHERE alb.rn <= 3; With the results: artist_id name album_id title artist_id 1 Artist 1 1 Album 1 1 1 Artist 1 2 Album 2 1 1 Artist 1 3 Album 3 1 2 Artist 2 4 Album 4 2 2 Artist 2 5 Album 5 2 2 Artist 2 6 Album 6 2 3 Artist 3 7 Album 7 3 3 Artist 3 8 Album 8 3 3 Artist 3 9 Album 9 3 In fact, in CockroachDB this is exactly how the transformation/rewriting is implemented: $> explain select * from abc, lateral ( select * from xyz where x = a limit 2); info ------------------------------------------------------------------------------------------- distribution: local vectorized: true   • filter │ estimated row count : 1 │ filter: row_num <= 2 │ └── • window │ estimated row count : 2 │ └── • hash join │ estimated row count : 2 │ equality: (x) = (a) │ ├── • scan │ estimated row count : 6 (100% of the table ; stats collected 2 minutes ago) │ table : xyz@xyz_pkey │ spans: FULL SCAN │ └── • scan estimated row count : 1 (100% of the table ; stats collected 3 minutes ago) table : abc@abc_pkey spans: FULL SCAN
            psergei Sergei Petrunia added a comment - - edited

            GavinRay, thanks for an interesting example...

            As far as I understand, window functions come from "decorrelation", not from LATERAL syntax as such....
            row_number here is introduced by TryDecorrelateLimit (I'm guessing but I'm fairly confident):
            https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/norm/rules/decorrelate.opt#L633

            Note that the rule has InnerJoin | InnerJoinApply - it would work for inner join in a similar way...

            I think it's a good point that a good implementation of LATERAL should make an to de-correlate the LATERAL subquery and see if this gives the optimizer new options.

            However, the first milestone in this task is to get the basic (dependent) execution working. Decorrelation is not always possible (or advantageous) so we'll need to handle the case where it is not done, anyway.

            psergei Sergei Petrunia added a comment - - edited GavinRay , thanks for an interesting example... As far as I understand, window functions come from "decorrelation", not from LATERAL syntax as such.... row_number here is introduced by TryDecorrelateLimit (I'm guessing but I'm fairly confident): https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/norm/rules/decorrelate.opt#L633 Note that the rule has InnerJoin | InnerJoinApply - it would work for inner join in a similar way... I think it's a good point that a good implementation of LATERAL should make an to de-correlate the LATERAL subquery and see if this gives the optimizer new options. However, the first milestone in this task is to get the basic (dependent) execution working. Decorrelation is not always possible (or advantageous) so we'll need to handle the case where it is not done, anyway.

            People

              monty Michael Widenius
              igor Igor Babaev
              Votes:
              48 Vote for this issue
              Watchers:
              41 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.