Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4901

Allow pushdown of queries involving UNIONs in outer select to ColumnStore

Details

    Description

      Preparation:

      DROP TABLE IF EXISTS `r1`;
      DROP TABLE IF EXISTS `r2`;
       
      CREATE TABLE IF NOT EXISTS  `r1` (`id1` int) ENGINE=Columnstore;
      CREATE TABLE IF NOT EXISTS  `r2` (`id2` int) ENGINE=Columnstore;
       
      INSERT INTO `r1`  (
      with recursive series as (
      select 1 as id union all
      select id +1 as id from series
      where id < 10000)
      select id from series);
       
       
      INSERT INTO `r2`  (
      with recursive series as (
      select 1 as id union all
      select id +1 as id from series
      where id < 10000)
      select id from series);
      
      

      A simple join will executed as usual (no table mode)

      SELECT   r1.id1,r2.id2
      FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
      

      Adding a simple union select 1,2
      change the behaviour.

       select calSetTrace(1);
      (SELECT   r1.id1,r2.id2
      FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
      union select 1,2;
        select calGetTrace();
      

       select calSetTrace(1);
      (SELECT   r1.id1,r2.id2
      FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
      union select 1,2;
        select calGetTrace();
      

      debug log:

      Oct 20 16:17:39 localhost ExeMgr[961]: 39.070160 |31|0|0| D 16 CAL0041: Start SQL statement: (SELECT   r1.id1,r2.id2#012FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)#012union select 1,2; |test1|
      Oct 20 16:17:39 localhost ExeMgr[961]: 39.123944 |31|0|0| D 16 CAL0041: Start SQL statement: <part of the query executed in table mode>; |test1|
      Oct 20 16:17:39 localhost ExeMgr[961]: 39.146723 |31|0|0| D 16 CAL0042: End SQL statement
      Oct 20 16:17:39 localhost ExeMgr[961]: 39.147224 |31|0|0| D 16 CAL0042: End SQL statement
      

      With big tables, it can cause huge performance impact.

      If I wrap the whole statement, columnstore shows, he can it without table mode.

      Oct 20 16:17:52 localhost ExeMgr[961]: 52.051270 |31|0|0| D 16 CAL0041: Start SQL statement: select * from (#012(SELECT   r1.id1,r2.id2#012FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)#012union select 1,2) a; |test1|
      Oct 20 16:17:52 localhost ExeMgr[961]: 52.062831 |31|0|0| D 16 CAL0042: End SQL statement
      
      

      calGetTrace()                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
      +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | 
      Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows 
      BPS  PM   r2    3056     (id2)             0   2   0   0.002   1001 
      BPS  PM   r1    3054     (id1)             0   2   0   0.011   1001 
      HJS  PM   r1-r2 3054     -                 -   -   -   -----   -    
      TNS  UM   -     -        -                 -   -   -   0.000   10   
      TCS  UM   -     -        -                 -   -   -   0.000   1    
      TUS  UM   -     -        -                 -   -   -   0.016   11   
      TNS  UM   -     -        -                 -   -   -   0.000   11   
       |
      +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      

      Debug log:

      Oct 20 16:17:52 localhost ExeMgr[961]: 52.051270 |31|0|0| D 16 CAL0041: Start SQL statement: select * from (#012(SELECT   r1.id1,r2.id2#012FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)#012union select 1,2) a; |test1|
      Oct 20 16:17:52 localhost ExeMgr[961]: 52.062831 |31|0|0| D 16 CAL0042: End SQL statement
      

      Attachments

        Issue Links

          Activity

            Waiting on fix for MDEV-25080.

            ccalender Chris Calender (Inactive) added a comment - Waiting on fix for MDEV-25080 .

            For QA:

            For the before and after comparison of the query execution involving UNION in outer select, you can either use the affected version marked in this ticket, which is 6.1.1 or the last released version which is 22.08.8. Run the queries in the ticket description and compare the debug.log output with and without this feature. EXPLAIN output on the query would also be different.

            Please also run a simple performance test to compare outer UNION query execution times. An example performance test is in MCOL-4584.

            tntnatbry Gagan Goel (Inactive) added a comment - For QA: For the before and after comparison of the query execution involving UNION in outer select, you can either use the affected version marked in this ticket, which is 6.1.1 or the last released version which is 22.08.8. Run the queries in the ticket description and compare the debug.log output with and without this feature. EXPLAIN output on the query would also be different. Please also run a simple performance test to compare outer UNION query execution times. An example performance test is in MCOL-4584 .
            dleeyh Daniel Lee (Inactive) added a comment - - edited

            Build verified:
            engine: 4d4e4ad30dd2ec494ea9d323e1fa7fa69e97243e
            server: 1916028f898cf672a8b79d2b585b1d74f8bcd7db
            buildNo: 6965

            Used release 22.08.8 as the before case base line.
            Verified test case in the description, as well as the test case in MCOL-4584.
            1000000 row dataset was used on a 3PM cluster, with 24gb memory in each node.

            Over all, performance of the union push down is much faster, 2,046 times faster.

            Commands used to load tables:

            echo "dummy" | awk '

            {for(i=1; i<=1000000; i++)print i}'|cpimport mytest r1
            echo "dummy" | awk '{for(i=1; i<=1000000; i++)print i}

            '|cpimport mytest r2

            Attempted to use larger datasets, but release 22.08.8 would take too long to
            process the union query.

            Explain plan

            22.08.8
             
            MariaDB [mytest]> explain (SELECT   r1.id1,r2.id2 FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10) union select 1,2;
            +------+--------------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
            | id   | select_type  | table      | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
            +------+--------------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
            |    1 | PRIMARY      | r1         | ALL  | NULL          | NULL | NULL    | NULL | 2000 |                                                 |
            |    1 | PRIMARY      | r2         | ALL  | NULL          | NULL | NULL    | NULL | 2000 | Using where; Using join buffer (flat, BNL join) |
            |    2 | UNION        | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL | No tables used                                  |
            | NULL | UNION RESULT | <union1,2> | ALL  | NULL          | NULL | NULL    | NULL | NULL |                                                 |
            +------+--------------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
            4 rows in set (0.001 sec)
             
             
            23.02.1
             
            MariaDB [mytest]> explain (SELECT   r1.id1,r2.id2 FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10) union select 1,2;
            +------+--------------+-------+------+---------------+------+---------+------+------+-------+
            | id   | select_type  | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
            +------+--------------+-------+------+---------------+------+---------+------+------+-------+
            | NULL | PUSHED UNION | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | NULL  |
            +------+--------------+-------+------+---------------+------+---------+------+------+-------+
            1 row in set (0.001 sec)
            

            Performance

            22.08.8
             
            MariaDB [mytest]> (SELECT   r1.id1,r2.id2
                -> FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
                -> union select 1,2;
            +--------+--------+
            | id1    | id2    |
            +--------+--------+
            | 773953 | 773953 |
            | 773954 | 773954 |
            | 773955 | 773955 |
            | 773956 | 773956 |
            | 773957 | 773957 |
            | 773958 | 773958 |
            | 773959 | 773959 |
            | 773960 | 773960 |
            | 773961 | 773961 |
            | 773962 | 773962 |
            |      1 |      2 |
            +--------+--------+
            11 rows in set (7 min 20.034 sec)
             
            MariaDB [mytest]> select count(id1) from r1 union all select 1;
            +------------+
            | count(id1) |
            +------------+
            |    1000000 |
            |          1 |
            +------------+
            2 rows in set (0.181 sec)
             
             
            23.02.1
             
            MariaDB [mytest]> (SELECT   r1.id1,r2.id2
                -> FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
                -> union select 1,2;
            +-------+-------+
            | id1   | id2   |
            +-------+-------+
            |     1 |     2 |
            | 20001 | 20001 |
            | 20002 | 20002 |
            | 20003 | 20003 |
            | 20004 | 20004 |
            | 20005 | 20005 |
            | 20006 | 20006 |
            | 20007 | 20007 |
            | 20008 | 20008 |
            | 20009 | 20009 |
            | 20010 | 20010 |
            +-------+-------+
            11 rows in set (0.215 sec)
             
             
            MariaDB [mytest]> select count(id1) from r1 union all select 1;
            +------------+
            | count(id1) |
            +------------+
            |          1 |
            |    1000000 |
            +------------+
            2 rows in set (0.079 sec)
            

            calgettrace()

            query:
            (SELECT   r1.id1,r2.id2
             FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
             union select 1,2;
             
            22.08.0
             
            MariaDB [mytest]> select calGetTrace();
            +---------------+
            | calGetTrace() |
            +---------------+
            | NULL          |
            +---------------+
            1 row in set (0.001 sec)
             
             
            23.02.1
             
            MariaDB [mytest]> select calGetTrace();
            +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | calGetTrace()                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
            +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            | 
            Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows    
            BPS  PM   r1    3040     (id1)             630 648 0   0.025   1000000 
            BPS  PM   r2    3043     (id2)             166 70  0   0.059   98304   
            HJS  PM   r2-r1 3043     -                 -   -   -   -----   -       
            TNS  UM   -     -        -                 -   -   -   0.000   10      
            TCS  UM   -     -        -                 -   -   -   0.000   1       
            TUS  UM   -     -        -                 -   -   -   0.133   11      
             |
            +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            1 row in set (0.001 sec)
            

            debug.log

            22.08.8
             
            Mar  1 16:12:09 rocky8 ExeMgr[13104]: 09.498622 |38|0|0| D 16 CAL0028: DbProf: Start Statement: Statement-0 Ver-11 SQL-<part of the query executed in table mode>
            Mar  1 16:12:09 rocky8 ExeMgr[13104]: 09.498673 |38|0|0| D 16 CAL0041: Start SQL statement: <part of the query executed in table mode>; |mytest|
             
             
            23.02.1
             
            Mar  1 15:57:31 rocky8 ExeMgr[13010]: 31.993180 |18|0|0| D 16 CAL0041: Start SQL statement: (SELECT   r1.id1,r2.id2#012FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)#012union select 1,2; |mytest|
            

            dleeyh Daniel Lee (Inactive) added a comment - - edited Build verified: engine: 4d4e4ad30dd2ec494ea9d323e1fa7fa69e97243e server: 1916028f898cf672a8b79d2b585b1d74f8bcd7db buildNo: 6965 Used release 22.08.8 as the before case base line. Verified test case in the description, as well as the test case in MCOL-4584 . 1000000 row dataset was used on a 3PM cluster, with 24gb memory in each node. Over all, performance of the union push down is much faster, 2,046 times faster. Commands used to load tables: echo "dummy" | awk ' {for(i=1; i<=1000000; i++)print i}'|cpimport mytest r1 echo "dummy" | awk '{for(i=1; i<=1000000; i++)print i} '|cpimport mytest r2 Attempted to use larger datasets, but release 22.08.8 would take too long to process the union query. Explain plan 22.08.8   MariaDB [mytest]> explain (SELECT r1.id1,r2.id2 FROM r1 JOIN r2 ON r1.id1 = r2.id2 limit 10) union select 1,2; +------+--------------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+ | 1 | PRIMARY | r1 | ALL | NULL | NULL | NULL | NULL | 2000 | | | 1 | PRIMARY | r2 | ALL | NULL | NULL | NULL | NULL | 2000 | Using where; Using join buffer (flat, BNL join) | | 2 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | | +------+--------------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+ 4 rows in set (0.001 sec)     23.02.1   MariaDB [mytest]> explain (SELECT r1.id1,r2.id2 FROM r1 JOIN r2 ON r1.id1 = r2.id2 limit 10) union select 1,2; +------+--------------+-------+------+---------------+------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------+-------+------+---------------+------+---------+------+------+-------+ | NULL | PUSHED UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | +------+--------------+-------+------+---------------+------+---------+------+------+-------+ 1 row in set (0.001 sec) Performance 22.08.8   MariaDB [mytest]> (SELECT r1.id1,r2.id2 -> FROM r1 JOIN r2 ON r1.id1 = r2.id2 limit 10) -> union select 1,2; +--------+--------+ | id1 | id2 | +--------+--------+ | 773953 | 773953 | | 773954 | 773954 | | 773955 | 773955 | | 773956 | 773956 | | 773957 | 773957 | | 773958 | 773958 | | 773959 | 773959 | | 773960 | 773960 | | 773961 | 773961 | | 773962 | 773962 | | 1 | 2 | +--------+--------+ 11 rows in set (7 min 20.034 sec)   MariaDB [mytest]> select count(id1) from r1 union all select 1; +------------+ | count(id1) | +------------+ | 1000000 | | 1 | +------------+ 2 rows in set (0.181 sec)     23.02.1   MariaDB [mytest]> (SELECT r1.id1,r2.id2 -> FROM r1 JOIN r2 ON r1.id1 = r2.id2 limit 10) -> union select 1,2; +-------+-------+ | id1 | id2 | +-------+-------+ | 1 | 2 | | 20001 | 20001 | | 20002 | 20002 | | 20003 | 20003 | | 20004 | 20004 | | 20005 | 20005 | | 20006 | 20006 | | 20007 | 20007 | | 20008 | 20008 | | 20009 | 20009 | | 20010 | 20010 | +-------+-------+ 11 rows in set (0.215 sec)     MariaDB [mytest]> select count(id1) from r1 union all select 1; +------------+ | count(id1) | +------------+ | 1 | | 1000000 | +------------+ 2 rows in set (0.079 sec) calgettrace() query: (SELECT r1.id1,r2.id2 FROM r1 JOIN r2 ON r1.id1 = r2.id2 limit 10) union select 1,2;   22.08.0   MariaDB [mytest]> select calGetTrace(); +---------------+ | calGetTrace() | +---------------+ | NULL | +---------------+ 1 row in set (0.001 sec)     23.02.1   MariaDB [mytest]> select calGetTrace(); +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | calGetTrace() | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows BPS PM r1 3040 (id1) 630 648 0 0.025 1000000 BPS PM r2 3043 (id2) 166 70 0 0.059 98304 HJS PM r2-r1 3043 - - - - ----- - TNS UM - - - - - - 0.000 10 TCS UM - - - - - - 0.000 1 TUS UM - - - - - - 0.133 11 | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.001 sec) debug.log 22.08.8   Mar 1 16:12:09 rocky8 ExeMgr[13104]: 09.498622 |38|0|0| D 16 CAL0028: DbProf: Start Statement: Statement-0 Ver-11 SQL-<part of the query executed in table mode> Mar 1 16:12:09 rocky8 ExeMgr[13104]: 09.498673 |38|0|0| D 16 CAL0041: Start SQL statement: <part of the query executed in table mode>; |mytest|     23.02.1   Mar 1 15:57:31 rocky8 ExeMgr[13010]: 31.993180 |18|0|0| D 16 CAL0041: Start SQL statement: (SELECT r1.id1,r2.id2#012FROM r1 JOIN r2 ON r1.id1 = r2.id2 limit 10)#012union select 1,2; |mytest|

            People

              tntnatbry Gagan Goel (Inactive)
              Richard Richard Stracke
              Daniel Lee Daniel Lee (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.