[MCOL-4901] Allow pushdown of queries involving UNIONs in outer select to ColumnStore Created: 2021-10-20  Updated: 2023-11-17  Resolved: 2023-03-01

Status: Closed
Project: MariaDB ColumnStore
Component/s: Columnstore Select Handler
Affects Version/s: 6.1.1
Fix Version/s: columnstore-10.11.1, 23.02.1

Type: Task Priority: Blocker
Reporter: Richard Stracke Assignee: Gagan Goel (Inactive)
Resolution: Fixed Votes: 0
Labels: mcs_rel_2302_1

Issue Links:
Blocks
blocks MDEV-25080 Allow pushdown of queries involving U... Closed
Duplicate
is duplicated by MCOL-4569 Queries with UNION ALL perform dispro... Closed
is duplicated by MCOL-4584 Significant performance degradation w... Closed
Relates
relates to MCOL-5222 ORDER BY on UNIONs in outer selects d... Stalled
Assigned for Testing: Daniel Lee Daniel Lee (Inactive)

 Description   

Preparation:

DROP TABLE IF EXISTS `r1`;
DROP TABLE IF EXISTS `r2`;
 
CREATE TABLE IF NOT EXISTS  `r1` (`id1` int) ENGINE=Columnstore;
CREATE TABLE IF NOT EXISTS  `r2` (`id2` int) ENGINE=Columnstore;
 
INSERT INTO `r1`  (
with recursive series as (
select 1 as id union all
select id +1 as id from series
where id < 10000)
select id from series);
 
 
INSERT INTO `r2`  (
with recursive series as (
select 1 as id union all
select id +1 as id from series
where id < 10000)
select id from series);

A simple join will executed as usual (no table mode)

SELECT   r1.id1,r2.id2
FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)

Adding a simple union select 1,2
change the behaviour.

 select calSetTrace(1);
(SELECT   r1.id1,r2.id2
FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
union select 1,2;
  select calGetTrace();

 select calSetTrace(1);
(SELECT   r1.id1,r2.id2
FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
union select 1,2;
  select calGetTrace();

debug log:

Oct 20 16:17:39 localhost ExeMgr[961]: 39.070160 |31|0|0| D 16 CAL0041: Start SQL statement: (SELECT   r1.id1,r2.id2#012FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)#012union select 1,2; |test1|
Oct 20 16:17:39 localhost ExeMgr[961]: 39.123944 |31|0|0| D 16 CAL0041: Start SQL statement: <part of the query executed in table mode>; |test1|
Oct 20 16:17:39 localhost ExeMgr[961]: 39.146723 |31|0|0| D 16 CAL0042: End SQL statement
Oct 20 16:17:39 localhost ExeMgr[961]: 39.147224 |31|0|0| D 16 CAL0042: End SQL statement

With big tables, it can cause huge performance impact.

If I wrap the whole statement, columnstore shows, he can it without table mode.

Oct 20 16:17:52 localhost ExeMgr[961]: 52.051270 |31|0|0| D 16 CAL0041: Start SQL statement: select * from (#012(SELECT   r1.id1,r2.id2#012FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)#012union select 1,2) a; |test1|
Oct 20 16:17:52 localhost ExeMgr[961]: 52.062831 |31|0|0| D 16 CAL0042: End SQL statement

calGetTrace()                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows 
BPS  PM   r2    3056     (id2)             0   2   0   0.002   1001 
BPS  PM   r1    3054     (id1)             0   2   0   0.011   1001 
HJS  PM   r1-r2 3054     -                 -   -   -   -----   -    
TNS  UM   -     -        -                 -   -   -   0.000   10   
TCS  UM   -     -        -                 -   -   -   0.000   1    
TUS  UM   -     -        -                 -   -   -   0.016   11   
TNS  UM   -     -        -                 -   -   -   0.000   11   
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Debug log:

Oct 20 16:17:52 localhost ExeMgr[961]: 52.051270 |31|0|0| D 16 CAL0041: Start SQL statement: select * from (#012(SELECT   r1.id1,r2.id2#012FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)#012union select 1,2) a; |test1|
Oct 20 16:17:52 localhost ExeMgr[961]: 52.062831 |31|0|0| D 16 CAL0042: End SQL statement



 Comments   
Comment by Chris Calender (Inactive) [ 2022-06-16 ]

Waiting on fix for MDEV-25080.

Comment by Gagan Goel (Inactive) [ 2023-02-28 ]

For QA:

For the before and after comparison of the query execution involving UNION in outer select, you can either use the affected version marked in this ticket, which is 6.1.1 or the last released version which is 22.08.8. Run the queries in the ticket description and compare the debug.log output with and without this feature. EXPLAIN output on the query would also be different.

Please also run a simple performance test to compare outer UNION query execution times. An example performance test is in MCOL-4584.

Comment by Daniel Lee (Inactive) [ 2023-03-01 ]

Build verified:
engine: 4d4e4ad30dd2ec494ea9d323e1fa7fa69e97243e
server: 1916028f898cf672a8b79d2b585b1d74f8bcd7db
buildNo: 6965

Used release 22.08.8 as the before case base line.
Verified test case in the description, as well as the test case in MCOL-4584.
1000000 row dataset was used on a 3PM cluster, with 24gb memory in each node.

Over all, performance of the union push down is much faster, 2,046 times faster.

Commands used to load tables:

echo "dummy" | awk '

{for(i=1; i<=1000000; i++)print i}'|cpimport mytest r1
echo "dummy" | awk '{for(i=1; i<=1000000; i++)print i}

'|cpimport mytest r2

Attempted to use larger datasets, but release 22.08.8 would take too long to
process the union query.

Explain plan

22.08.8
 
MariaDB [mytest]> explain (SELECT   r1.id1,r2.id2 FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10) union select 1,2;
+------+--------------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id   | select_type  | table      | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
+------+--------------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
|    1 | PRIMARY      | r1         | ALL  | NULL          | NULL | NULL    | NULL | 2000 |                                                 |
|    1 | PRIMARY      | r2         | ALL  | NULL          | NULL | NULL    | NULL | 2000 | Using where; Using join buffer (flat, BNL join) |
|    2 | UNION        | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL | No tables used                                  |
| NULL | UNION RESULT | <union1,2> | ALL  | NULL          | NULL | NULL    | NULL | NULL |                                                 |
+------+--------------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
4 rows in set (0.001 sec)
 
 
23.02.1
 
MariaDB [mytest]> explain (SELECT   r1.id1,r2.id2 FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10) union select 1,2;
+------+--------------+-------+------+---------------+------+---------+------+------+-------+
| id   | select_type  | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
+------+--------------+-------+------+---------------+------+---------+------+------+-------+
| NULL | PUSHED UNION | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | NULL  |
+------+--------------+-------+------+---------------+------+---------+------+------+-------+
1 row in set (0.001 sec)

Performance

22.08.8
 
MariaDB [mytest]> (SELECT   r1.id1,r2.id2
    -> FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
    -> union select 1,2;
+--------+--------+
| id1    | id2    |
+--------+--------+
| 773953 | 773953 |
| 773954 | 773954 |
| 773955 | 773955 |
| 773956 | 773956 |
| 773957 | 773957 |
| 773958 | 773958 |
| 773959 | 773959 |
| 773960 | 773960 |
| 773961 | 773961 |
| 773962 | 773962 |
|      1 |      2 |
+--------+--------+
11 rows in set (7 min 20.034 sec)
 
MariaDB [mytest]> select count(id1) from r1 union all select 1;
+------------+
| count(id1) |
+------------+
|    1000000 |
|          1 |
+------------+
2 rows in set (0.181 sec)
 
 
23.02.1
 
MariaDB [mytest]> (SELECT   r1.id1,r2.id2
    -> FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
    -> union select 1,2;
+-------+-------+
| id1   | id2   |
+-------+-------+
|     1 |     2 |
| 20001 | 20001 |
| 20002 | 20002 |
| 20003 | 20003 |
| 20004 | 20004 |
| 20005 | 20005 |
| 20006 | 20006 |
| 20007 | 20007 |
| 20008 | 20008 |
| 20009 | 20009 |
| 20010 | 20010 |
+-------+-------+
11 rows in set (0.215 sec)
 
 
MariaDB [mytest]> select count(id1) from r1 union all select 1;
+------------+
| count(id1) |
+------------+
|          1 |
|    1000000 |
+------------+
2 rows in set (0.079 sec)

calgettrace()

query:
(SELECT   r1.id1,r2.id2
 FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)
 union select 1,2;
 
22.08.0
 
MariaDB [mytest]> select calGetTrace();
+---------------+
| calGetTrace() |
+---------------+
| NULL          |
+---------------+
1 row in set (0.001 sec)
 
 
23.02.1
 
MariaDB [mytest]> select calGetTrace();
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| calGetTrace()                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows    
BPS  PM   r1    3040     (id1)             630 648 0   0.025   1000000 
BPS  PM   r2    3043     (id2)             166 70  0   0.059   98304   
HJS  PM   r2-r1 3043     -                 -   -   -   -----   -       
TNS  UM   -     -        -                 -   -   -   0.000   10      
TCS  UM   -     -        -                 -   -   -   0.000   1       
TUS  UM   -     -        -                 -   -   -   0.133   11      
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.001 sec)

debug.log

22.08.8
 
Mar  1 16:12:09 rocky8 ExeMgr[13104]: 09.498622 |38|0|0| D 16 CAL0028: DbProf: Start Statement: Statement-0 Ver-11 SQL-<part of the query executed in table mode>
Mar  1 16:12:09 rocky8 ExeMgr[13104]: 09.498673 |38|0|0| D 16 CAL0041: Start SQL statement: <part of the query executed in table mode>; |mytest|
 
 
23.02.1
 
Mar  1 15:57:31 rocky8 ExeMgr[13010]: 31.993180 |18|0|0| D 16 CAL0041: Start SQL statement: (SELECT   r1.id1,r2.id2#012FROM r1 JOIN r2  ON r1.id1 = r2.id2 limit 10)#012union select 1,2; |mytest|

Generated at Thu Feb 08 02:53:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.