[MCOL-2266] Make ColumnStore "flatten" ("merge") subqueries when appropriate Created: 2019-04-02  Updated: 2023-10-25  Resolved: 2023-10-25

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 1.2.3
Fix Version/s: Icebox

Type: New Feature Priority: Critical
Reporter: Adrian Zhelev (Inactive) Assignee: Gagan Goel (Inactive)
Resolution: Won't Do Votes: 1
Labels: performance

Epic Link: ColumnStore Performance Improvements
Sprint: 2021-3

 Description   

Queries like SELECT count(a) FROM (SELECT a,b,c,d ..., z FROM table) t are presently insufficiently optimized in columnstore, the way they are optimized in every RDBMS.

Normally, the execution plan for a query like that should be identical to SELECT count(a) FROM table. But it isn't. For example:

select count(year) from (select * from flights) q;
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM flights 3028 (year) 0 3176 0 0.378 12717658
TNS UM - - - - - - 0.373 12717658
TAS UM - - - - - - 0.361 1
TNS UM - - - - - - 0.000 1

All rows from all nodes are passed to the PM1 and the entire aggregation is conducted by ExeMgr on it. By contrast:

MariaDB [bts]> select count(year) from flights;

Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM flights 3028 (year) 0 3176 0 0.075 1588
TAS UM - - - - - - 0.067 1
TNS UM - - - - - - 0.000 1

Nodes aggregate locally and in parallel, and send partial aggregated rows of which there are a lot less to the primary which completes the aggregation.

The difference in response time is obviously very significant (17x in the two queries above).



 Comments   
Comment by Gregory Dorman (Inactive) [ 2021-02-18 ]

The problem as reported appears corrected in 5.5.1 (maybe earlier).
"SELECT a,b FROM (SELECT a,b,c,d ..., z FROM table) t" only retrieve columns a and b from table. For example:

The ticket target is changed into the cases where it is still problematic.

MariaDB [bts]> select city, state from (select city, state, country from airports) q;
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM airports 3016 (city,state) 0 4 0 0.002 390
TNS UM - - - - - - 0.000 390
TNS UM - - - - - - 0.000 390

This is a correct plan, and right number of blocks touched.

Comment by David Hall (Inactive) [ 2022-03-04 ]

This is an optimizer issue. We are working on an optimizer design. When a Jira is created for it, this JIRA may be linked or removed.

Generated at Thu Feb 08 02:34:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.