Details
-
New Feature
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Won't Do
-
1.2.3
Description
Queries like SELECT count(a) FROM (SELECT a,b,c,d ..., z FROM table) t are presently insufficiently optimized in columnstore, the way they are optimized in every RDBMS.
Normally, the execution plan for a query like that should be identical to SELECT count(a) FROM table. But it isn't. For example:
select count(year) from (select * from flights) q;
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM flights 3028 (year) 0 3176 0 0.378 12717658
TNS UM - - - - - - 0.373 12717658
TAS UM - - - - - - 0.361 1
TNS UM - - - - - - 0.000 1
All rows from all nodes are passed to the PM1 and the entire aggregation is conducted by ExeMgr on it. By contrast:
MariaDB [bts]> select count(year) from flights;
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM flights 3028 (year) 0 3176 0 0.075 1588
TAS UM - - - - - - 0.067 1
TNS UM - - - - - - 0.000 1
Nodes aggregate locally and in parallel, and send partial aggregated rows of which there are a lot less to the primary which completes the aggregation.
The difference in response time is obviously very significant (17x in the two queries above).