Details
-
Task
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Fix
-
None
-
None
-
None
Description
The current sort based GROUP BY implementation performs badly when source data is large (much larger than available RAM) and output is small. An alternative method, supported by Oracle, PostgreSQL, SQL Server, etc. is to use hash aggregation, as described here: http://blogs.msdn.com/b/craigfr/archive/2006/09/20/hash-aggregate.aspx
Because the optimizer may not have sufficient stats to choose hash vs. sort aggregation, an optimization hint should be available for use in SELECT statements.
Honestly I don't understand what you mean.
When we aggregate with group by we use temporary tables in memory, that are actually hash tables.