[MDEV-4166] Hash based aggregation for large input, small output GROUP BY - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Major
Resolution: Won't Fix
Fix Version/s: None
Component/s: None
Labels:
None

Description

The current sort based GROUP BY implementation performs badly when source data is large (much larger than available RAM) and output is small. An alternative method, supported by Oracle, PostgreSQL, SQL Server, etc. is to use hash aggregation, as described here: http://blogs.msdn.com/b/craigfr/archive/2006/09/20/hash-aggregate.aspx

Because the optimizer may not have sufficient stats to choose hash vs. sort aggregation, an optimization hint should be available for use in SELECT statements.

Attachments

Activity

Ascending order - Click to sort in descending order

Igor Babaev (Inactive) added a comment - 2013-02-12 04:13

Honestly I don't understand what you mean.
When we aggregate with group by we use temporary tables in memory, that are actually hash tables.

Igor Babaev (Inactive) added a comment - 2013-02-12 04:13 Honestly I don't understand what you mean. When we aggregate with group by we use temporary tables in memory, that are actually hash tables.

Mathew Johnston added a comment - 2013-02-12 04:21

Hi Igor,

The use of temporary tables is actually something I've had a hard time finding documentation on. While I'm at it - the behaviour of count(distinct X) seems to be undocumented too. In building the temporary table, is it creating a 'unique' HASH index on the grouped fields, and doing an INSERT INTO tmp ... ON DUPLICATE KEY UPDATE tmp_sum = tmp_sum+values(tmp_sum), etc? I.e. the temp table should only ever contain 1 row for each GROUP BY key?

Anyway, the described algorithm (linked) uses an iterative approach, writing overflow to disk and merging it on subsequent passes. It means the algorithm doesn't have to fall back to filesort if the results don't fit in RAM.

Thanks for your interest!

Mathew Johnston added a comment - 2013-02-12 04:21 Hi Igor, The use of temporary tables is actually something I've had a hard time finding documentation on. While I'm at it - the behaviour of count(distinct X) seems to be undocumented too. In building the temporary table, is it creating a 'unique' HASH index on the grouped fields, and doing an INSERT INTO tmp ... ON DUPLICATE KEY UPDATE tmp_sum = tmp_sum+values(tmp_sum), etc? I.e. the temp table should only ever contain 1 row for each GROUP BY key? Anyway, the described algorithm (linked) uses an iterative approach, writing overflow to disk and merging it on subsequent passes. It means the algorithm doesn't have to fall back to filesort if the results don't fit in RAM. Thanks for your interest!

Sergei Golubchik added a comment - 2013-09-29 17:45

MariaDB can never fall back to filesort. Optimizer decides up front whether to use temporary tables or filesort and it cannot change the decision later.

When a temporary table in meory overflows, it is converted to on-disk temporary table (MyISAM or Aria) and it uses btree indexes instead of hash.

Sergei Golubchik added a comment - 2013-09-29 17:45 MariaDB can never fall back to filesort. Optimizer decides up front whether to use temporary tables or filesort and it cannot change the decision later. When a temporary table in meory overflows, it is converted to on-disk temporary table (MyISAM or Aria) and it uses btree indexes instead of hash.

People

Assignee:: Unassigned

Reporter:: Mathew Johnston

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Due:: 2013-02-26

Created:: 2013-02-12 03:35

Updated:: 2013-09-29 17:45

Resolved:: 2013-09-29 17:45

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server