Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
Computing COUNT(DISTINCT <col1>, <col2>, <col3>) traditionally is done via storing the values in a UNIQUE object. The UNIQUE object uses a Red-Black tree with logarithmic complexity for insert and lookup.
This means that we achieve a theoretical complexity of O(rows * log(rows))
There is a way to speed this up, if we know that <col1>, <col2>, <col3> are already sorted.
In that case we only need to check when the tuple (col1, col2, col3) changes and increment the count value. This would lead to a complexity of O(rows), similar to the complexity of COUNT(*)
Test case to show the performance penalty:
analyze format=JSON
|
select count(DISTINCT c,d,e),b from tt where a = 1 group by b\G |
*************************** 1. row ***************************
|
ANALYZE: {
|
"query_block": { |
"select_id": 1, |
"r_loops": 1, |
"r_total_time_ms": 518.9, |
"table": { |
"table_name": "tt", |
"access_type": "ref", |
"possible_keys": ["tt_a_IDX"], |
"key": "tt_a_IDX", |
"key_length": "4", |
"used_key_parts": ["a"], |
"ref": ["const"], |
"r_loops": 1, |
"rows": 248763, |
"r_rows": 500000, |
"r_total_time_ms": 99.395, |
"filtered": 100, |
"r_filtered": 100, |
"attached_condition": "tt.a <=> 1", |
"using_index": true |
}
|
}
|
}
|
1 row in set (0.523 sec) |
And here we can see the performance of count(*) case:
analyze format=JSON
|
select count(*),b from tt where a = 1 group by b\G |
*************************** 1. row ***************************
|
ANALYZE: {
|
"query_block": { |
"select_id": 1, |
"r_loops": 1, |
"r_total_time_ms": 114.54, |
"table": { |
"table_name": "tt", |
"access_type": "ref", |
"possible_keys": ["tt_a_IDX"], |
"key": "tt_a_IDX", |
"key_length": "4", |
"used_key_parts": ["a"], |
"ref": ["const"], |
"r_loops": 1, |
"rows": 248763, |
"r_rows": 500000, |
"r_total_time_ms": 97.696, |
"filtered": 100, |
"r_filtered": 100, |
"attached_condition": "tt.a <=> 1", |
"using_index": true |
}
|
}
|
}
|
1 row in set (0.115 sec) |
Data:
CREATE TABLE tt (a int, b varchar(32), c varchar(64), d varchar(64), e varchar(64), f int) |
CREATE INDEX tt_a_IDX USING BTREE ON tt (a,b,c,d,e); |
insert into tt select 1, mod(seq, 5), mod(seq, 11), mod(seq, 15), mod(seq, 22), seq from seq_1_to_1000000; |
Attachments
Issue Links
- relates to
-
MDEV-30660 Aggregation functions fail to leverage uniqueness property
- Closed
- links to