Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Duplicate
-
5.5(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
-
None
Description
GROUP_CONCAT(DISTINCT) uses Unique object to filter out duplicates.
It does so using a very interesting approach — it adds a value and checks whether Unique::elements_in_tree() has changed. If it did — it means a value was distinct.
/* Filter out duplicate rows. */ |
uint count= unique_filter->elements_in_tree();
|
unique_filter->unique_add(table->record[0] + table->s->null_bytes);
|
if (count == unique_filter->elements_in_tree()) |
row_eligible= FALSE;
|
But Unique works even if there is not enough memory to store all values — if flushes them to disk as needed. And Unique::elements_in_tree() only counts the memory resident part. That is, as soon as Unique starts flushing, GROUP_CONCAT(DISTINCT) starts returning incorrect results.
For example:
--source include/have_sequence.inc
|
create table t1 (a varchar(2000)) as select concat(seq, repeat('.', 1998)) as a from seq_1_to_30; |
set @@tmp_memory_table_size=1000, @@max_heap_table_size=1000; |
--replace_result '.' ''
|
select group_concat(distinct a) from t1; |
set @@tmp_memory_table_size=default, @@max_heap_table_size=default; |
--replace_result '.' ''
|
select group_concat(distinct a) from t1; |
drop table t1; |
Attachments
Issue Links
- is duplicated by
-
MDEV-11563 GROUP_CONCAT(DISTINCT ...) may produce a non-distinct list
- Closed