[MDEV-21879] GROUP_CONCAT(DISTINCT) with little memory Created: 2020-03-05  Updated: 2020-03-05  Resolved: 2020-03-05

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 5.5, 10.1, 10.2, 10.3, 10.4
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Sergei Golubchik Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MDEV-11563 GROUP_CONCAT(DISTINCT ...) may produ... Closed

 Description   

GROUP_CONCAT(DISTINCT) uses Unique object to filter out duplicates.
It does so using a very interesting approach — it adds a value and checks whether Unique::elements_in_tree() has changed. If it did — it means a value was distinct.

    /* Filter out duplicate rows. */
    uint count= unique_filter->elements_in_tree();
    unique_filter->unique_add(table->record[0] + table->s->null_bytes);
    if (count == unique_filter->elements_in_tree())
      row_eligible= FALSE;

But Unique works even if there is not enough memory to store all values — if flushes them to disk as needed. And Unique::elements_in_tree() only counts the memory resident part. That is, as soon as Unique starts flushing, GROUP_CONCAT(DISTINCT) starts returning incorrect results.

For example:

--source include/have_sequence.inc
create table t1 (a varchar(2000)) as select concat(seq, repeat('.', 1998)) as a from seq_1_to_30;
set @@tmp_memory_table_size=1000, @@max_heap_table_size=1000;
--replace_result '.' ''
select group_concat(distinct a) from t1;
set @@tmp_memory_table_size=default, @@max_heap_table_size=default;
--replace_result '.' ''
select group_concat(distinct a) from t1;
drop table t1;


Generated at Thu Feb 08 09:10:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.