[MDEV-21879] GROUP_CONCAT(DISTINCT) with little memory

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 5.5(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
Fix Version/s: N/A
Component/s: OTHER
Labels: None

Description

GROUP_CONCAT(DISTINCT) uses a Unique object to filter out duplicates.
It does so in an unusual way: it adds a value and checks whether Unique::elements_in_tree() has changed. If it has, the value was distinct.

    /* Filter out duplicate rows. */
    uint count= unique_filter->elements_in_tree();
    unique_filter->unique_add(table->record[0] + table->s->null_bytes);
    if (count == unique_filter->elements_in_tree())
      row_eligible= FALSE;

But Unique works even if there is not enough memory to store all values: it flushes them to disk as needed. Unique::elements_in_tree() counts only the memory-resident part, so as soon as Unique starts flushing, GROUP_CONCAT(DISTINCT) starts returning incorrect results.

For example:

    --source include/have_sequence.inc
    create table t1 (a varchar(2000)) as select concat(seq, repeat('.', 1998)) as a from seq_1_to_30;
    set @@tmp_memory_table_size=1000, @@max_heap_table_size=1000;
    --replace_result '.' ''
    select group_concat(distinct a) from t1;
    set @@tmp_memory_table_size=default, @@max_heap_table_size=default;
    --replace_result '.' ''
    select group_concat(distinct a) from t1;
    drop table t1;
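
The failure mode can also be modelled outside the server. The following is a minimal C++ sketch, not MariaDB code: ToyUnique, its max_in_memory limit, and filter_distinct are hypothetical names invented for illustration, and the flush policy (empty the tree once it holds max_in_memory values) is an assumption. Only the count-before/count-after check mirrors the snippet quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the server's Unique class.
// Values live in an in-memory tree; once the tree holds max_in_memory values,
// the next add first flushes them to a simulated on-disk run.
class ToyUnique {
public:
  explicit ToyUnique(std::size_t max_in_memory)
      : max_in_memory_(max_in_memory) {
    assert(max_in_memory_ > 0);
  }

  void unique_add(const std::string &value) {
    if (tree_.size() >= max_in_memory_) {
      // Assumed flush policy: spill the resident values and empty the tree.
      flushed_.insert(flushed_.end(), tree_.begin(), tree_.end());
      tree_.clear();
    }
    tree_.insert(value);  // std::set ignores values already in the tree
  }

  // Counts only the memory-resident part, as described in the report.
  std::size_t elements_in_tree() const { return tree_.size(); }

private:
  std::size_t max_in_memory_;
  std::set<std::string> tree_;
  std::vector<std::string> flushed_;  // simulated on-disk run, never consulted
};

// Applies the count-delta check to each row and returns the rows it accepts.
std::vector<std::string> filter_distinct(const std::vector<std::string> &rows,
                                         std::size_t max_in_memory) {
  ToyUnique unique_filter(max_in_memory);
  std::vector<std::string> accepted;
  for (const std::string &row : rows) {
    std::size_t count = unique_filter.elements_in_tree();
    unique_filter.unique_add(row);
    bool row_eligible = (count != unique_filter.elements_in_tree());
    if (row_eligible)
      accepted.push_back(row);
  }
  return accepted;
}
```

Under these assumptions, filtering the rows a, b, a, c, a with a limit of 100 yields a, b, c, while a limit of 2 lets every row through, duplicates included. With a limit of 1, the rows x, y, z are reduced to x alone, so the same check can also drop values that are distinct.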

Issue Links

is duplicated by MDEV-11563 GROUP_CONCAT(DISTINCT ...) may produce a non-distinct list (Closed)

People

Assignee: Unassigned
Reporter: Sergei Golubchik
Votes: 0
Watchers: 1

Dates

Created: 2020-03-05 19:59
Updated: 2020-03-05 20:07
Resolved: 2020-03-05 20:07

MariaDB Server