[MDEV-11563] GROUP_CONCAT(DISTINCT ...) may produce a non-distinct list Created: 2016-12-14  Updated: 2020-06-12  Resolved: 2020-06-09

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.2
Fix Version/s: 10.5.4, 10.3.24, 10.4.14

Type: Bug Priority: Major
Reporter: Varun Gupta (Inactive) Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates MDEV-21879 GROUP_CONCAT(DISTINCT) with little me... Closed
Relates
relates to MDEV-11297 Add support for LIMIT clause in GROUP... Closed
relates to MDEV-22011 DISTINCT with JSON_ARRAYAGG gives wro... Closed
relates to MDEV-22089 Group concat returns incorrect result... Stalled
Sprint: 10.2.4-5, 10.2.7-1

 Description   

Here is a simple test case demonstrating the problem. It is hit only when the records from all
the rows cannot fit inside the Unique object:

create table t1 (
a VARCHAR(1000),
b int
);
insert into t1 values
( 0 , 1 ),
( 1 , 2 ),
( 2 , 3 ),
( 3 , 4 ),
( 4 , 5 ),
( 5 , 6 ),
( 0 , 11 ),
( 5 , 16 );
set tmp_table_size=1024;
select group_concat(distinct a) from t1;
drop table t1;



 Comments   
Comment by Varun Gupta (Inactive) [ 2016-12-18 ]

The above test works on MySQL 5.7.

Comment by Sergei Petrunia [ 2016-12-25 ]

The unique-ness is used by Item_func_group_concat::unique_filter.

Grep for 'unique_filter' in sql/*.{h,cc}. One can find

  • code to create Unique object
  • code to clear it
  • code to delete it
  • unique_filter->unique_add() call.

And that's it. There is no Unique::get() call anywhere. Checking unique_filter->elements_in_tree() before/after the Unique::unique_add() works only when the Unique object has one tree.

When multiple trees are present, DISTINCT is not enforced across elements that landed in different trees.
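The failure mode described above can be illustrated with a minimal Python sketch (not the actual server code, which lives in sql/item_sum.cc and sql/uniques.cc): duplicates are detected only within the in-memory tree, so once the tree fills up and is flushed, a value seen before the flush is treated as new again.

```python
# Minimal sketch of the unique_filter logic: GROUP_CONCAT appends a value
# when elements_in_tree() changes across unique_add(). When the tree is
# flushed (simulated here by clearing it), the dedup state is lost and an
# already-seen value is appended a second time.

MAX_TREE_ELEMENTS = 3  # stand-in for the memory limit (tmp_table_size)

def group_concat_distinct(values):
    tree = set()          # in-memory tree of the Unique object
    result = []
    for v in values:
        before = len(tree)
        tree.add(v)       # unique_add()
        if len(tree) != before:   # elements_in_tree() changed -> "new" value
            result.append(v)
        if len(tree) >= MAX_TREE_ELEMENTS:
            tree.clear()  # tree flushed to disk; dedup state is lost
    return ",".join(result)

# Mirrors the reported test case: "0" and "5" reappear after a flush.
print(group_concat_distinct(["0", "1", "2", "3", "4", "5", "0", "5"]))
# -> 0,1,2,3,4,5,0,5  (non-distinct, as in the bug report)
```

With MAX_TREE_ELEMENTS large enough to hold all values, the same function returns the properly distinct list, which matches the observation that the bug only appears when the data does not fit in one tree.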

Comment by Sergei Petrunia [ 2016-12-25 ]

I've tried digging into the history of this code. It was written ages ago (around 2006); it doesn't look like DISTINCT was ever properly supported in GROUP_CONCAT.

Comment by Sergei Petrunia [ 2016-12-25 ]

A general approach to compute

DISTINCT col1 ORDER BY col2 [LIMIT n]

is to

  • Put all source data into the Unique object
  • Read it from there and sort it by `col2`.

One can take shortcuts in various special cases.

One of them is @@group_concat_max_len, which is defined as

The maximum permitted result length in bytes for the GROUP_CONCAT() function. The default is 1024.

This allows one to infer that the Unique object will accommodate all the data in one tree
(Q: however, the code will attempt to flush it to disk if we run out of memory).
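The two-step strategy sketched above can be written out as a short Python sketch. Which col2 value is kept for each distinct col1 is a simplifying assumption here (SQL leaves that choice implementation-defined); the point is the ordering of operations: deduplicate over the complete input first, sort afterwards.

```python
# Hedged sketch of the general "DISTINCT col1 ORDER BY col2 [LIMIT n]"
# strategy: put everything into a Unique-like structure, then read it
# back and sort by col2.

def distinct_ordered(rows, limit=None):
    """rows: iterable of (col1, col2) pairs."""
    # Step 1: deduplicate over the full input, keeping the first col2
    # seen for each distinct col1 (a simplifying assumption).
    seen = {}
    for col1, col2 in rows:
        seen.setdefault(col1, col2)
    # Step 2: read everything back and sort it by col2.
    ordered = sorted(seen.items(), key=lambda kv: kv[1])
    if limit is not None:
        ordered = ordered[:limit]
    return [col1 for col1, _ in ordered]

print(distinct_ordered([("b", 2), ("a", 1), ("b", 0), ("c", 3)], limit=2))
# -> ['a', 'b']
```

Because deduplication runs over the whole input before anything is emitted, this approach is correct even when the data spans multiple trees, which is exactly what the shortcut in the buggy code gives up.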

Comment by Varun Gupta (Inactive) [ 2017-08-03 ]

When we limit the amount of memory for the tree used for DISTINCT in GROUP_CONCAT, we see a non-unique list:

create table t1 (
a VARCHAR(1000),
b int
);
insert into t1 values
( 0 , 1 ),
( 1 , 2 ),
( 2 , 3 ),
( 3 , 4 ),
( 4 , 5 ),
( 5 , 6 ),
( 0 , 11 ),
( 5 , 16 );
set local tmp_table_size=1024;
select group_concat(distinct a) from t1;
drop table t1;

Comment by Varun Gupta (Inactive) [ 2017-08-03 ]

After porting the patch from MySQL, I see that DISTINCT with GROUP_CONCAT now crashes.
The stack trace is:

Thread 1 (Thread 0x7ff9a2b7cb00 (LWP 24160)):
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x000055f383444d5b in my_write_core (sig=6) at /home/temporary/mariadb-10.2/mysys/stacktrace.c:477
#2  0x000055f382de2fcf in handle_fatal_signal (sig=6) at /home/temporary/mariadb-10.2/sql/signal_handler.cc:296
#3  <signal handler called>
#4  0x00007ff9a0668428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#5  0x00007ff9a066a02a in __GI_abort () at abort.c:89
#6  0x00007ff9a0660bd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55f3835bd751 "0", file=file@entry=0x55f3835bd440 "/home/temporary/mariadb-10.2/sql/filesort.cc", line=line@entry=1533, function=function@entry=0x55f3835bde60 <reuse_freed_buff(st_queue*, st_buffpek*, unsigned int)::__PRETTY_FUNCTION__> "void reuse_freed_buff(QUEUE*, BUFFPEK*, uint)") at assert.c:92
#7  0x00007ff9a0660c82 in __GI___assert_fail (assertion=0x55f3835bd751 "0", file=0x55f3835bd440 "/home/temporary/mariadb-10.2/sql/filesort.cc", line=1533, function=0x55f3835bde60 <reuse_freed_buff(st_queue*, st_buffpek*, unsigned int)::__PRETTY_FUNCTION__> "void reuse_freed_buff(QUEUE*, BUFFPEK*, uint)") at assert.c:101
#8  0x000055f382de0c55 in reuse_freed_buff (queue=0x7ff9a2b79a60, reuse=0x7ff99785b510, key_length=1002) at /home/temporary/mariadb-10.2/sql/filesort.cc:1533
#9  0x000055f382de13dc in merge_buffers (param=0x7ff9a2b79b70, from_file=0x7ff997845f48, to_file=0x7ff997999270, sort_buffer=0x7ff9979fe070 "\001", lastbuff=0x7ff99785b3f0, Fb=0x7ff99785b3f0, Tb=0x7ff99785b570, flag=1) at /home/temporary/mariadb-10.2/sql/filesort.cc:1721
#10 0x000055f382de1823 in merge_index (param=0x7ff9a2b79b70, sort_buffer=0x7ff9979fe070 "\001", buffpek=0x7ff99785b3f0, maxbuffer=8, tempfile=0x7ff997845f48, outfile=0x7ff997999270) at /home/temporary/mariadb-10.2/sql/filesort.cc:1827
#11 0x000055f382caf618 in Unique::merge (this=0x7ff997845f18, table=0x7ff997929088, buff=0x7ff9979fe070 "\001", without_last_merge=false) at /home/temporary/mariadb-10.2/sql/uniques.cc:730
#12 0x000055f382caf252 in Unique::walk (this=0x7ff997845f18, table=0x7ff997929088, action=0x55f382ea218a <dump_leaf_key(void*, element_count, void*)>, walk_action_arg=0x7ff997843288) at /home/temporary/mariadb-10.2/sql/uniques.cc:649
#13 0x000055f382ea403b in Item_func_group_concat::val_str (this=0x7ff997843288, str=0x7ff9a2b79d70) at /home/temporary/mariadb-10.2/sql/item_sum.cc:3561
#14 0x000055f382e077dd in Item::send (this=0x7ff997843288, protocol=0x7ff9994e1600, buffer=0x7ff9a2b79d70) at /home/temporary/mariadb-10.2/sql/item.cc:6527
#15 0x000055f382af16bf in Protocol::send_result_set_row (this=0x7ff9994e1600, row_items=0x7ff997844258) at /home/temporary/mariadb-10.2/sql/protocol.cc:914
#16 0x000055f382b6989e in select_send::send_data (this=0x7ff997843e38, items=...) at /home/temporary/mariadb-10.2/sql/sql_class.cc:2822
#17 0x000055f382c15b05 in end_send_group (join=0x7ff997843e58, join_tab=0x7ff997845198, end_of_records=true) at /home/temporary/mariadb-10.2/sql/sql_select.cc:19676
#18 0x000055f382c127d2 in sub_select (join=0x7ff997843e58, join_tab=0x7ff997844e50, end_of_records=true) at /home/temporary/mariadb-10.2/sql/sql_select.cc:18350
#19 0x000055f382c122ac in do_select (join=0x7ff997843e58, fields=0x7ff997844258, table=0x0, procedure=0x0) at /home/temporary/mariadb-10.2/sql/sql_select.cc:18056
#20 0x000055f382bed110 in JOIN::exec_inner (this=0x7ff997843e58) at /home/temporary/mariadb-10.2/sql/sql_select.cc:3225
#21 0x000055f382bea3a3 in JOIN::exec (this=0x7ff997843e58) at /home/temporary/mariadb-10.2/sql/sql_select.cc:2512
#22 0x000055f382bed931 in mysql_select (thd=0x7ff9994e1070, rref_pointer_array=0x7ff9994e5448, tables=0x7ff997843770, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7ff997843e38, unit=0x7ff9994e4ab8, select_lex=0x7ff9994e51b8) at /home/temporary/mariadb-10.2/sql/sql_select.cc:3449
#23 0x000055f382be337b in handle_select (thd=0x7ff9994e1070, lex=0x7ff9994e49f0, result=0x7ff997843e38, setup_tables_done_option=0) at /home/temporary/mariadb-10.2/sql/sql_select.cc:384
#24 0x000055f382bb3085 in execute_sqlcom_select (thd=0x7ff9994e1070, all_tables=0x7ff997843770) at /home/temporary/mariadb-10.2/sql/sql_parse.cc:5923
#25 0x000055f382ba92a0 in mysql_execute_command (thd=0x7ff9994e1070) at /home/temporary/mariadb-10.2/sql/sql_parse.cc:2980
#26 0x000055f382bb67a4 in mysql_parse (thd=0x7ff9994e1070, rawbuf=0x7ff997843088 "select group_concat(distinct a) from t1", length=39, parser_state=0x7ff9a2b7b5e0) at /home/temporary/mariadb-10.2/sql/sql_parse.cc:7339
#27 0x000055f382ba5466 in dispatch_command (command=COM_QUERY, thd=0x7ff9994e1070, packet=0x7ff99b15d071 "", packet_length=39) at /home/temporary/mariadb-10.2/sql/sql_parse.cc:1490
#28 0x000055f382ba41be in do_command (thd=0x7ff9994e1070) at /home/temporary/mariadb-10.2/sql/sql_parse.cc:1109
#29 0x000055f382cdc281 in do_handle_one_connection (thd_arg=0x7ff9994e1070) at /home/temporary/mariadb-10.2/sql/sql_connect.cc:1349
#30 0x000055f382cdbfd0 in handle_one_connection (arg=0x7ff9994e1070) at /home/temporary/mariadb-10.2/sql/sql_connect.cc:1261
#31 0x000055f3833eece8 in pfs_spawn_thread (arg=0x7ff99fc39ef0) at /home/temporary/mariadb-10.2/storage/perfschema/pfs.cc:1860
#32 0x00007ff9a108f6ba in start_thread (arg=0x7ff9a2b7cb00) at pthread_create.c:333
#33 0x00007ff9a073a3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Comment by Varun Gupta (Inactive) [ 2017-08-04 ]

In the constructor Unique::Unique:

max_elements= (ulong) (max_in_memory_size /
                        ALIGN_SIZE(sizeof(TREE_ELEMENT)+size));

 
(lldb) p sizeof(TREE_ELEMENT)+size
(unsigned long) $7 = 1046
(lldb) p size
(uint) $8 = 1022
(lldb) p sizeof(TREE_ELEMENT)
(unsigned long) $9 = 24
(lldb) p max_in_memory_size
(ulonglong) $10 = 1024
(lldb) p max_elements
(ulong) $11 = 0

Comment by Varun Gupta (Inactive) [ 2019-05-02 ]

Patch
http://lists.askmonty.org/pipermail/commits/2017-August/011376.html

Comment by Sergei Golubchik [ 2020-03-05 ]

Another test case. The effect is the opposite: distinct values are removed from the result set. Same root cause:

--source include/have_sequence.inc
create table t1 (a varchar(2000)) as select concat(seq, repeat('.', 1998)) as a from seq_1_to_30;
set @@tmp_memory_table_size=1000, @@max_heap_table_size=1000;
--replace_result '.' ''
select group_concat(distinct a) from t1;
set @@tmp_memory_table_size=default, @@max_heap_table_size=default;
--replace_result '.' ''
select group_concat(distinct a) from t1;
drop table t1;

Comment by Sergei Petrunia [ 2020-06-07 ]

Review input provided over email. Ok to push after it is addressed.

Comment by Varun Gupta (Inactive) [ 2020-06-08 ]

I have made a patch for 10.3; it needs extra handling for the LIMIT clause of GROUP_CONCAT, which was added in MDEV-11297.

Comment by Varun Gupta (Inactive) [ 2020-06-09 ]

The fix is pushed to 10.3; if needed for earlier versions it can be backported to 10.1 and 10.2 (minor changes are required for that).

Generated at Thu Feb 08 07:50:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.