[MCOL-890] group_contact returns garbage Created: 2017-08-24  Updated: 2020-08-25  Resolved: 2017-09-01

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 1.0.10
Fix Version/s: 1.0.12, 1.1.0

Type: Bug Priority: Major
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 2
Labels: None


 Description   

The following returns garbage:
CREATE TABLE `phased_info` (
`animalid` varchar(18) DEFAULT NULL,
`chipid` tinyint(4) DEFAULT NULL,
`sha1sum` varchar(40) DEFAULT NULL
) ENGINE=Columnstore DEFAULT CHARSET=utf8;

// Default CHARSET=latin1 has same behavior

insert into phased_info values ('HO840F124135526679', 1, '1');
insert into phased_info values ('HO840F124135526679', 1, '2');
insert into phased_info values ('HO840F124135526679', 1, '3');
insert into phased_info values ('HO840F124135526679', 2, '1');
insert into phased_info values ('HO840F124135526679', 2, '2');
insert into phased_info values ('HO840F124135526679', 2, '3');

select group_concat(animalid),count(animalid) as ct from phased_info group by sha1sum;
--------------------------+

group_concat(animalid) ct

--------------------------+

,, 3
,,, 4
,,,, 5

--------------------------+
3 rows in set (0.15 sec)

However:
select group_concat(cast(animalid as char)),count(animalid) as ct from phased_info group by sha1sum;
--------------------------------------------------------------------------------------------------+

group_concat(cast(animalid as char)) ct

--------------------------------------------------------------------------------------------------+

HOUSAM124135526669,HO840F124135526679,HO840F124135526679 3
HOUSAM124135526669,HOUSAM124135526669,HO840F124135526679,HO840F124135526679 4
HOUSAM124135526669,HOUSAM124135526669,HOUSAM124135526669,HO840F124135526679,HO840F124135526679 5

--------------------------------------------------------------------------------------------------+
3 rows in set (0.18 sec)

and
select group_concat(animalid),count(sha1sum) as ct from phased_info group by sha1sum;--------------------------------------------------------------------------------------------------+

group_concat(animalid) ct

--------------------------------------------------------------------------------------------------+

HOUSAM124135526669,HO840F124135526679,HO840F124135526679 3
HOUSAM124135526669,HOUSAM124135526669,HO840F124135526679,HO840F124135526679 4
HOUSAM124135526669,HOUSAM124135526669,HOUSAM124135526669,HO840F124135526679,HO840F124135526679 5

--------------------------------------------------------------------------------------------------+
3 rows in set (0.06 sec)

So casting the column before group_concat, OR not using the same column in another aggregate, seems to fix it.



 Comments   
Comment by David Hall (Inactive) [ 2017-08-25 ]

GROUP_CONCAT is not like any other aggregate function. It may have its own internal ORDER BY clause. In the columnstore code, GROUP_CONCAT is handled via very specific code just for it.

Columnstore only retrieves any given row once, regardless of the number of functions that use it.

In joblistfactory, around line 975, there's optimization logic to turn off dictionary lookup for count fields, the thought being that the time required is a waste as count doesn't care about the contents. There's also logic to turn the lookup on if any other function, field or where clause uses the column. Unfortunately, since GROUP_CONCAT is handled differently, the generic logic never turns the dictionary lookup back on for the fields involved. This results in the string tokens, rather than the strings, being concatenated,

Comment by David Hall (Inactive) [ 2017-08-29 ]

Git pull requests 245 and 246

Comment by Daniel Lee (Inactive) [ 2017-09-01 ]

Build verified: 1.1.0 GitHub source

/root/columnstore/mariadb-columnstore-server
commit 6ed33d194819aaa5f2521c888639f44546fb7ce2
Merge: 97284ea 770537e
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Thu Aug 3 20:54:13 2017 +0100

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 44ea79d49c0354a5dd1d5c97d95ad2b8f366bc8b
Author: david hill <david.hill@mariadb.com>
Date: Thu Aug 31 11:34:17 2017 -0500

Comment by Daniel Lee (Inactive) [ 2017-09-01 ]

Build verified: 1.0.12 source

/root/columnstore/mariadb-columnstore-server
commit 2965fc8e72f9b2a28d43ee2e700bc430ceae038e
Author: david hill <david.hill@mariadb.com>
Date: Mon Aug 28 13:40:46 2017 -0500

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit a6ae09dedb24872e3a9a19f76b87ebfef7c856cc
Author: david hill <david.hill@mariadb.com>
Date: Fri Sep 1 14:45:26 2017 -0500

Generated at Thu Feb 08 02:24:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.