[MCOL-254]  ColumnStore is limited to two columns in a GROUP BY statement. Created: 2016-07-15  Updated: 2016-08-23  Resolved: 2016-08-23

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: None
Fix Version/s: 1.0.2

Type: Bug Priority: Major
Reporter: David Hill (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Sprint: 1.0.2-2

 Description   

Reported by an alpha customer:

Today’s issue, is that my developer is having some issues with GROUP BY statements, and seems to think that ColumnStore is limited to two columns in a GROUP BY statement. The documentation suggests that there’s a 10-column maximum on this page https://mariadb.com/kb/en/mariadb/columnstore-select/

I get this error:
Internal error: IDB-2021: 'gbd.output_cod_single_year_v265.sex_id' is not in GROUP BY clause. All non-aggregate columns in the SELECT and ORDER BY clause must be included in the GROUP BY clause.

For the following select statement.

SELECT
measure_id,
location_id,
sex_id,
age_group_id,
cause_id,
metric_id,
MIN(val) AS min,
MAX(val) AS max,
MIN(lower) AS min_ui,
MAX(upper) AS max_ui
FROM output_cod_single_year_v265
WHERE measure_id IN(1)
AND metric_id IN(3)
AND cause_id IN(294)
AND year_id IN(
1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,
1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,
2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,
2010,2011,2012,2013,2014,2015
)
AND location_id IN(
184,183,182,185,187,191,180,181,435,186,189,190,198,179,178,177,176,175,173,172,204,215,170,171,202,169,168,148,147,522,154,139,195,197,194,196,193,218,217,216,214,213,212,211,210,209,206,208,207,203,205,201,200,110,111,105,109,112,115,119,106,116,117,107,114,385,422,132,127,131,130,129,128,126,108,102,349,305,125,113,122,136,118,99,133,123,98,135,121,97,40,41,39,37,36,38,8,68,7,67,6,15,10,20,12,13,18,66,16,11,69,19,14,164,165,142,17,162,161,160,150,145,149,146,151,152,155,157,143,144,153,85,35,77,140,34,33,156,61,48,51,52,63,57,47,62,54,45,90,93,60,59,84,83,95,79,58,78,49,50,46,88,55,82,86,91,53,92,44,74,43,89,87,80,81,94,76,75,72,71,26,30,28,22,351,376,24,25,23,27,29,298,101,141,163
)
AND sex_id IN(3)
AND age_group_id IN(22)
GROUP BY measure_id, location_id, sex_id, age_group_id, cause_id, metric_id

This is somewhat of a showstopper for us, if this is a bug (which I suspect it might be).

The table schema is as follows:

CREATE TABLE `output_cod_single_year_v265` (
`measure_id` int(11) NOT NULL,
`year_id` int(11) NOT NULL,
`location_id` int(11) NOT NULL,
`sex_id` int(11) NOT NULL,
`age_group_id` int(11) NOT NULL,
`cause_id` int(11) NOT NULL,
`metric_id` int(11) NOT NULL,
`val` double NOT NULL,
`upper` double DEFAULT NULL,
`lower` double DEFAULT NULL
) ENGINE=Columnstore



 Comments   
Comment by David Hill (Inactive) [ 2016-07-15 ]

info from D lee

I tried it. It did not work.

this syntax is ok

SELECT
measure_id,
location_id,
sex_id,
age_group_id,
cause_id,
metric_id,
MIN(val) AS min,
MAX(val) AS max,
MIN(lower) AS min_ui,
MAX(upper) AS max_ui
FROM output_cod_single_year_v265
WHERE measure_id IN(1)
GROUP BY measure_id, location_id, sex_id, age_group_id, cause_id, metric_id;

but as soon as I add another column in the where clause, it errors out

MariaDB [mytest]> SELECT
-> measure_id,
-> location_id,
-> sex_id,
-> age_group_id,
-> cause_id,
-> metric_id,
-> MIN(val) AS min,
-> MAX(val) AS max,
-> MIN(lower) AS min_ui,
-> MAX(upper) AS max_ui
-> FROM output_cod_single_year_v265
-> WHERE measure_id IN(1) AND metric_id IN(3)
-> GROUP BY measure_id, location_id, sex_id, age_group_id, cause_id, metric_id;
ERROR 1815 (HY000): Internal error: IDB-2021: 'mytest.output_cod_single_year_v265.metric_id' is not in GROUP BY clause. All non-aggregate columns in the SELECT and ORDER BY clause must be included in the GROUP BY clause.
MariaDB [mytest]>

Putting () around the columns in the where clause did not help.

Comment by David Hill (Inactive) [ 2016-07-15 ]

customer reported this worked in infinidb

Comment by David Hall (Inactive) [ 2016-08-09 ]

A test shows that this does indeed appear to work in InfiniDB

Comment by David Hall (Inactive) [ 2016-08-09 ]

This similar query works fine against tpch:
select
l_orderkey,
l_partkey,
l_suppkey,
l_linenumber,
l_discount,
l_shipdate,
min(l_quantity) as min,
max(l_quantity) as max,
min(l_extendedprice) as min_ex,
max(l_extendedprice) as max_ex
from lineitem
where l_orderkey=5986562
and l_partkey=181890
and l_suppkey=6927
group by l_orderkey, l_partkey, l_suppkey, l_linenumber, l_discount, l_shipdate;

By paring the where clause in the reporter's query to just one = statement, it works. Adding another = statement breaks it. The IN clauses seem to not be an issue. However this statement with three where conditions works.

By removing just one of the fields from the original query, it works with two where conditions. Seems there's some sort of heterodyning between the number of group by conditions and the number of where conditions.

Comment by David Hall (Inactive) [ 2016-08-18 ]

The optimizer sees items in the where clause that are basically compares against a constant – IN with one value is treated as a constant. It then decides that having a group by clause for this field is a waste of time and optimizes it out. Columnstore doesn't know it got optimized out and complains about the lack of a group by for that field.

WHERE measure_id IN(1)
and
WHERE measure_id = 1

Are both optimized out and cause the issue.

WHERE measure_id IN(1,2)
would not be optimized out and no problem would occur.

Comment by David Hall (Inactive) [ 2016-08-19 ]

During the changes from mysql 5.1.73 and mariadb 10.1, the optimizer was changed – an added call to remove_const was added. remove_const is a function written for ORDER BY and was repurposed for GROUP BY. It's supposed to remove any duplicate phrases and optimize out any GROUP BY that have a corresponding constant WHERE clause.

The Columnstore Engine edits to be sure there is a GROUP BY for every non aggregated return column. I don't know what happens if this edit is taken out. InfiniDB put a 'if !InfiniDB' statement around the previously existing call to remove_const, so I added such a check to the new call.

Comment by Andrew Hutchings (Inactive) [ 2016-08-19 ]

Looks good to me, approved. Also good catch with the README fix.

Moving to Daniel for QA

Comment by Dipti Joshi (Inactive) [ 2016-08-23 ]

dleeyh Has this been QAed and added to AutoPilot ?

Comment by Daniel Lee (Inactive) [ 2016-08-23 ]

Verified the query from the user.

Generated at Thu Feb 08 02:19:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.