[MCOL-4686] count(DINSTINCT) with a large number of distinct values runs forever Created: 2021-04-20  Updated: 2021-06-29  Resolved: 2021-06-29

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 6.1.1
Fix Version/s: 6.1.1

Type: Bug Priority: Blocker
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Relates
relates to MCOL-4759 DISTINCT and UNION performance degrad... Closed
Sprint: 2021-8, 2021-9

 Description   

When attempting test200.sh, the test never finished.
The query

select count(distinct c1) as count2 from test200 where c1 <= 139000000;

never returns. It takes around 3 minutes in 5.6.1.
A little investigation shows an internal deadlock in fifo.h.

Additionally, there appears to be a slowdown in lesser counts:

select count(distinct c1) as count2 from test200 where c1 <= 13900000;

takes almost 3 minutes in 6.1.1, but 15 seconds in 5.6.1.



 Comments   
Comment by David Hall (Inactive) [ 2021-04-20 ]

The table is created as part of test200.sh. To create the table,

For any database <db>
use <db>
create table test200(c1 int, c2 bigint, c3 varchar(1000))engine=columnstore;

Then run
echo "dummy" | awk -v rows=$rows '

{for(i=1;i<=rows;i++)print i "|" i "|" i}

' | cpimport <db> test200

Comment by Roman [ 2021-06-28 ]

test200.log converges at my VM for a reasonable time. It produces a log that is different comparing with the ref, some examples that went over 4G limit previously now don't throw an error and converge however there are two queries that previously run with 4G RAM limit but now don't.
Here is the diff b/ the ref and actual logs.

root@drrtuy-u20-3:/data/regr/mysql/queries/nightly/alltest/test200# diff -u  ./memLimits.sql.ref.log  ./memLimits.sql.log 
--- ./memLimits.sql.ref.log	2021-05-20 10:27:33.412761826 +0000
+++ ./memLimits.sql.log	2021-06-28 12:11:09.586385778 +0000
@@ -1,10 +1,9 @@
 count1
 139000000
-count2
-139000000
+ERROR 1815 (HY000) at line 6: Internal error: TupleAggregateStep::threadedAggregateRowGroups()[1] IDB-2003: Aggregation/Distinct memory limit is exceeded.
 sleep(30)
 0
-ERROR 1815 (HY000) at line 10: Internal error: TupleAggregateStep::threadedAggregateRowGroups() IDB-2003: Aggregation/Distinct memory limit is exceeded.
+ERROR 1815 (HY000) at line 10: Internal error: TupleAggregateStep::threadedAggregateRowGroups()[2] IDB-2003: Aggregation/Distinct memory limit is exceeded.
 sleep(30)
 0
 count4
@@ -13,7 +12,8 @@
 104579000
 sleep(30)
 0
-ERROR 1815 (HY000) at line 19: Internal error: IDB-2001: Join or subselect exceeds memory limit.
+count6
+104579125
 sleep(30)
 0
 count7
@@ -22,7 +22,8 @@
 92276432
 sleep(30)
 0
-ERROR 1815 (HY000) at line 28: Internal error: IDB-2001: Join or subselect exceeds memory limit.
+count9
+92276450
 sleep(30)
 0
 count10
@@ -31,7 +32,8 @@
 136000000
 sleep(30)
 0
-ERROR 1815 (HY000) at line 37: Internal error: IDB-2002: Union memory limit exceeded.
+count12
+140000000
 sleep(60)
 0
 count13
@@ -40,10 +42,10 @@
 1	68000000
 sleep(30)
 0
-ERROR 1815 (HY000) at line 47: Internal error: TupleAggregateStep::threadedAggregateRowGroups() IDB-2003: Aggregation/Distinct memory limit is exceeded.
+F	count15
+1	70000000
 sleep(30)
 0
 ERROR 1815 (HY000) at line 55: Internal error: CAL0002: Update Failed:  IDB-2001: Join or subselect exceeds memory limit.
-count16
-43210010
+ERROR 1815 (HY000) at line 58: Internal error: IDB-2001: Join or subselect exceeds memory limit.
 ERROR 1815 (HY000) at line 61: Internal error: IDB-2001: Join or subselect exceeds memory limit.

The offending queries are:

select count(distinct c1) as count2 from test200 where c1 <= 139000000;
select count(*) as count16 from test200 a join test200 b using (c3) where a.c1 <= 43210010;

Comment by Roman [ 2021-06-28 ]

We need to redefine the test ref b/c of the changes in Aggregate code(MCOL-563). Currently used hashmap implementation roughly doubles its mem allocation. This allocation model contradicts with that used previously so some queries triggers mem allocation and the allocated mem goes over the 4G limit.

Comment by David Hall (Inactive) [ 2021-06-28 ]

QA: This appears to be fixed by MCOL-4759

Comment by Daniel Lee (Inactive) [ 2021-06-29 ]

Build verified: 6.1.1 ( Drone #2669 )

Executed the test on a 200000000 rows table. The query finished in 1 min 4.502 seconds. Timing is comparable to that in 5.6.1.

Generated at Thu Feb 08 02:52:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.