[MCOL-5153] Disk-based aggregation fails with ERROR 1815 (HY000): Internal error: TupleAggregateStep::threadedAggregateRowGroups()[24] MCS-2054: Unknown error while aggregation. (part 1) Created: 2022-07-07  Updated: 2022-10-26  Resolved: 2022-08-17

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 6.2.3, 6.3.1, 6.4.1
Fix Version/s: 22.08.1, 6.4.4-dompe

Type: Bug Priority: Major
Reporter: Roman Assignee: Alexey Antipovsky (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MCOL-5199 Follow-up for the hash calculation pe... Closed
Sprint: 2021-17
Assigned for Testing: Daniel Lee Daniel Lee (Inactive)

 Description   

Aggregation on a VARCHAR(128) column (the number of distinct values is approximately 31 bln) fails with an obscure error.

ERROR 1815 (HY000): Internal error: TupleAggregateStep::threadedAggregateRowGroups()[24] MCS-2054: Unknown error while aggregation.  

The current implementation of RowAggStorage::increaseSize() can raise RowAggStorage::Data::fMask four times before rehashing happens. The guarding check in increaseSize() is too restrictive and trips easily when fCurData->fMask and fCurData->fSize hold large values (see RowAggStorage::increaseSize() for details).

The suggested solution is to increase the multiplier in the expression:

if (fCurData->fSize * 2 < calcMaxSize(fCurData->fMask + 1))
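The effect of the multiplier can be modeled with a minimal sketch. This is not the ColumnStore code: the Data layout, the 80% load factor inside calcMaxSize(), and the growth logic are assumptions for illustration only. The point it shows is that a larger multiplier narrows the window in which the MCS-2054-style error is raised, letting growth proceed instead.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical simplified model of the hash-table bookkeeping; field
// names mirror the ticket, everything else is an assumption.
struct Data
{
  uint64_t fMask;  // capacity - 1 (capacity is a power of two)
  uint64_t fSize;  // number of entries currently stored
};

// Assumed load-factor cap: the table is considered full at 80% occupancy.
constexpr uint64_t calcMaxSize(uint64_t capacity)
{
  return capacity * 80 / 100;
}

// Sketch of the guard: growth was requested while the table is still
// mostly empty (many collisions), which the original code treats as an
// internal error. Raising `multiplier` shrinks the failure window.
void increaseSize(Data& d, uint64_t multiplier)
{
  if (d.fSize * multiplier < calcMaxSize(d.fMask + 1))
    throw std::runtime_error("MCS-2054: Unknown error while aggregation.");
  d.fMask = (d.fMask + 1) * 2 - 1;  // double the capacity
}
```

With capacity 256 (fMask = 255) and fSize = 100, a multiplier of 2 throws (200 < 204), while a multiplier of 4 allows the table to double.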



 Comments   
Comment by Roman [ 2022-07-07 ]

Please review.

Comment by Roman [ 2022-07-09 ]

4QA: I have seen it in the wild on beefy hardware with 1.5 TB RAM, running an S3-based cluster on NVMe. The issue happens with aggregation on a VARCHAR(30) column when the number of DISTINCT values equals 31 bln.
With the data I have, a reproduction can be to aggregate over 5 mln distinct VARCHAR(30) values, e.g. SELECT c1 FROM t1 GROUP BY c1, where c1 is the mentioned VARCHAR(30) column with 5 mln distinct values.

Comment by Daniel Lee (Inactive) [ 2022-07-18 ]

Build tested: 6.4.2-1 (Jenkins build bb-10.6.8-4-cs-6.4.2-1)

storage: local
3-PM cluster, with 30 GB of memory on each node.
Dataset size: 10 GB, lineitem; l_comment is VARCHAR(44).
rows = 59,986,052 (close to 60 million)
distinct rows = 19,439,546 (about 19 million)
query: select l_comment from lineitem group by l_comment;

With disk-join disabled, the query would run out of memory.
With disk-join enabled, the query executed successfully.

Also tested "select count(*) from lineitem, orders where l_orderkey = o_orderkey" on 100 GB, 200 GB, and 300 GB datasets. All succeeded.

Is this test for S3 only, or would local storage be sufficient?

Comment by Roman [ 2022-08-04 ]

Another iteration on the disk-based aggregation code. This attempt replaces the MariaDB collation-aware hashing with a combination of strnxfrm() (which converts a byte array into a collation-aware array of weights) and an MM3 (MurmurHash3) byte-array hash. There is also an optimization borrowed from Robin Hood hashing that triggers when RowStorage::increaseSize() is called while there is still plenty of space available in the current fCurData, allowing a rehash without taking more RAM (see the patch for details).
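For reference, the MM3 half of that combination can be sketched as below. This is a standard MurmurHash3 x86_32 implementation over a raw byte array, not the actual patch: producing the collation weight array via strnxfrm() is MariaDB-specific and omitted here, so assume the weights have already been materialized into `data`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static inline uint32_t rotl32(uint32_t x, int r)
{
  return (x << r) | (x >> (32 - r));
}

// MurmurHash3 x86_32 over `len` bytes; in the patch's scheme the input
// would be the strnxfrm() weight array, making the hash collation-aware.
uint32_t murmur3_32(const uint8_t* data, size_t len, uint32_t seed)
{
  uint32_t h = seed;
  const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
  size_t i = 0;
  for (; i + 4 <= len; i += 4)  // full 4-byte blocks
  {
    uint32_t k;
    std::memcpy(&k, data + i, 4);  // little-endian layout assumed
    k *= c1; k = rotl32(k, 15); k *= c2;
    h ^= k; h = rotl32(h, 13); h = h * 5 + 0xe6546b64u;
  }
  uint32_t k = 0;                // 0-3 trailing bytes
  switch (len & 3)
  {
    case 3: k ^= uint32_t(data[i + 2]) << 16; [[fallthrough]];
    case 2: k ^= uint32_t(data[i + 1]) << 8;  [[fallthrough]];
    case 1: k ^= uint32_t(data[i]);
            k *= c1; k = rotl32(k, 15); k *= c2; h ^= k;
  }
  h ^= uint32_t(len);            // finalization mix (fmix32)
  h ^= h >> 16; h *= 0x85ebca6bu;
  h ^= h >> 13; h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}
```

Because strnxfrm() maps strings that compare equal under a collation to identical weight arrays, hashing the weights with MM3 yields equal hashes for collation-equal strings without a collation-aware hash function.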

Comment by Daniel Lee (Inactive) [ 2022-08-10 ]

Build tested: 22.08-1 (#5243)

Executed the same 300 GB DBT3 database above successfully.

Generated at Thu Feb 08 02:55:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.