MariaDB ColumnStore / MCOL-5153

Disk-based aggregation fails with ERROR 1815 (HY000): Internal error: TupleAggregateStep::threadedAggregateRowGroups()[24] MCS-2054: Unknown error while aggregation. (part 1)

Details


    Description

      Aggregation on a VARCHAR(128) column (the number of distinct values is approximately 31 billion) fails with an obscure error:

      ERROR 1815 (HY000): Internal error: TupleAggregateStep::threadedAggregateRowGroups()[24] MCS-2054: Unknown error while aggregation.  
      

      The current implementation of RowAggStorage::increaseSize() can raise RowAggStorage::Data::fMask four times before rehashing happens. The guarding check in increaseSize() is too restrictive and trips easily with large values of fCurData->fMask and fCurData->fSize (see RowAggStorage::increaseSize() for details).

      The suggested solution is to increase the multiplier in the expression:

      if (fCurData->fSize * 2 < calcMaxSize(fCurData->fMask + 1))
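
      For context, the following is a minimal sketch of the shape of that logic in C++. The names fSize, fMask and calcMaxSize() come from this ticket; the surrounding structure, the 80% load factor, and the exception type are simplified assumptions, not the actual RowAggStorage code.

      #include <cstddef>
      #include <stdexcept>

      // Simplified stand-in for RowAggStorage's hash table data (assumed layout).
      struct Data
      {
        std::size_t fSize = 0;  // number of stored rows
        std::size_t fMask = 0;  // capacity - 1; capacity is a power of two
      };

      // Maximum load before a resize is required. The 80% load factor is an
      // assumption here; the real constant lives in the ColumnStore sources.
      static std::size_t calcMaxSize(std::size_t capacity)
      {
        return capacity * 8 / 10;
      }

      void increaseSize(Data* fCurData)
      {
        // Guarding check: if the table is forced to grow while holding less
        // than half of the next capacity's maximum load, the implementation
        // assumes something went wrong and raises the error that surfaces as
        // MCS-2054. Per this ticket, very large fSize/fMask values trip it
        // spuriously; raising the multiplier (fSize * 2 -> a larger factor)
        // makes the guard fire only when the table is far emptier than it
        // should be.
        if (fCurData->fSize * 2 < calcMaxSize(fCurData->fMask + 1))
          throw std::runtime_error("MCS-2054: Unknown error while aggregation.");

        // ... otherwise rehash into a table of twice the capacity ...
      }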
      

Activity

            drrtuy Roman added a comment -

            Please review.

            drrtuy Roman added a comment - edited

            For QA: I have seen this in the wild on beefy hardware with 1.5 TB of RAM, running an S3-based cluster on NVMe. The issue happens with aggregation on a VARCHAR(30) column when the number of DISTINCT values equals 31 billion.
            With the data I have, a reproduction is to aggregate 5 million distinct VARCHAR(30) values, e.g. SELECT c1 FROM t1 GROUP BY c1, where c1 is the mentioned VARCHAR(30) column with 5 million distinct values.

            dleeyh Daniel Lee (Inactive) added a comment - edited

            Build tested: 6.4.2-1 (Jenkins build bb-10.6.8-4-cs-6.4.2-1)

            storage: local
            3 PM cluster, with 30 GB of memory in each node.
            Dataset size: 10 GB; lineitem, where l_comment is VARCHAR(44)
            rows = 59,986,052 (close to 60 million)
            distinct rows = 19,439,546 (about 19 million)
            query: select l_comment from lineitem group by l_comment;

            With disk-based aggregation disabled, the query would run out of memory.
            With disk-based aggregation enabled, the query executed successfully.

            Also tested "select count(*) from lineitem, orders where l_orderkey = o_orderkey" on 100 GB, 200 GB, and 300 GB datasets. All succeeded.

            Is this test for S3 only, or would local storage be sufficient?

            drrtuy Roman added a comment -

            Another iteration on the disk-based aggregation code. This attempt replaces MariaDB collation-aware hashing with a combination of strnxfrm (which converts a byte array into a collation-aware array of weights) and an MM3 (MurmurHash3) byte-array hash. There is also an optimization borrowed from Robin Hood that is triggered when RowStorage::increaseSize() is called while there is plenty of space available in the current fCurData, without taking more RAM (see the patch for the details). A sketch of the hashing scheme follows.
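
            To illustrate the idea, here is a self-contained sketch (not the actual patch): collationWeights() below is a hypothetical stand-in for MariaDB's strnxfrm, mapping a string to a byte array of collation weights so that strings that compare equal under the collation produce identical bytes; the weights are then hashed with the standard MurmurHash3 x86 32-bit routine (the "MM3" above).

            #include <cstdint>
            #include <cstring>
            #include <string>
            #include <vector>

            // Hypothetical stand-in for MariaDB's strnxfrm: maps a string to an
            // array of collation weights so that strings that are equal under the
            // collation yield identical byte sequences. As a toy "case-insensitive"
            // collation, ASCII letters are folded to lower case here.
            static std::vector<uint8_t> collationWeights(const std::string& s)
            {
              std::vector<uint8_t> w(s.size());
              for (size_t i = 0; i < s.size(); ++i)
              {
                uint8_t c = static_cast<uint8_t>(s[i]);
                w[i] = (c >= 'A' && c <= 'Z') ? c + 32 : c;
              }
              return w;
            }

            // Standard MurmurHash3 x86_32 over a byte array.
            static uint32_t murmur3_32(const uint8_t* data, size_t len, uint32_t seed)
            {
              const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
              uint32_t h = seed;
              size_t i = 0;
              for (; i + 4 <= len; i += 4)  // body: 4-byte blocks
              {
                uint32_t k;
                std::memcpy(&k, data + i, 4);
                k *= c1; k = (k << 15) | (k >> 17); k *= c2;
                h ^= k; h = (h << 13) | (h >> 19); h = h * 5 + 0xe6546b64u;
              }
              uint32_t k = 0;  // tail: the trailing len % 4 bytes
              switch (len & 3)
              {
                case 3: k ^= uint32_t(data[i + 2]) << 16; [[fallthrough]];
                case 2: k ^= uint32_t(data[i + 1]) << 8; [[fallthrough]];
                case 1: k ^= uint32_t(data[i]);
                        k *= c1; k = (k << 15) | (k >> 17); k *= c2; h ^= k;
              }
              h ^= static_cast<uint32_t>(len);  // finalization mix
              h ^= h >> 16; h *= 0x85ebca6bu;
              h ^= h >> 13; h *= 0xc2b2ae35u;
              h ^= h >> 16;
              return h;
            }

            // Collation-aware hash: transform first, then hash the weights, so that
            // e.g. "Comment" and "comment" collide intentionally under this toy collation.
            uint32_t collationAwareHash(const std::string& s, uint32_t seed = 0)
            {
              const auto w = collationWeights(s);
              return murmur3_32(w.data(), w.size(), seed);
            }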


            dleeyh Daniel Lee (Inactive) added a comment -

            Build tested: 22.08-1 (#5243)

            Executed the same test against the 300 GB DBT3 database above successfully.

            People

              alexey.antipovsky Alexey Antipovsky
              drrtuy Roman
              dleeyh Daniel Lee (Inactive)