Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-5187

OOM happening when querying large datasets and using distinct

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Won't Fix
    • 22.08.1, 6.4.6
    • N/A
    • N/A
    • ububts 20.04 LTS

    Description

      I set TotalUmMemory to 25%, but ExeMgr use far beyond usage memory.
      Finally ExeMgr was killed by oom-killer.
      I tried two sql on large datasets.

      1. select distinct columnA, columnB, columnC from table.
      2. select columnA, columnB, columnC from table group by columnA, columnB, columnC.

      1. is OOM,
      but 2. is MCS-2003: Aggregation/Distinct memory limit is exceeded.

      I think 1. and 2. are similar query.
      I think that 1. should return same message (MCS-2003: Aggregation/Distinct memory limit is exceeded.)

      Attachments

        Issue Links

          Activity

            drrtuy Roman added a comment - - edited

            Thx for your suggestion. We will look into implementing it.
            JFYI SELECT DISTINCT is not a simple rename for GROUP BY and it is completely different comparing to GROUP BY processing-wise. I suggest you to prefer GROUP BY than DISTINCT.
            Did you try to enable disk-based aggregation to manage with the error message?

            drrtuy Roman added a comment - - edited Thx for your suggestion. We will look into implementing it. JFYI SELECT DISTINCT is not a simple rename for GROUP BY and it is completely different comparing to GROUP BY processing-wise. I suggest you to prefer GROUP BY than DISTINCT. Did you try to enable disk-based aggregation to manage with the error message?
            Atsushi Kugimiya added a comment -

            Thanks for your answer.

            Yes, I did.
            I already done disk-based aggregation. When I use disk-based aggregation, Both sqls can be processed.

            DISTINCT problem is using memory without getting an error.
            I set 25% to TotalUmMemory, but ExeMgr use far beyond usage memory.
            DISTINCT implementation ignores TotalUmMemory.

            I think that DISTINCT implementation should check TotalUmMemory.
            What do you think?

            Atsushi Kugimiya added a comment - Thanks for your answer. Yes, I did. I already done disk-based aggregation. When I use disk-based aggregation, Both sqls can be processed. DISTINCT problem is using memory without getting an error. I set 25% to TotalUmMemory, but ExeMgr use far beyond usage memory. DISTINCT implementation ignores TotalUmMemory. I think that DISTINCT implementation should check TotalUmMemory. What do you think?
            drrtuy Roman added a comment -

            It surely must obey the limitations of TotalUmMemory. I will look into the difference b/w distinct and group by in terms of memory consumption limitation.

            drrtuy Roman added a comment - It surely must obey the limitations of TotalUmMemory. I will look into the difference b/w distinct and group by in terms of memory consumption limitation.
            JIraAutomate JiraAutomate added a comment -

            Automated message:
            ----------------------------
            Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

            JIraAutomate JiraAutomate added a comment - Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

            People

              drrtuy Roman
              Atsushi Kugimiya
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.