MariaDB ColumnStore / MCOL-5250

Disk-based DISTINCT

Details

    Sprint: 2021-17, 2022-22, 2022-23, 2023-4, 2023-5, 2023-6, 2023-7, 2023-8, 2023-10, 2023-11, 2025-6, 2025-8, 2025-9

    Description

      As of 22.08.01, MCS performs DISTINCT processing in TupleAnnexStep. This step uses a hash map for the purpose. The solution is simple, but it:

      • lacks scalability,
      • can't leverage the disk-based capabilities of the RowStorage class used by GROUP BY,
      • is invisible to ResourceManager, which accounts for RAM consumption but doesn't count the hash map.
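      For illustration, the hash-map approach boils down to something like the C++ sketch below. It is not the actual TupleAnnexStep code: rows are modeled as serialized std::string values, and Row and hashDistinct are hypothetical names. Every distinct row stays in the set until the function returns, and nothing reports the set's size to a memory accountant, which is where the scalability and accounting problems listed above come from.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical row representation: a row serialized into a byte string,
// so identical rows compare equal as strings.
using Row = std::string;

// Minimal sketch of hash-based DISTINCT: keep the first occurrence of each row.
// All distinct rows live in memory at once, and the set's size is never
// reported to any memory accountant.
std::vector<Row> hashDistinct(const std::vector<Row>& input) {
  std::unordered_set<Row> seen;
  std::vector<Row> out;
  for (const Row& r : input) {
    if (seen.insert(r).second) {  // true only the first time r is seen
      out.push_back(r);
    }
  }
  return out;
}
```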

      This issue covers a new DISTINCT implementation (presumably based on RowStorage) that:

      • can perform external DISTINCT, spilling to disk if necessary,
      • has its RAM consumption counted by ResourceManager,
      • scales (this might be tricky, since DISTINCT processing overlaps with ORDER BY).
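      A minimal sketch of what an external DISTINCT with accounted memory could look like, assuming a sort-based design: deduplicate in memory, spill a sorted run to disk whenever a memory reservation fails, then k-way merge the runs while dropping duplicates. This is a generic technique, not RowStorage's actual spilling scheme. MemoryBudget is a hypothetical stand-in for ResourceManager, spillRun and externalDistinct are hypothetical names, rows are assumed to be newline-free strings, and the per-row cost is a rough estimate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <functional>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a RAM accountant such as ResourceManager:
// memory must be reserved before use and released afterwards.
class MemoryBudget {
 public:
  explicit MemoryBudget(std::size_t limit) : limit_(limit) {}
  bool tryReserve(std::size_t bytes) {
    if (used_ + bytes > limit_) return false;
    used_ += bytes;
    return true;
  }
  void release(std::size_t bytes) { used_ -= bytes; }
  std::size_t used() const { return used_; }

 private:
  std::size_t limit_;
  std::size_t used_ = 0;
};

// Writes one sorted, duplicate-free run to a temporary file; returns its path.
std::string spillRun(const std::set<std::string>& rows, const std::string& prefix,
                     std::size_t id) {
  std::string path = prefix + std::to_string(id) + ".run";
  std::ofstream out(path);
  for (const std::string& r : rows) out << r << '\n';
  return path;
}

// External DISTINCT sketch. Rows are assumed not to contain '\n'.
// The result comes back sorted, which DISTINCT does not require.
std::vector<std::string> externalDistinct(const std::vector<std::string>& input,
                                          MemoryBudget& budget,
                                          const std::string& prefix) {
  std::set<std::string> current;  // in-memory rows: sorted and deduplicated
  std::size_t reserved = 0;       // bytes charged to the budget for `current`
  std::vector<std::string> runs;  // paths of spilled runs

  auto flush = [&] {
    if (current.empty()) return;
    runs.push_back(spillRun(current, prefix, runs.size()));
    current.clear();
    budget.release(reserved);
    reserved = 0;
  };

  for (const std::string& row : input) {
    if (current.count(row)) continue;  // duplicate of a row already in memory
    std::size_t cost = row.size();     // rough estimate, ignores node overhead
    if (!budget.tryReserve(cost)) {
      flush();  // out of budget: spill to disk and free the memory
      bool ok = budget.tryReserve(cost);
      assert(ok && "budget is smaller than a single row");
      (void)ok;
    }
    current.insert(row);
    reserved += cost;
  }
  flush();

  // k-way merge of the sorted runs, dropping duplicates across runs.
  std::vector<std::ifstream> files;
  for (const std::string& p : runs) files.emplace_back(p);
  using Item = std::pair<std::string, std::size_t>;  // (row, run index)
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
  std::string line;
  for (std::size_t i = 0; i < files.size(); ++i)
    if (std::getline(files[i], line)) heap.emplace(line, i);

  // Collected into a vector for brevity; a real operator would stream rows out.
  std::vector<std::string> out;
  while (!heap.empty()) {
    Item top = heap.top();
    heap.pop();
    if (out.empty() || out.back() != top.first) out.push_back(top.first);
    if (std::getline(files[top.second], line)) heap.emplace(line, top.second);
  }
  files.clear();  // close the runs before deleting them
  for (const std::string& p : runs) std::remove(p.c_str());
  return out;
}
```

      Hash partitioning (spill rows into N files by hash, then deduplicate each file in memory) is the usual alternative; it avoids sorting but may need recursive partitioning when a single partition still doesn't fit in memory.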

      Sprint 2025-7:

      • continue testing and confirming edge cases

      Attachments

        1. 2.png (89 kB, Aleksei Bukhalov)
        2. 1.png (95 kB, Aleksei Bukhalov)
        3. 4.png (83 kB, Aleksei Bukhalov)

            People

              Aleksei Antipovskii (alexey.antipovsky)
              Roman (drrtuy)
              Votes: 0
              Watchers: 5
