Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25736

ROW_NUMBER() returns wrong results for query with distinct count and grouping.

    XMLWordPrintable

Details

    • Bug
    • Status: Confirmed (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL)
    • 10.5, 10.6
    • Ubuntu 18.04

    Description

      So I didn't pay attention using a query builder, and accidentally ended up with a query like this, with a COUNT() and a GROUP BY that doesn't make too much sense, as articles is a one-to-one relation:

      SELECT
        comments.id,
        comments.article_id,
        comments.votes,
        COUNT(DISTINCT articles.id) AS aggregated,
        ROW_NUMBER() OVER (
          PARTITION BY
            comments.article_id
          ORDER BY
            comments.votes DESC
          ) AS row_num
      FROM
        comments
        LEFT JOIN articles ON articles.id = comments.article_id
      WHERE
        comments.article_id IN (1, 2)
      GROUP BY
        comments.id,
        articles.id;
      

      Certainly grouping on articles.id, and counting the articles won't really do anything. However, while checking why I was receiving the wrong counts, I also noticed that somehow the row numbering was wrong for the results that I was receiving, for the above query it looked like this:

      id article_id votes aggregated row_num
      1 1 1 1 3
      2 1 2 1 2
      3 1 3 1 1
      4 1 4 1 1
      5 2 10 1 2
      6 2 9 1 3
      7 2 8 1 4
      8 2 7 1 4

      The row number is clearly off, it should contain 1,2,3,4 (not in that order) for both article id partitions that consist of only unique votes values, instead for article id 1 it contains 1,1,2,3, and for article id 2 it's 2,3,4,4. Cross checking with MySQL, I do receive the expected results there.

      The problem goes away in various situations, for example when not grouping on articles.id, when counting without DISTINCT, when using a different aggregate function like for example MIN(), or when ordering on anything but comments.id, even ordering on the otherwise erroneous row number will fix it, which is super double extra weird.

      Here's some data and queries to reproduce it (and a fiddle):

      CREATE TABLE articles (
          id INTEGER PRIMARY KEY
      );
       
      CREATE TABLE comments (
          id INTEGER PRIMARY KEY,
          article_id INTEGER,
          votes INTEGER
      );
       
      INSERT INTO articles (id) 
      VALUES 
        (1), 
        (2);
       
      INSERT INTO comments (id, article_id, votes)
      VALUES 
        (1, 1, 1), 
        (2, 1, 2), 
        (3, 1, 3), 
        (4, 1, 4), 
        (5, 2, 10), 
        (6, 2, 9), 
        (7, 2, 8), 
        (8, 2, 7);
       
      -- wrong row numbers
      SELECT
        comments.id,
        comments.article_id,
        comments.votes,
        COUNT(DISTINCT articles.id) AS aggregated,
        ROW_NUMBER() OVER (
          PARTITION BY
            comments.article_id
          ORDER BY
            comments.votes DESC
          ) AS row_num
      FROM
        comments
        LEFT JOIN articles ON articles.id = comments.article_id
      WHERE
        comments.article_id IN (1, 2)
      GROUP BY
        comments.id,
        articles.id;
       
      -- correct row numbers without grouping on articles.id
      SELECT
        comments.id,
        comments.article_id,
        comments.votes,
        COUNT(DISTINCT articles.id) AS aggregated,
        ROW_NUMBER() OVER (
          PARTITION BY
            comments.article_id
          ORDER BY
            comments.votes DESC
          ) AS row_num
      FROM
        comments
        LEFT JOIN articles ON articles.id = comments.article_id
      WHERE
        comments.article_id IN (1, 2)
      GROUP BY
        comments.id;
       
      -- correct row numbers with non-distinct counting
      SELECT
        comments.id,
        comments.article_id,
        comments.votes,
        COUNT(articles.id) AS aggregated,
        ROW_NUMBER() OVER (
          PARTITION BY
            comments.article_id
          ORDER BY
            comments.votes DESC
          ) AS row_num
      FROM
        comments
        LEFT JOIN articles ON articles.id = comments.article_id
      WHERE
        comments.article_id IN (1, 2)
      GROUP BY
        comments.id,
        articles.id;
       
      -- correct row numbers with min value aggregation instead of counting
      SELECT
        comments.id,
        comments.article_id,
        comments.votes,
        MIN(DISTINCT articles.id) AS aggregated,
        ROW_NUMBER() OVER (
          PARTITION BY
            comments.article_id
          ORDER BY
            comments.votes DESC
          ) AS row_num
      FROM
        comments
        LEFT JOIN articles ON articles.id = comments.article_id
      WHERE
        comments.article_id IN (1, 2)
      GROUP BY
        comments.id,
        articles.id;
       
      -- correct row numbers whith order on the otherwise wrong row number
      SELECT
        comments.id,
        comments.article_id,
        comments.votes,
        COUNT(DISTINCT articles.id) AS aggregated,
        ROW_NUMBER() OVER (
          PARTITION BY
            comments.article_id
          ORDER BY
            comments.votes DESC
          ) AS row_num
      FROM
        comments
        LEFT JOIN articles ON articles.id = comments.article_id
      WHERE
        comments.article_id IN (1, 2)
      GROUP BY
        comments.id,
        articles.id
      ORDER BY
        row_num ASC;
      

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            odtl Oliver
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.