[MDEV-25736] ROW_NUMBER() returns wrong results for query with distinct count and grouping. Created: 2021-05-20  Updated: 2023-11-28

Status: Confirmed
Project: MariaDB Server
Component/s: Optimizer - Window functions
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Oliver Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: not-11.0-sel
Environment:

Ubuntu 18.04



 Description   

So I didn't pay attention using a query builder, and accidentally ended up with a query like this, with a COUNT() and a GROUP BY that doesn't make too much sense, as articles is a one-to-one relation:

SELECT
  comments.id,
  comments.article_id,
  comments.votes,
  COUNT(DISTINCT articles.id) AS aggregated,
  ROW_NUMBER() OVER (
    PARTITION BY
      comments.article_id
    ORDER BY
      comments.votes DESC
    ) AS row_num
FROM
  comments
  LEFT JOIN articles ON articles.id = comments.article_id
WHERE
  comments.article_id IN (1, 2)
GROUP BY
  comments.id,
  articles.id;

Certainly grouping on articles.id, and counting the articles won't really do anything. However, while checking why I was receiving the wrong counts, I also noticed that somehow the row numbering was wrong for the results that I was receiving, for the above query it looked like this:

id article_id votes aggregated row_num
1 1 1 1 3
2 1 2 1 2
3 1 3 1 1
4 1 4 1 1
5 2 10 1 2
6 2 9 1 3
7 2 8 1 4
8 2 7 1 4

The row number is clearly off, it should contain 1,2,3,4 (not in that order) for both article id partitions that consist of only unique votes values, instead for article id 1 it contains 1,1,2,3, and for article id 2 it's 2,3,4,4. Cross checking with MySQL, I do receive the expected results there.

The problem goes away in various situations, for example when not grouping on articles.id, when counting without DISTINCT, when using a different aggregate function like for example MIN(), or when ordering on anything but comments.id, even ordering on the otherwise erroneous row number will fix it, which is super double extra weird.

Here's some data and queries to reproduce it (and a fiddle):

CREATE TABLE articles (
    id INTEGER PRIMARY KEY
);
 
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    article_id INTEGER,
    votes INTEGER
);
 
INSERT INTO articles (id) 
VALUES 
  (1), 
  (2);
 
INSERT INTO comments (id, article_id, votes)
VALUES 
  (1, 1, 1), 
  (2, 1, 2), 
  (3, 1, 3), 
  (4, 1, 4), 
  (5, 2, 10), 
  (6, 2, 9), 
  (7, 2, 8), 
  (8, 2, 7);
 
-- wrong row numbers
SELECT
  comments.id,
  comments.article_id,
  comments.votes,
  COUNT(DISTINCT articles.id) AS aggregated,
  ROW_NUMBER() OVER (
    PARTITION BY
      comments.article_id
    ORDER BY
      comments.votes DESC
    ) AS row_num
FROM
  comments
  LEFT JOIN articles ON articles.id = comments.article_id
WHERE
  comments.article_id IN (1, 2)
GROUP BY
  comments.id,
  articles.id;
 
-- correct row numbers without grouping on articles.id
SELECT
  comments.id,
  comments.article_id,
  comments.votes,
  COUNT(DISTINCT articles.id) AS aggregated,
  ROW_NUMBER() OVER (
    PARTITION BY
      comments.article_id
    ORDER BY
      comments.votes DESC
    ) AS row_num
FROM
  comments
  LEFT JOIN articles ON articles.id = comments.article_id
WHERE
  comments.article_id IN (1, 2)
GROUP BY
  comments.id;
 
-- correct row numbers with non-distinct counting
SELECT
  comments.id,
  comments.article_id,
  comments.votes,
  COUNT(articles.id) AS aggregated,
  ROW_NUMBER() OVER (
    PARTITION BY
      comments.article_id
    ORDER BY
      comments.votes DESC
    ) AS row_num
FROM
  comments
  LEFT JOIN articles ON articles.id = comments.article_id
WHERE
  comments.article_id IN (1, 2)
GROUP BY
  comments.id,
  articles.id;
 
-- correct row numbers with min value aggregation instead of counting
SELECT
  comments.id,
  comments.article_id,
  comments.votes,
  MIN(DISTINCT articles.id) AS aggregated,
  ROW_NUMBER() OVER (
    PARTITION BY
      comments.article_id
    ORDER BY
      comments.votes DESC
    ) AS row_num
FROM
  comments
  LEFT JOIN articles ON articles.id = comments.article_id
WHERE
  comments.article_id IN (1, 2)
GROUP BY
  comments.id,
  articles.id;
 
-- correct row numbers whith order on the otherwise wrong row number
SELECT
  comments.id,
  comments.article_id,
  comments.votes,
  COUNT(DISTINCT articles.id) AS aggregated,
  ROW_NUMBER() OVER (
    PARTITION BY
      comments.article_id
    ORDER BY
      comments.votes DESC
    ) AS row_num
FROM
  comments
  LEFT JOIN articles ON articles.id = comments.article_id
WHERE
  comments.article_id IN (1, 2)
GROUP BY
  comments.id,
  articles.id
ORDER BY
  row_num ASC;



 Comments   
Comment by Alice Sherepa [ 2021-05-20 ]

Thank you for the report and the test case!
I repeated as described on 10.2-10.5, also reproducible with sum, avg, group_concat:

CREATE TABLE t1 ( id int primary key, article_id int, votes int);
INSERT INTO t1 VALUES  
	(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4), 
	(5, 2, 10),(6, 2, 9), (7, 2, 8), (8, 2, 7);
 
CREATE TABLE t2 (id int primary key);
INSERT INTO t2 (id) VALUES (1), (2);
 
SELECT avg(DISTINCT t2.id),
       row_number() over ( partition BY t1.article_id ORDER BY t1.votes)
FROM t1 JOIN t2 ON t2.id = t1.article_id
GROUP BY t1.id, t2.id;
 
SELECT group_concat(t2.id),   ##even without DISTINCT
       row_number() over (partition BY t1.article_id ORDER BY t1.votes)
FROM t1 JOIN t2 ON t2.id = t1.article_id
GROUP BY t1.id, t2.id;

MariaDB [test]> SELECT avg(DISTINCT t2.id),
    ->        row_number() over ( partition BY t1.article_id ORDER BY t1.votes)
    -> FROM t1 JOIN t2 ON t2.id = t1.article_id
    -> GROUP BY t1.id, t2.id;
+---------------------+-------------------------------------------------------------------+
| avg(DISTINCT t2.id) | row_number() over ( partition BY t1.article_id ORDER BY t1.votes) |
+---------------------+-------------------------------------------------------------------+
|              1.0000 |                                                                 2 |
|              1.0000 |                                                                 3 |
|              1.0000 |                                                                 4 |
|              1.0000 |                                                                 4 |
|              2.0000 |                                                                 3 |
|              2.0000 |                                                                 2 |
|              2.0000 |                                                                 1 |
|              2.0000 |                                                                 1 |
+---------------------+-------------------------------------------------------------------+
8 rows in set (0.002 sec)
 
MariaDB [test]> SELECT group_concat(t2.id),
    ->        row_number() over (partition BY t1.article_id ORDER BY t1.votes)
    -> FROM t1 JOIN t2 ON t2.id = t1.article_id
    -> GROUP BY t1.id, t2.id;
+---------------------+------------------------------------------------------------------+
| group_concat(t2.id) | row_number() over (partition BY t1.article_id ORDER BY t1.votes) |
+---------------------+------------------------------------------------------------------+
| 1                   |                                                                2 |
| 1                   |                                                                3 |
| 1                   |                                                                4 |
| 1                   |                                                                4 |
| 2                   |                                                                3 |
| 2                   |                                                                2 |
| 2                   |                                                                1 |
| 2                   |                                                                1 |
+---------------------+------------------------------------------------------------------+
8 rows in set (0.001 sec)

Generated at Thu Feb 08 09:39:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.