[MDEV-15092] Ordering by aggregate expression sometimes works incorrectly Created: 2018-01-26  Updated: 2020-12-01

Status: Confirmed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1, 10.2
Fix Version/s: 10.2

Type: Bug Priority: Major
Reporter: Pavel Goran Assignee: Vicențiu Ciorbaru
Resolution: Unresolved Votes: 2
Labels: None
Environment:

Nixos
dbfiddle.uk


Issue Links:
Duplicate
is duplicated by MDEV-15381 ORDER BY datediff() reversed with unn... Closed

 Description   

http://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=cacfe3a36b241a9379faf72286bb0531 - basically, it has everything that's needed.

Long story:

When profiling a complex query on a copy of production database, I found out that the ORDER BY clause doesn't always work properly with an expression that contains aggregate functions.

I used MariaDB version 10.1.28, on Nixos Linux. The sample query below exhibits the bug on MariaDB 10.1.28 and on dbfiddle.uk (which runs MariaDB 10.2.12). MySQL 5.6.38 (which I run on Gentoo Linux) doesn't have this bug (at least, the sample query doesn't exhibit it).

Here is some sample data that demonstrate the issue:

CREATE TABLE `t0` (
  `a` int(11) NOT NULL,
  `x` int(11) NOT NULL,
  `y` int(11) NOT NULL
) ENGINE=InnoDB;
INSERT INTO `t0` VALUES (1,40,9),(2,3,12),(3,14,14),(4,1,2),(10,23,1),(20,15,10),(30,8,2),(40,100,12),(1,40,9),(2,3,12),(3,14,14),(4,1,2),(10,23,1),(20,15,10),(30,8,2),(40,100,12);
 
CREATE TABLE `t1` (
  `a` int(11) NOT NULL,
  `b` int(11) NOT NULL
) ENGINE=InnoDB;
INSERT INTO `t1` VALUES (10,5),(10,8),(20,5),(10,5),(10,7),(10,8),(20,5),(20,8),(30,9),(30,5),(40,9),(40,11),(40,8),(1,5),(1,7),(1,8),(2,5),(2,8),(3,9),(3,5),(4,9),(4,11),(4,8);

The problematic query is as follows:

select t0.a, sum(x + y * b)/count(distinct b) + x as mysum
from t0 inner join t1 on t0.a=t1.a
group by a order by (sum(x + y * b)/count(distinct b) + x) desc;

It produces the following result:

a	mysum
40	524.0000
20	240.0000
3	238.0000
1	240.0000
2	165.0000
10	121.6667
30	52.0000
4	40.3333

So, despite the query being sorted by the expression that's used to calculate the "mysum" column, the rows aren't listed in correct order. If I replace the ORDER BY expression with "mysum", the order becomes correct.

The ORDER BY expression in the original query (for production data) looks like this: "r.value1 + sum(ifnull(e.value2 * (e.value3 + e.value), 0.0))". That query also starts working properly once I replace the ORDER BY expression with an alias.

Also, both queries seem to work fine if I remove the non-aggregate part ("x" in sample query and "r.value1" in production query).



 Comments   
Comment by Alice Sherepa [ 2018-01-29 ]

Thanks for the report!
The problem is reproducible as described on MariaDB 10.1-10.3, not on 5.5, 10.0

CREATE TABLE t0 (a int,x int,y int);
INSERT INTO t0 VALUES (1,40,9),(3,14,14),(1,40,9),(3,14,14);
CREATE TABLE t1 (a int,b int);
INSERT INTO t1 VALUES (1,5),(1,7),(1,8),(3,9),(3,5);
 
SELECT x+sum(x + y * b)/count(b) AS mysum,t0.a
FROM t0 NATURAL JOIN t1
GROUP BY a
ORDER BY (x+sum(x + y * b)/count(b));
+----------+------+
| mysum    | a    |
+----------+------+
| 140.0000 |    1 |
| 126.0000 |    3 |
+----------+------+
2 rows in set (0.001 sec)
 
SELECT min(x)+sum(x + y * b)/count(b) AS mysum,
       t0.a
FROM t0 NATURAL JOIN t1
GROUP BY a
ORDER BY (min(x)+sum(x + y * b)/count(b));
+----------+------+
| mysum    | a    |
+----------+------+
| 126.0000 |    3 |
| 140.0000 |    1 |
+----------+------+
2 rows in set (0.001 sec)

Comment by Alice Sherepa [ 2018-02-22 ]

Please, after fixing this, check also the case from MDEV-15381, if it is also fixed

CREATE TABLE Projects (
  Task_ID int unsigned NOT NULL, Start_Date date NOT NULL, End_Date date NOT NULL) engine=InnoDB;
 
INSERT INTO Projects VALUES
  (1, '2015-10-01', '2015-10-02'), (2, '2015-10-02', '2015-10-03'), (2, '2015-10-03', '2015-10-04'), (2, '2015-10-13', '2015-10-14'), (2, '2015-10-14', '2015-10-15'), (2, '2015-10-28', '2015-10-29'), (2, '2015-10-30', '2015-10-31');
 
SELECT Start_Date, min(End_Date), datediff(min(End_Date), Start_Date) AS 'test_diff'
FROM
	(SELECT Start_Date FROM Projects
 		WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
	(SELECT End_Date FROM Projects 
		WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
ORDER BY test_diff;
 
SELECT Start_Date, min(End_Date), datediff(min(End_Date), Start_Date) AS 'test_diff'
FROM
	(SELECT Start_Date FROM Projects
 		WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
	(SELECT End_Date FROM Projects 
		WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
ORDER BY datediff(min(End_Date), Start_Date);

Comment by Cuchac [ 2018-09-05 ]

Another example with just one aggregate in order by https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=a1491acfe30452d74e863d9a2602f8ca

Comment by Cuchac [ 2018-09-05 ]

Workaround is to put whole ordering expression inside aggregate function. See example in https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=a1491acfe30452d74e863d9a2602f8ca

Last query has correct ordering contrary to previous query. The only difference is in one bracket position.

Generated at Thu Feb 08 08:18:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.