[MDEV-3926] Wrong result with GROUP BY ... WITH ROLLUP Created: 2012-12-10  Updated: 2021-05-19  Resolved: 2013-03-21

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.28a
Fix Version/s: None

Type: Bug Priority: Major
Reporter: P Ka Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Windows server 2003, 64-bit


Issue Links:
Duplicate
duplicates MDEV-13055 from select statement not normal result Closed
Relates
relates to MDEV-5007 wrong order by moving from mysql to m... Closed

 Description   

The following query returns different result between MYSQL and MARIADB (5.5.28):

CREATE TABLE t(a int,b int,c int, amount1 int, amount2 int);
INSERT INTO t (a, b, c, amount1, amount2) values (1,1,1,1,1);
INSERT INTO t (a, b, c, amount1, amount2) values (2,2,2,2,1);
INSERT INTO t (a, b, c, amount1, amount2) values (2,3,3,3,1);
 
SELECT *,
    IF(`t_reorder`.`b` IS NULL, @ccount1:=amount1, @ccount1) AS `a_amount1`,
    IF(`t_reorder`.`b` IS NULL, @dcount1:=amount2, @dcount1) AS `a_amount2`
FROM (
	SELECT *
		FROM (
  		SELECT `a`, `b`,`c`,
			SUM(`amount1`) AS `amount1`, SUM(`amount2`) AS `amount2`
		FROM `t`
		WHERE (c IN (1,2,3)) AND ((`a` IS NOT NULL) AND (`b` IS NOT NULL))
		GROUP BY `a`, `b` WITH ROLLUP) `t_rollup`
	ORDER BY `a`, `b`
)`t_reorder`;
 
mysql.exe  Ver 14.14 Distrib 5.5.28, for Win32 (x86) 
(Correct)
+------+------+------+---------+---------+-----------+-----------+
| a    | b    | c    | amount1 | amount2 | a_amount1 | a_amount2 |
+------+------+------+---------+---------+-----------+-----------+
| NULL | NULL |    3 |       6 |       3 |         6 |         3 |
|    1 | NULL |    1 |       1 |       1 |         1 |         1 |
|    1 |    1 |    1 |       1 |       1 |         1 |         1 |
|    2 | NULL |    3 |       5 |       2 |         5 |         2 |
|    2 |    2 |    2 |       2 |       1 |         5 |         2 |
|    2 |    3 |    3 |       3 |       1 |         5 |         2 |
+------+------+------+---------+---------+-----------+-----------+
 
mysql.exe  Ver 15.1 Distrib 5.5.28-MariaDB, for Win32 (x86)
(Wrong values on a_amount1 and a_amount2)
ORDER BY `a`, `b` does not have any effect to this qyery
+------+------+------+---------+---------+-----------+-----------+
| a    | b    | c    | amount1 | amount2 | a_amount1 | a_amount2 |
+------+------+------+---------+---------+-----------+-----------+
|    1 |    1 |    1 |       1 |       1 |         6 |         3 |
|    1 | NULL |    1 |       1 |       1 |         1 |         1 |
|    2 |    2 |    2 |       2 |       1 |         1 |         1 |
|    2 |    3 |    3 |       3 |       1 |         1 |         1 |
|    2 | NULL |    3 |       5 |       2 |         5 |         2 |
| NULL | NULL |    3 |       6 |       3 |         6 |         3 |
+------+------+------+---------+---------+-----------+-----------+

MariaDB seems to ignore ORDER BY statement after subquery having GROUP BY ... WITH ROLLUP.

Optimizer_switch settings as per default.



 Comments   
Comment by Sergei Golubchik [ 2012-12-10 ]

MariaDB isn't wrong here. A "table" (and subquery in the FROM clause too) is - according to the SQL standard - an unordered set of rows. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order. That's why the optimizer can ignore the ORDER BY clause that you have specified. In fact, SQL standard does not even allow the ORDER BY clause to appear in this subquery (we allow it, because ORDER BY ... LIMIT ... changes the result, the set of rows, not only their order).

You need to treat the subquery in the FROM clause, as a set of rows in some unspecified and undefined order, and put the ORDER BY on the top-level SELECT.

Comment by Yann-Gaël GAUTHERON [ 2013-03-21 ]

I'm concerned by this behavior too. And i think it's a real problem because of MySQL is working good with this kind of subqueries with a special order.
My test case is simpler :

SELECT *
FROM (
SELECT *
FROM tableX
ORDER BY timeX DESC
) x
GROUP BY keyX

=> MySQL 5.1.62 does the right order, and returns the first occurence needed : the latest in time
=> MariaDB 5.5.28 ignore the order and always returns the first occurence it finds : the older

Why MySQL is working, and not MariaDB on this ?

Comment by Yann-Gaël GAUTHERON [ 2013-03-21 ]

I see that you already have closed this bug, do i need to create a new bug or can you please reopen this one ?

Comment by Sergei Golubchik [ 2013-03-21 ]

Please, read my comment above. This MariaDB's behavior is not a bug. In your query you rely on the specific implementation glitch in MySQL 5.1. MariaDB has a better subquery optimizer and it ignores ORDER BY in the subquery in the FROM clause. MySQL 5.6 has got an improved optimizer too, so I suppose, your query won't work in MySQL 5.6 either.

To have stable result on all MySQL/MariaDB versions (and other databases) you need to rewrite the query to conform to the SQL Standard semantics.

Comment by Yann-Gaël GAUTHERON [ 2013-03-21 ]

Thank you for your quick precision.
I believe you about MySQL 5.6, you are the expert.
But I don't really understand why it is called an optimization, because an optimization does not remove a feature ; but only improves a feature. It looks a depreciation.
Then maybe there is another way to get the same result with the SQL Standard like you said, if i find a solution i'll share it here.
Have an excellent day, and keep good work !

Comment by Yann-Gaël GAUTHERON [ 2013-03-21 ]

Fortunately, we are using this kind of subqueries for more than 5 years, and there has been always "deterministic" (not tried on MySQL 5.6).
Tables with InnoDB, more than 100 millions rows, and this behavior seems very logic to us (subquery ordered, group by to get all the firsts).
And now, i see with Google we are not alone to use this "unkind not standard SQL way" to get the results expected...

I'm not saying we are right and you are wrong, i'm just very surprised about what i'm reading on this page, because we are using this (bad standard SQL) successfully for a very long time without any drawbacks.

And now we are testing MariaDB for few months, and this is the first difference of behavior.

According to you words, we had a lot of chance for years : fine, we have been lucky, but it means we need to change a lot of queries to adapt our softwares to the future.

Good for you it's not a bug, but then it means we are a lot in the world with bugs to fix now.

Thank you for reading,
and have a nice day.

=================
Just to be sure of what we are seeing :
tableX ( primary int PK, int keyX, date timeX )
( 1, 1, 2012-01-01 )
( 2, 1, 2013-01-01 )

SELECT * FROM tableX ORDER BY timeX DESC
( 2, 1, 2013-01-01 )
( 1, 1, 2012-01-01 )

SELECT * FROM ( SELECT * FROM tableX ORDER BY timeX DESC ) x GROUP BY keyX
( 2, 1, 2013-01-01 ) on MySQL 5.1 and before
...but...
( 1, 1, 2012-01-01 ) on MariaDB 5.5

Comment by Yann-Gaël GAUTHERON [ 2013-03-21 ]

I think this thread on MySQL illustrates our discussion :

http://dev.mysql.com/doc/refman/5.1/en/example-maximum-column-group-row.html

And the last post confirms what you explained : "that can break in any future update".

Comment by Elena Stepanova [ 2013-03-21 ]

It's not hypothetical, things do change, and in MySQL too, sometimes you don't even need a major upgrade for that. Here is just one fairly recent example for you: http://bugs.mysql.com/bug.php?id=67846
It's not quite the same query as yours, but it hits the same non-determinism of GROUP BY behavior that you currently rely on.

Comment by Yann-Gaël GAUTHERON [ 2013-03-22 ]

Yes and the last post on your link says it's beginning from the 5.1.66 of MySQL.
We are still on 5.1.62 here in production, so we'll have this problem very soon, we must work at this shortly.

Thank you again for all,
and keep good work.

Comment by Yann-Gaël GAUTHERON [ 2013-03-22 ]

We tried a standard workaround on a concrete example :

Method A = SELECT SQL_NO_CACHE * FROM ( SELECT * FROM t ORDER BY d DESC ) x GROUP BY id;
Method B = SELECT SQL_NO_CACHE * FROM t WHERE d IN ( SELECT MAX(d) FROM t GROUP BY id ) GROUP BY id;

Method A on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0068 sec, NOT SQL Standard
Method B on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0675 sec
Method A on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec, NOT SQL Standard, ORDER BY ignored, NOT the result expected
Method B on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec

Times are the average of 5 trials with SQL_NO_CACHE enabled.

1. Is the method B the STANDARD SQL method to do the same ?
2. Does the time found on MariaDB 5.5 illustrate correctly the optimisation you talked about ?
3. If the method B isn't correct STANDARD SQL, can you guide us to a specific MariaDB forum or community we can find any help on this ?

Thank you for your help !

For information, here are the respective explains :
Method A on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0068 sec, NOT SQL Standard :
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1427 Using temporary; Using filesort
2 DERIVED t ALL NULL NULL NULL NULL 1533 Using filesort

Method B on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0675 sec :
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t index NULL id 4 NULL 1323 Using where
2 DEPENDENT SUBQUERY t index NULL id 4 NULL 1

Method A on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec, NOT SQL Standard, ORDER BY ignored, NOT the result expected :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t index NULL id 4 NULL 1234

Method B on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec :
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t ALL d NULL NULL NULL 1411 Using temporary; Using filesort
1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 4 database.t.d 1
2 MATERIALIZED t index NULL id 4 NULL 1411

Comment by Sergei Golubchik [ 2013-03-22 ]

google for "groupwise max" - you will see lots of different solutions to this problem.

you can also ask your question on irc, freenode, #maria channel. Or email your question to maria-discuss@lists.launchpad.net

Generated at Thu Feb 08 06:52:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.