Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-3926

Wrong result with GROUP BY ... WITH ROLLUP

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Not a Bug
    • 5.5.28a
    • None
    • None
    • None
    • Windows server 2003, 64-bit

    Description

      The following query returns different result between MYSQL and MARIADB (5.5.28):

      CREATE TABLE t(a int,b int,c int, amount1 int, amount2 int);
      INSERT INTO t (a, b, c, amount1, amount2) values (1,1,1,1,1);
      INSERT INTO t (a, b, c, amount1, amount2) values (2,2,2,2,1);
      INSERT INTO t (a, b, c, amount1, amount2) values (2,3,3,3,1);
       
      SELECT *,
          IF(`t_reorder`.`b` IS NULL, @ccount1:=amount1, @ccount1) AS `a_amount1`,
          IF(`t_reorder`.`b` IS NULL, @dcount1:=amount2, @dcount1) AS `a_amount2`
      FROM (
      	SELECT *
      		FROM (
        		SELECT `a`, `b`,`c`,
      			SUM(`amount1`) AS `amount1`, SUM(`amount2`) AS `amount2`
      		FROM `t`
      		WHERE (c IN (1,2,3)) AND ((`a` IS NOT NULL) AND (`b` IS NOT NULL))
      		GROUP BY `a`, `b` WITH ROLLUP) `t_rollup`
      	ORDER BY `a`, `b`
      )`t_reorder`;
       
      mysql.exe  Ver 14.14 Distrib 5.5.28, for Win32 (x86) 
      (Correct)
      +------+------+------+---------+---------+-----------+-----------+
      | a    | b    | c    | amount1 | amount2 | a_amount1 | a_amount2 |
      +------+------+------+---------+---------+-----------+-----------+
      | NULL | NULL |    3 |       6 |       3 |         6 |         3 |
      |    1 | NULL |    1 |       1 |       1 |         1 |         1 |
      |    1 |    1 |    1 |       1 |       1 |         1 |         1 |
      |    2 | NULL |    3 |       5 |       2 |         5 |         2 |
      |    2 |    2 |    2 |       2 |       1 |         5 |         2 |
      |    2 |    3 |    3 |       3 |       1 |         5 |         2 |
      +------+------+------+---------+---------+-----------+-----------+
       
      mysql.exe  Ver 15.1 Distrib 5.5.28-MariaDB, for Win32 (x86)
      (Wrong values on a_amount1 and a_amount2)
      ORDER BY `a`, `b` does not have any effect to this qyery
      +------+------+------+---------+---------+-----------+-----------+
      | a    | b    | c    | amount1 | amount2 | a_amount1 | a_amount2 |
      +------+------+------+---------+---------+-----------+-----------+
      |    1 |    1 |    1 |       1 |       1 |         6 |         3 |
      |    1 | NULL |    1 |       1 |       1 |         1 |         1 |
      |    2 |    2 |    2 |       2 |       1 |         1 |         1 |
      |    2 |    3 |    3 |       3 |       1 |         1 |         1 |
      |    2 | NULL |    3 |       5 |       2 |         5 |         2 |
      | NULL | NULL |    3 |       6 |       3 |         6 |         3 |
      +------+------+------+---------+---------+-----------+-----------+

      MariaDB seems to ignore ORDER BY statement after subquery having GROUP BY ... WITH ROLLUP.

      Optimizer_switch settings as per default.

      Attachments

        Issue Links

          Activity

            MariaDB isn't wrong here. A "table" (and subquery in the FROM clause too) is - according to the SQL standard - an unordered set of rows. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order. That's why the optimizer can ignore the ORDER BY clause that you have specified. In fact, SQL standard does not even allow the ORDER BY clause to appear in this subquery (we allow it, because ORDER BY ... LIMIT ... changes the result, the set of rows, not only their order).

            You need to treat the subquery in the FROM clause, as a set of rows in some unspecified and undefined order, and put the ORDER BY on the top-level SELECT.

            serg Sergei Golubchik added a comment - MariaDB isn't wrong here. A "table" (and subquery in the FROM clause too) is - according to the SQL standard - an unordered set of rows. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order. That's why the optimizer can ignore the ORDER BY clause that you have specified. In fact, SQL standard does not even allow the ORDER BY clause to appear in this subquery (we allow it, because ORDER BY ... LIMIT ... changes the result, the set of rows, not only their order). You need to treat the subquery in the FROM clause, as a set of rows in some unspecified and undefined order, and put the ORDER BY on the top-level SELECT.

            I'm concerned by this behavior too. And i think it's a real problem because of MySQL is working good with this kind of subqueries with a special order.
            My test case is simpler :

            SELECT *
            FROM (
            SELECT *
            FROM tableX
            ORDER BY timeX DESC
            ) x
            GROUP BY keyX

            => MySQL 5.1.62 does the right order, and returns the first occurence needed : the latest in time
            => MariaDB 5.5.28 ignore the order and always returns the first occurence it finds : the older

            Why MySQL is working, and not MariaDB on this ?

            abalam Yann-Gaël GAUTHERON added a comment - I'm concerned by this behavior too. And i think it's a real problem because of MySQL is working good with this kind of subqueries with a special order. My test case is simpler : SELECT * FROM ( SELECT * FROM tableX ORDER BY timeX DESC ) x GROUP BY keyX => MySQL 5.1.62 does the right order, and returns the first occurence needed : the latest in time => MariaDB 5.5.28 ignore the order and always returns the first occurence it finds : the older Why MySQL is working, and not MariaDB on this ?

            I see that you already have closed this bug, do i need to create a new bug or can you please reopen this one ?

            abalam Yann-Gaël GAUTHERON added a comment - I see that you already have closed this bug, do i need to create a new bug or can you please reopen this one ?

            Please, read my comment above. This MariaDB's behavior is not a bug. In your query you rely on the specific implementation glitch in MySQL 5.1. MariaDB has a better subquery optimizer and it ignores ORDER BY in the subquery in the FROM clause. MySQL 5.6 has got an improved optimizer too, so I suppose, your query won't work in MySQL 5.6 either.

            To have stable result on all MySQL/MariaDB versions (and other databases) you need to rewrite the query to conform to the SQL Standard semantics.

            serg Sergei Golubchik added a comment - Please, read my comment above. This MariaDB's behavior is not a bug. In your query you rely on the specific implementation glitch in MySQL 5.1. MariaDB has a better subquery optimizer and it ignores ORDER BY in the subquery in the FROM clause. MySQL 5.6 has got an improved optimizer too, so I suppose, your query won't work in MySQL 5.6 either. To have stable result on all MySQL/MariaDB versions (and other databases) you need to rewrite the query to conform to the SQL Standard semantics.

            Thank you for your quick precision.
            I believe you about MySQL 5.6, you are the expert.
            But I don't really understand why it is called an optimization, because an optimization does not remove a feature ; but only improves a feature. It looks a depreciation.
            Then maybe there is another way to get the same result with the SQL Standard like you said, if i find a solution i'll share it here.
            Have an excellent day, and keep good work !

            abalam Yann-Gaël GAUTHERON added a comment - Thank you for your quick precision. I believe you about MySQL 5.6, you are the expert. But I don't really understand why it is called an optimization, because an optimization does not remove a feature ; but only improves a feature. It looks a depreciation. Then maybe there is another way to get the same result with the SQL Standard like you said, if i find a solution i'll share it here. Have an excellent day, and keep good work !

            Fortunately, we are using this kind of subqueries for more than 5 years, and there has been always "deterministic" (not tried on MySQL 5.6).
            Tables with InnoDB, more than 100 millions rows, and this behavior seems very logic to us (subquery ordered, group by to get all the firsts).
            And now, i see with Google we are not alone to use this "unkind not standard SQL way" to get the results expected...

            I'm not saying we are right and you are wrong, i'm just very surprised about what i'm reading on this page, because we are using this (bad standard SQL) successfully for a very long time without any drawbacks.

            And now we are testing MariaDB for few months, and this is the first difference of behavior.

            According to you words, we had a lot of chance for years : fine, we have been lucky, but it means we need to change a lot of queries to adapt our softwares to the future.

            Good for you it's not a bug, but then it means we are a lot in the world with bugs to fix now.

            Thank you for reading,
            and have a nice day.

            =================
            Just to be sure of what we are seeing :
            tableX ( primary int PK, int keyX, date timeX )
            ( 1, 1, 2012-01-01 )
            ( 2, 1, 2013-01-01 )

            SELECT * FROM tableX ORDER BY timeX DESC
            ( 2, 1, 2013-01-01 )
            ( 1, 1, 2012-01-01 )

            SELECT * FROM ( SELECT * FROM tableX ORDER BY timeX DESC ) x GROUP BY keyX
            ( 2, 1, 2013-01-01 ) on MySQL 5.1 and before
            ...but...
            ( 1, 1, 2012-01-01 ) on MariaDB 5.5

            abalam Yann-Gaël GAUTHERON added a comment - Fortunately, we are using this kind of subqueries for more than 5 years, and there has been always "deterministic" (not tried on MySQL 5.6). Tables with InnoDB, more than 100 millions rows, and this behavior seems very logic to us (subquery ordered, group by to get all the firsts). And now, i see with Google we are not alone to use this "unkind not standard SQL way" to get the results expected... I'm not saying we are right and you are wrong, i'm just very surprised about what i'm reading on this page, because we are using this (bad standard SQL) successfully for a very long time without any drawbacks. And now we are testing MariaDB for few months, and this is the first difference of behavior. According to you words, we had a lot of chance for years : fine, we have been lucky, but it means we need to change a lot of queries to adapt our softwares to the future. Good for you it's not a bug, but then it means we are a lot in the world with bugs to fix now. Thank you for reading, and have a nice day. ================= Just to be sure of what we are seeing : tableX ( primary int PK, int keyX, date timeX ) ( 1, 1, 2012-01-01 ) ( 2, 1, 2013-01-01 ) SELECT * FROM tableX ORDER BY timeX DESC ( 2, 1, 2013-01-01 ) ( 1, 1, 2012-01-01 ) SELECT * FROM ( SELECT * FROM tableX ORDER BY timeX DESC ) x GROUP BY keyX ( 2, 1, 2013-01-01 ) on MySQL 5.1 and before ...but... ( 1, 1, 2012-01-01 ) on MariaDB 5.5

            I think this thread on MySQL illustrates our discussion :

            http://dev.mysql.com/doc/refman/5.1/en/example-maximum-column-group-row.html

            And the last post confirms what you explained : "that can break in any future update".

            abalam Yann-Gaël GAUTHERON added a comment - I think this thread on MySQL illustrates our discussion : http://dev.mysql.com/doc/refman/5.1/en/example-maximum-column-group-row.html And the last post confirms what you explained : "that can break in any future update".

            It's not hypothetical, things do change, and in MySQL too, sometimes you don't even need a major upgrade for that. Here is just one fairly recent example for you: http://bugs.mysql.com/bug.php?id=67846
            It's not quite the same query as yours, but it hits the same non-determinism of GROUP BY behavior that you currently rely on.

            elenst Elena Stepanova added a comment - It's not hypothetical, things do change, and in MySQL too, sometimes you don't even need a major upgrade for that. Here is just one fairly recent example for you: http://bugs.mysql.com/bug.php?id=67846 It's not quite the same query as yours, but it hits the same non-determinism of GROUP BY behavior that you currently rely on.

            Yes and the last post on your link says it's beginning from the 5.1.66 of MySQL.
            We are still on 5.1.62 here in production, so we'll have this problem very soon, we must work at this shortly.

            Thank you again for all,
            and keep good work.

            abalam Yann-Gaël GAUTHERON added a comment - Yes and the last post on your link says it's beginning from the 5.1.66 of MySQL. We are still on 5.1.62 here in production, so we'll have this problem very soon, we must work at this shortly. Thank you again for all, and keep good work.

            We tried a standard workaround on a concrete example :

            Method A = SELECT SQL_NO_CACHE * FROM ( SELECT * FROM t ORDER BY d DESC ) x GROUP BY id;
            Method B = SELECT SQL_NO_CACHE * FROM t WHERE d IN ( SELECT MAX(d) FROM t GROUP BY id ) GROUP BY id;

            Method A on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0068 sec, NOT SQL Standard
            Method B on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0675 sec
            Method A on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec, NOT SQL Standard, ORDER BY ignored, NOT the result expected
            Method B on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec

            Times are the average of 5 trials with SQL_NO_CACHE enabled.

            1. Is the method B the STANDARD SQL method to do the same ?
            2. Does the time found on MariaDB 5.5 illustrate correctly the optimisation you talked about ?
            3. If the method B isn't correct STANDARD SQL, can you guide us to a specific MariaDB forum or community we can find any help on this ?

            Thank you for your help !

            For information, here are the respective explains :
            Method A on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0068 sec, NOT SQL Standard :
            id select_type table type possible_keys key key_len ref rows Extra
            1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1427 Using temporary; Using filesort
            2 DERIVED t ALL NULL NULL NULL NULL 1533 Using filesort

            Method B on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0675 sec :
            id select_type table type possible_keys key key_len ref rows Extra
            1 PRIMARY t index NULL id 4 NULL 1323 Using where
            2 DEPENDENT SUBQUERY t index NULL id 4 NULL 1

            Method A on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec, NOT SQL Standard, ORDER BY ignored, NOT the result expected :
            id select_type table type possible_keys key key_len ref rows Extra
            1 SIMPLE t index NULL id 4 NULL 1234

            Method B on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec :
            id select_type table type possible_keys key key_len ref rows Extra
            1 PRIMARY t ALL d NULL NULL NULL 1411 Using temporary; Using filesort
            1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 4 database.t.d 1
            2 MATERIALIZED t index NULL id 4 NULL 1411

            abalam Yann-Gaël GAUTHERON added a comment - We tried a standard workaround on a concrete example : Method A = SELECT SQL_NO_CACHE * FROM ( SELECT * FROM t ORDER BY d DESC ) x GROUP BY id; Method B = SELECT SQL_NO_CACHE * FROM t WHERE d IN ( SELECT MAX(d) FROM t GROUP BY id ) GROUP BY id; Method A on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0068 sec, NOT SQL Standard Method B on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0675 sec Method A on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec, NOT SQL Standard, ORDER BY ignored, NOT the result expected Method B on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec Times are the average of 5 trials with SQL_NO_CACHE enabled. 1. Is the method B the STANDARD SQL method to do the same ? 2. Does the time found on MariaDB 5.5 illustrate correctly the optimisation you talked about ? 3. If the method B isn't correct STANDARD SQL, can you guide us to a specific MariaDB forum or community we can find any help on this ? Thank you for your help ! For information, here are the respective explains : Method A on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0068 sec, NOT SQL Standard : id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1427 Using temporary; Using filesort 2 DERIVED t ALL NULL NULL NULL NULL 1533 Using filesort Method B on Production machine, MySQL 5.1.62, Linux Gentoo, 400 rows, 0.0675 sec : id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t index NULL id 4 NULL 1323 Using where 2 DEPENDENT SUBQUERY t index NULL id 4 NULL 1 Method A on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec, NOT SQL Standard, ORDER BY ignored, NOT the result expected : id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t index NULL id 4 NULL 1234 Method B on a beta machine, MariaDB 5.5.28, Linux Gentoo, 400 rows, 0.0043 sec : id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t ALL d NULL NULL NULL 1411 Using temporary; Using filesort 1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 4 database.t.d 1 2 MATERIALIZED t index NULL id 4 NULL 1411

            google for "groupwise max" - you will see lots of different solutions to this problem.

            you can also ask your question on irc, freenode, #maria channel. Or email your question to maria-discuss@lists.launchpad.net

            serg Sergei Golubchik added a comment - google for "groupwise max" - you will see lots of different solutions to this problem. you can also ask your question on irc, freenode, #maria channel. Or email your question to maria-discuss@lists.launchpad.net

            People

              Unassigned Unassigned
              mitrpaka P Ka
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.