Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20922

Adding an order by changes the query results

    XMLWordPrintable

    Details

      Description

      CREATE TABLE revenue(id int, month int, year int, value int);
      INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
      

      SELECT 
          anon.month_and_year, 
          (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue 
      FROM (
          SELECT 
              id, value, 
              concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year 
          FROM revenue
      ) as anon 
      GROUP BY anon.month_and_year 
      ORDER BY average_revenue;
      

      Produces

      +----------------+-----------------+
      | month_and_year | average_revenue |
      +----------------+-----------------+
      | 1-2000         |        100.0000 |
      | 2-2000         |        200.0000 |
      | 1-2000         |        300.0000 |
      | 2-2000         |        400.0000 |
      +----------------+-----------------+
      

      Removing the order by clause gives:

      +----------------+-----------------+
      | month_and_year | average_revenue |
      +----------------+-----------------+
      | 1-2000         |        200.0000 |
      | 2-2000         |        300.0000 |
      +----------------+-----------------+
      

      Turning off the derived_merge optimization gives:

      set session optimizer_switch="derived_merge=off";
      

      +----------------+-----------------+
      | month_and_year | average_revenue |
      +----------------+-----------------+
      | 1-2000         |        200.0000 |
      | 2-2000         |        300.0000 |
      +----------------+-----------------+
      

      The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

      Also, here are a few more interesting observations:

      • Removing the DISTINCT removes the duplicate rows
      • Removing the ORDER BY removes the duplicate rows

      Interestingly, when the derived_merged optimization is on (the default), the rewritten query seems correct:

      EXPLAIN EXTENDED 
      SELECT 
          anon.month_and_year, 
          (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue 
      FROM (
          SELECT 
              id, value, 
              concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year 
          FROM revenue
      ) as anon 
      GROUP BY anon.month_and_year 
      ORDER BY average_revenue;
      

      produces:

      +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
      | id   | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                           |
      +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
      |    1 | SIMPLE      | revenue | ALL  | NULL          | NULL | NULL    | NULL |    8 |   100.00 | Using temporary; Using filesort |
      +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
      

      SHOW WARNINGS;
      

      +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Level | Code | Message                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
      +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Note  | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
      +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      

      If I run the rewritten query (the one from above), it produces the correct result:

      select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`)
      

       
      +----------------+-----------------+
      | month_and_year | average_revenue |
      +----------------+-----------------+
      | 1-2000         |        200.0000 |
      | 2-2000         |        300.0000 |
      +----------------+-----------------+
      

      But the original query produces an incorrect result. This is particularly puzzling.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              varun Varun Gupta
              Reporter:
              bugra.gedik Bugra Gedik
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: