Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20922

Adding an order by changes the query results

Details

    Description

      CREATE TABLE revenue(id int, month int, year int, value int);
      INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
      

      SELECT 
          anon.month_and_year, 
          (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue 
      FROM (
          SELECT 
              id, value, 
              concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year 
          FROM revenue
      ) as anon 
      GROUP BY anon.month_and_year 
      ORDER BY average_revenue;
      

      Produces

      +----------------+-----------------+
      | month_and_year | average_revenue |
      +----------------+-----------------+
      | 1-2000         |        100.0000 |
      | 2-2000         |        200.0000 |
      | 1-2000         |        300.0000 |
      | 2-2000         |        400.0000 |
      +----------------+-----------------+
      

      Removing the order by clause gives:

      +----------------+-----------------+
      | month_and_year | average_revenue |
      +----------------+-----------------+
      | 1-2000         |        200.0000 |
      | 2-2000         |        300.0000 |
      +----------------+-----------------+
      

      Turning off the derived_merge optimization gives:

      set session optimizer_switch="derived_merge=off";
      

      +----------------+-----------------+
      | month_and_year | average_revenue |
      +----------------+-----------------+
      | 1-2000         |        200.0000 |
      | 2-2000         |        300.0000 |
      +----------------+-----------------+
      

      The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

      Also, here are a few more interesting observations:

      • Removing the DISTINCT removes the duplicate rows
      • Removing the ORDER BY removes the duplicate rows

      Interestingly, when the derived_merged optimization is on (the default), the rewritten query seems correct:

      EXPLAIN EXTENDED 
      SELECT 
          anon.month_and_year, 
          (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue 
      FROM (
          SELECT 
              id, value, 
              concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year 
          FROM revenue
      ) as anon 
      GROUP BY anon.month_and_year 
      ORDER BY average_revenue;
      

      produces:

      +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
      | id   | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                           |
      +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
      |    1 | SIMPLE      | revenue | ALL  | NULL          | NULL | NULL    | NULL |    8 |   100.00 | Using temporary; Using filesort |
      +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
      

      SHOW WARNINGS;
      

      +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Level | Code | Message                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
      +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Note  | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
      +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      

      If I run the rewritten query (the one from above), it produces the correct result:

      select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`)
      

       
      +----------------+-----------------+
      | month_and_year | average_revenue |
      +----------------+-----------------+
      | 1-2000         |        200.0000 |
      | 2-2000         |        300.0000 |
      +----------------+-----------------+
      

      But the original query produces an incorrect result. This is particularly puzzling.

      Attachments

        Issue Links

          Activity

            varun Varun Gupta (Inactive) added a comment - Patch http://lists.askmonty.org/pipermail/commits/2019-December/014081.html
            varun Varun Gupta (Inactive) added a comment - New Patch http://lists.askmonty.org/pipermail/commits/2019-December/014111.html

            Introduced val_*_result functions for Item_direct_view_ref to make sure to get the value from the item it is referring to.

            varun Varun Gupta (Inactive) added a comment - Introduced val_*_result functions for Item_direct_view_ref to make sure to get the value from the item it is referring to.

            Test is not stable because order by the same value, please fix it.

            sanja Oleksandr Byelkin added a comment - Test is not stable because order by the same value, please fix it.
            varun Varun Gupta (Inactive) added a comment - Made the test stable in this commit https://github.com/MariaDB/server/commit/1adc559370cc53ca69e225739a942287eba1b974

            People

              varun Varun Gupta (Inactive)
              bugra.gedik Bugra Gedik
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.