[MDEV-20922] Adding an order by changes the query results Created: 2019-10-29  Updated: 2020-01-26  Resolved: 2020-01-07

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.3.17
Fix Version/s: 10.1.44, 10.2.31, 10.3.22, 10.4.12, 10.5.1

Type: Bug Priority: Major
Reporter: Bugra Gedik Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Ubuntu 2018


Issue Links:
Relates
relates to MDEV-17775 Different results of query with 'deri... Confirmed
relates to MDEV-20010 Equal on two RANK window functions cr... Closed
relates to MDEV-21565 main.group_by failed in buildbot with... Closed

 Description   

CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);

SELECT 
    anon.month_and_year, 
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue 
FROM (
    SELECT 
        id, value, 
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year 
    FROM revenue
) as anon 
GROUP BY anon.month_and_year 
ORDER BY average_revenue;

Produces

+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000         |        100.0000 |
| 2-2000         |        200.0000 |
| 1-2000         |        300.0000 |
| 2-2000         |        400.0000 |
+----------------+-----------------+

Removing the order by clause gives:

+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000         |        200.0000 |
| 2-2000         |        300.0000 |
+----------------+-----------------+

Turning off the derived_merge optimization gives:

set session optimizer_switch="derived_merge=off";

+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000         |        200.0000 |
| 2-2000         |        300.0000 |
+----------------+-----------------+

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:

  • Removing the DISTINCT removes the duplicate rows
  • Removing the ORDER BY removes the duplicate rows

Interestingly, when the derived_merged optimization is on (the default), the rewritten query seems correct:

EXPLAIN EXTENDED 
SELECT 
    anon.month_and_year, 
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue 
FROM (
    SELECT 
        id, value, 
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year 
    FROM revenue
) as anon 
GROUP BY anon.month_and_year 
ORDER BY average_revenue;

produces:

+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| id   | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                           |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
|    1 | SIMPLE      | revenue | ALL  | NULL          | NULL | NULL    | NULL |    8 |   100.00 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+

SHOW WARNINGS;

+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

If I run the rewritten query (the one from above), it produces the correct result:

select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`)

 
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000         |        200.0000 |
| 2-2000         |        300.0000 |
+----------------+-----------------+

But the original query produces an incorrect result. This is particularly puzzling.



 Comments   
Comment by Varun Gupta (Inactive) [ 2019-12-03 ]

Patch
http://lists.askmonty.org/pipermail/commits/2019-December/014081.html

Comment by Varun Gupta (Inactive) [ 2019-12-18 ]

New Patch
http://lists.askmonty.org/pipermail/commits/2019-December/014111.html

Comment by Varun Gupta (Inactive) [ 2019-12-31 ]

Introduced val_*_result functions for Item_direct_view_ref to make sure to get the value from the item it is referring to.

Comment by Oleksandr Byelkin [ 2020-01-03 ]

Test is not stable because order by the same value, please fix it.

Comment by Varun Gupta (Inactive) [ 2020-01-07 ]

Made the test stable in this commit https://github.com/MariaDB/server/commit/1adc559370cc53ca69e225739a942287eba1b974

Generated at Thu Feb 08 09:03:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.