Details
Description
CREATE TABLE revenue(id int, month int, year int, value int); |
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); |
SELECT |
anon.month_and_year,
|
(SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue |
FROM ( |
SELECT |
id, value,
|
concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year |
FROM revenue |
) as anon |
GROUP BY anon.month_and_year |
ORDER BY average_revenue; |
Produces
+----------------+-----------------+
|
| month_and_year | average_revenue |
|
+----------------+-----------------+
|
| 1-2000 | 100.0000 |
|
| 2-2000 | 200.0000 |
|
| 1-2000 | 300.0000 |
|
| 2-2000 | 400.0000 |
|
+----------------+-----------------+
|
Removing the order by clause gives:
+----------------+-----------------+
|
| month_and_year | average_revenue |
|
+----------------+-----------------+
|
| 1-2000 | 200.0000 |
|
| 2-2000 | 300.0000 |
|
+----------------+-----------------+
|
Turning off the derived_merge optimization gives:
set session optimizer_switch="derived_merge=off"; |
+----------------+-----------------+
|
| month_and_year | average_revenue |
|
+----------------+-----------------+
|
| 1-2000 | 200.0000 |
|
| 2-2000 | 300.0000 |
|
+----------------+-----------------+
|
The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.
Also, here are a few more interesting observations:
- Removing the DISTINCT removes the duplicate rows
- Removing the ORDER BY removes the duplicate rows
Interestingly, when the derived_merged optimization is on (the default), the rewritten query seems correct:
EXPLAIN EXTENDED
|
SELECT |
anon.month_and_year,
|
(SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue |
FROM ( |
SELECT |
id, value,
|
concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year |
FROM revenue |
) as anon |
GROUP BY anon.month_and_year |
ORDER BY average_revenue; |
produces:
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
|
| 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort |
|
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
|
SHOW WARNINGS;
|
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
| Level | Code | Message |
|
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
| Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
|
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
If I run the rewritten query (the one from above), it produces the correct result:
select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
 |
+----------------+-----------------+
|
| month_and_year | average_revenue |
|
+----------------+-----------------+
|
| 1-2000 | 200.0000 |
|
| 2-2000 | 300.0000 |
|
+----------------+-----------------+
|
But the original query produces an incorrect result. This is particularly puzzling.
Attachments
Issue Links
- relates to
-
MDEV-17775 Different results of query with 'derived_merge=on';
- Confirmed
-
MDEV-20010 Equal on two RANK window functions create wrong result
- Closed
-
MDEV-21565 main.group_by failed in buildbot with wrong result
- Closed