Details
Description
CREATE TABLE revenue(id int, month int, year int, value int); |
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); |
SELECT |
anon.month_and_year,
|
(SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue |
FROM ( |
SELECT |
id, value,
|
concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year |
FROM revenue |
) as anon |
GROUP BY anon.month_and_year |
ORDER BY average_revenue; |
Produces
+----------------+-----------------+
|
| month_and_year | average_revenue |
|
+----------------+-----------------+
|
| 1-2000 | 100.0000 |
|
| 2-2000 | 200.0000 |
|
| 1-2000 | 300.0000 |
|
| 2-2000 | 400.0000 |
|
+----------------+-----------------+
|
Removing the order by clause gives:
+----------------+-----------------+
|
| month_and_year | average_revenue |
|
+----------------+-----------------+
|
| 1-2000 | 200.0000 |
|
| 2-2000 | 300.0000 |
|
+----------------+-----------------+
|
Turning off the derived_merge optimization gives:
set session optimizer_switch="derived_merge=off"; |
+----------------+-----------------+
|
| month_and_year | average_revenue |
|
+----------------+-----------------+
|
| 1-2000 | 200.0000 |
|
| 2-2000 | 300.0000 |
|
+----------------+-----------------+
|
The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.
Also, here are a few more interesting observations:
- Removing the DISTINCT removes the duplicate rows
- Removing the ORDER BY removes the duplicate rows
Interestingly, when the derived_merged optimization is on (the default), the rewritten query seems correct:
EXPLAIN EXTENDED
|
SELECT |
anon.month_and_year,
|
(SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue |
FROM ( |
SELECT |
id, value,
|
concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year |
FROM revenue |
) as anon |
GROUP BY anon.month_and_year |
ORDER BY average_revenue; |
produces:
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
|
| 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort |
|
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
|
SHOW WARNINGS;
|
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
| Level | Code | Message |
|
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
| Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
|
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
If I run the rewritten query (the one from above), it produces the correct result:
select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
|
+----------------+-----------------+
|
| month_and_year | average_revenue |
|
+----------------+-----------------+
|
| 1-2000 | 200.0000 |
|
| 2-2000 | 300.0000 |
|
+----------------+-----------------+
|
But the original query produces an incorrect result. This is particularly puzzling.
Attachments
Issue Links
- relates to
-
MDEV-17775 Different results of query with 'derived_merge=on';
-
- Confirmed
-
-
MDEV-20010 Equal on two RANK window functions create wrong result
-
- Closed
-
-
MDEV-21565 main.group_by failed in buildbot with wrong result
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue relates to MDEV-17775 [ MDEV-17775 ] |
Description |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off {derived_merge} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the `derived_merge` optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows |
Description |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the `derived_merge` optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows |
Summary | Adding an order by changes query results | Adding an order by changes the query results |
Description |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Some additional details: {code:sql} {code} |
Description |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Some additional details: {code:sql} {code} |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Interestingly, when the {{derived_merged}} optimization is on (the default), the rewritten query seems correct: {code:sql} EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, CONCAT(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ {code} {code:sql} SHOW WARNINGS; {code} {code} +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {code} If I run the rewritten query, it produce the correct result: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 3-2000 | 200.0000 | | 1-2000 | 200.0000 | | 2-2000 | 266.6667 | +----------------+-----------------+ {code} But the original query produces a different result. This is particularly puzzling. |
Description |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Interestingly, when the {{derived_merged}} optimization is on (the default), the rewritten query seems correct: {code:sql} EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, CONCAT(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ {code} {code:sql} SHOW WARNINGS; {code} {code} +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {code} If I run the rewritten query, it produce the correct result: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 3-2000 | 200.0000 | | 1-2000 | 200.0000 | | 2-2000 | 266.6667 | +----------------+-----------------+ {code} But the original query produces a different result. This is particularly puzzling. |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Interestingly, when the {{derived_merged}} optimization is on (the default), the rewritten query seems correct: {code:sql} EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, CONCAT(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ {code} {code:sql} SHOW WARNINGS; {code} {code} +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {code} If I run the rewritten query (the one from above), it produces the correct result: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 3-2000 | 200.0000 | | 1-2000 | 200.0000 | | 2-2000 | 266.6667 | +----------------+-----------------+ {code} But the original query produces an incorrect result. This is particularly puzzling. |
Description |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Interestingly, when the {{derived_merged}} optimization is on (the default), the rewritten query seems correct: {code:sql} EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, CONCAT(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ {code} {code:sql} SHOW WARNINGS; {code} {code} +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {code} If I run the rewritten query (the one from above), it produces the correct result: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 3-2000 | 200.0000 | | 1-2000 | 200.0000 | | 2-2000 | 266.6667 | +----------------+-----------------+ {code} But the original query produces an incorrect result. This is particularly puzzling. |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Interestingly, when the {{derived_merged}} optimization is on (the default), the rewritten query seems correct: {code:sql} EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} produces: {code} +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ {code} {code:sql} SHOW WARNINGS; {code} {code} +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {code} If I run the rewritten query (the one from above), it produces the correct result: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 3-2000 | 200.0000 | | 1-2000 | 200.0000 | | 2-2000 | 266.6667 | +----------------+-----------------+ {code} But the original query produces an incorrect result. This is particularly puzzling. |
Description |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Interestingly, when the {{derived_merged}} optimization is on (the default), the rewritten query seems correct: {code:sql} EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} produces: {code} +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ {code} {code:sql} SHOW WARNINGS; {code} {code} +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {code} If I run the rewritten query (the one from above), it produces the correct result: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 3-2000 | 200.0000 | | 1-2000 | 200.0000 | | 2-2000 | 266.6667 | +----------------+-----------------+ {code} But the original query produces an incorrect result. This is particularly puzzling. |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Interestingly, when the {{derived_merged}} optimization is on (the default), the rewritten query seems correct: {code:sql} EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} produces: {code} +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ {code} {code:sql} SHOW WARNINGS; {code} {code} +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {code} If I run the rewritten query (the one from above), it produces the correct result: {code:sql} select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 3-2000 | 200.0000 | | 1-2000 | 200.0000 | | 2-2000 | 266.6667 | +----------------+-----------------+ {code} But the original query produces an incorrect result. This is particularly puzzling. |
Assignee | Varun Gupta [ varun ] |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Fix Version/s | 10.1 [ 16100 ] | |
Fix Version/s | 10.2 [ 14601 ] |
Description |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Interestingly, when the {{derived_merged}} optimization is on (the default), the rewritten query seems correct: {code:sql} EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} produces: {code} +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ {code} {code:sql} SHOW WARNINGS; {code} {code} +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {code} If I run the rewritten query (the one from above), it produces the correct result: {code:sql} select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 3-2000 | 200.0000 | | 1-2000 | 200.0000 | | 2-2000 | 266.6667 | +----------------+-----------------+ {code} But the original query produces an incorrect result. This is particularly puzzling. |
{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int); INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400); {code} {code:sql} SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} Produces {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 100.0000 | | 2-2000 | 200.0000 | | 1-2000 | 300.0000 | | 2-2000 | 400.0000 | +----------------+-----------------+ {code} Removing the order by clause gives: {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} Turning off the {{derived_merge}} optimization gives: {code:sql} set session optimizer_switch="derived_merge=off"; {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join. Also, here are a few more interesting observations: * Removing the DISTINCT removes the duplicate rows * Removing the ORDER BY removes the duplicate rows Interestingly, when the {{derived_merged}} optimization is on (the default), the rewritten query seems correct: {code:sql} EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue; {code} produces: {code} +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort | +------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+ {code} {code:sql} SHOW WARNINGS; {code} {code} +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {code} If I run the rewritten query (the one from above), it produces the correct result: {code:sql} select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) {code} {code} +----------------+-----------------+ | month_and_year | average_revenue | +----------------+-----------------+ | 1-2000 | 200.0000 | | 2-2000 | 300.0000 | +----------------+-----------------+ {code} But the original query produces an incorrect result. This is particularly puzzling. |
Status | Confirmed [ 10101 ] | In Progress [ 3 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Assignee | Varun Gupta [ varun ] | Igor Babaev [ igor ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Link |
This issue relates to |
Assignee | Igor Babaev [ igor ] | Varun Gupta [ varun ] |
Fix Version/s | 10.1.44 [ 23912 ] | |
Fix Version/s | 10.2.31 [ 24017 ] | |
Fix Version/s | 10.3.22 [ 24018 ] | |
Fix Version/s | 10.4.12 [ 24019 ] | |
Fix Version/s | 10.5.1 [ 24029 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.1 [ 16100 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Review [ 10002 ] | Closed [ 6 ] |
Resolution | Fixed [ 1 ] | |
Status | Closed [ 6 ] | Stalled [ 10000 ] |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Link |
This issue relates to |
Workflow | MariaDB v3 [ 100689 ] | MariaDB v4 [ 156914 ] |
Patch
http://lists.askmonty.org/pipermail/commits/2019-December/014081.html