[MDEV-20922] Adding an order by changes the query results - Jira

Bugra Gedik created issue - 2019-10-29 22:30

Bugra Gedik made changes - 2019-10-29 22:30

Field	Original Value	New Value
Link		This issue relates to MDEV-17775 [ MDEV-17775 ]

Bugra Gedik made changes - 2019-10-29 22:31

Description

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off {derived_merge} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the `derived_merge` optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Bugra Gedik made changes - 2019-10-29 22:31

Description

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the `derived_merge` optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Bugra Gedik made changes - 2019-10-29 23:08

Summary

Adding an order by changes query results

Adding an order by changes the query results

Bugra Gedik made changes - 2019-10-29 23:09

Description

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Some additional details:

{code:sql}

{code}

Bugra Gedik made changes - 2019-10-29 23:12

Description

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Some additional details:

{code:sql}

{code}

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Interestingly, when the {{derived_merge}} optimization is on (the default), the rewritten query seems correct:

{code:sql}
EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, CONCAT(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue;
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
{code}

{code:sql}
SHOW WARNINGS;
{code}

{code}
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{code}

If I run the rewritten query, it produces the correct result:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 3-2000 | 200.0000 |
| 1-2000 | 200.0000 |
| 2-2000 | 266.6667 |
+----------------+-----------------+
{code}

But the original query produces a different result. This is particularly puzzling.

Bugra Gedik made changes - 2019-10-29 23:13

Description

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Interestingly, when the {{derived_merge}} optimization is on (the default), the rewritten query seems correct:

{code:sql}
EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, CONCAT(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue;
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
{code}

{code:sql}
SHOW WARNINGS;
{code}

{code}
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{code}

If I run the rewritten query, it produces the correct result:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 3-2000 | 200.0000 |
| 1-2000 | 200.0000 |
| 2-2000 | 266.6667 |
+----------------+-----------------+
{code}

But the original query produces a different result. This is particularly puzzling.

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Interestingly, when the {{derived_merge}} optimization is on (the default), the rewritten query seems correct:

{code:sql}
EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, CONCAT(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue;
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
{code}

{code:sql}
SHOW WARNINGS;
{code}

{code}
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{code}

If I run the rewritten query (the one from above), it produces the correct result:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 3-2000 | 200.0000 |
| 1-2000 | 200.0000 |
| 2-2000 | 266.6667 |
+----------------+-----------------+
{code}

But the original query produces an incorrect result. This is particularly puzzling.

Bugra Gedik made changes - 2019-10-29 23:14

Description

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Interestingly, when the {{derived_merge}} optimization is on (the default), the rewritten query seems correct:

{code:sql}
EXPLAIN EXTENDED SELECT anon.month_and_year, (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue FROM ( SELECT id, value, CONCAT(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year FROM revenue ) as anon GROUP BY anon.month_and_year ORDER BY average_revenue;
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
{code}

{code:sql}
SHOW WARNINGS;
{code}

{code}
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{code}

If I run the rewritten query (the one from above), it produces the correct result:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 3-2000 | 200.0000 |
| 1-2000 | 200.0000 |
| 2-2000 | 266.6667 |
+----------------+-----------------+
{code}

But the original query produces an incorrect result. This is particularly puzzling.

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Interestingly, when the {{derived_merge}} optimization is on (the default), the rewritten query seems correct:

{code:sql}
EXPLAIN EXTENDED
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}
produces:
{code}
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
{code}

{code:sql}
SHOW WARNINGS;
{code}

{code}
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{code}

If I run the rewritten query (the one from above), it produces the correct result:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 3-2000 | 200.0000 |
| 1-2000 | 200.0000 |
| 2-2000 | 266.6667 |
+----------------+-----------------+
{code}

But the original query produces an incorrect result. This is particularly puzzling.

Bugra Gedik made changes - 2019-10-29 23:15

Description

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Interestingly, when the {{derived_merge}} optimization is on (the default), the rewritten query seems correct:

{code:sql}
EXPLAIN EXTENDED
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}
produces:
{code}
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
{code}

{code:sql}
SHOW WARNINGS;
{code}

{code}
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{code}

If I run the rewritten query (the one from above), it produces the correct result:
{code:sql}
select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`)
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 3-2000 | 200.0000 |
| 1-2000 | 200.0000 |
| 2-2000 | 266.6667 |
+----------------+-----------------+
{code}

But the original query produces an incorrect result. This is particularly puzzling.

Varun Gupta (Inactive) made changes - 2019-10-30 12:47

Assignee

Varun Gupta [ varun ]

Varun Gupta (Inactive) made changes - 2019-10-30 12:48

Fix Version/s		10.3 [ 22126 ]
Fix Version/s		10.4 [ 22408 ]

Varun Gupta (Inactive) made changes - 2019-10-30 12:48

Status

Open [ 1 ]

Confirmed [ 10101 ]

Varun Gupta (Inactive) made changes - 2019-10-30 12:57

Fix Version/s		10.1 [ 16100 ]
Fix Version/s		10.2 [ 14601 ]

Bugra Gedik made changes - 2019-10-30 14:16

Description

{code:sql}
CREATE TABLE revenue(id int, month int, year int, value int);
INSERT INTO revenue values (1, 1, 2000, 100), (2, 2, 2000, 200), (3, 1, 2000, 300), (4, 2, 2000, 400);
{code}

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}

Produces

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 100.0000 |
| 2-2000 | 200.0000 |
| 1-2000 | 300.0000 |
| 2-2000 | 400.0000 |
+----------------+-----------------+
{code}

Removing the order by clause gives:
{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

Turning off the {{derived_merge}} optimization gives:
{code:sql}
set session optimizer_switch="derived_merge=off";
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

The issue seems to be somewhat similar to MDEV-17775, but in my case there is no join.

Also, here are a few more interesting observations:
* Removing the DISTINCT removes the duplicate rows
* Removing the ORDER BY removes the duplicate rows

Interestingly, when the {{derived_merge}} optimization is on (the default), the rewritten query seems correct:

{code:sql}
EXPLAIN EXTENDED
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;
{code}
produces:
{code}
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
| 1 | SIMPLE | revenue | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+----------+---------------------------------+
{code}

{code:sql}
SHOW WARNINGS;
{code}

{code}
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{code}

If I run the rewritten query (the one from above), it produces the correct result:
{code:sql}
select concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) AS `month_and_year`,sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`) AS `average_revenue` from `test`.`revenue` group by concat(cast(`test`.`revenue`.`month` as char(2) charset utf8mb4),'-',cast(`test`.`revenue`.`year` as char(4) charset utf8mb4)) order by sum(`test`.`revenue`.`value`) / count(distinct `test`.`revenue`.`id`)
{code}

{code}
+----------------+-----------------+
| month_and_year | average_revenue |
+----------------+-----------------+
| 1-2000 | 200.0000 |
| 2-2000 | 300.0000 |
+----------------+-----------------+
{code}

But the original query produces an incorrect result. This is particularly puzzling.
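
Based on the observation above that disabling {{derived_merge}} gives the expected two rows, a session-scoped workaround can be sketched as follows. This is only a sketch under that assumption: the user variable name {{@saved_optimizer_switch}} is made up for illustration, and the approach sidesteps the problem rather than fixing it.

{code:sql}
-- Remember the current optimizer settings (hypothetical variable name).
SET @saved_optimizer_switch = @@optimizer_switch;

-- Disable derived table merging for this session only.
SET SESSION optimizer_switch = 'derived_merge=off';

SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
ORDER BY average_revenue;

-- Put the previous optimizer settings back.
SET SESSION optimizer_switch = @saved_optimizer_switch;
{code}

Saving and restoring the whole switch string, rather than hard-coding {{derived_merge=on}} afterwards, keeps any other non-default flags of the session intact.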

Varun Gupta (Inactive) made changes - 2019-11-05 12:11

Status

Confirmed [ 10101 ]

In Progress [ 3 ]

Varun Gupta (Inactive) made changes - 2019-11-08 13:43

Priority

Critical [ 2 ]

Major [ 3 ]

Varun Gupta (Inactive) added a comment - 2019-12-03 13:49

Patch
http://lists.askmonty.org/pipermail/commits/2019-December/014081.html

Varun Gupta (Inactive) made changes - 2019-12-03 13:54

Assignee	Varun Gupta [ varun ]	Igor Babaev [ igor ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Varun Gupta (Inactive) added a comment - 2019-12-18 19:00

New Patch
http://lists.askmonty.org/pipermail/commits/2019-December/014111.html

Varun Gupta (Inactive) made changes - 2019-12-19 11:43

Link

This issue relates to MDEV-20010 [ MDEV-20010 ]

Igor Babaev (Inactive) made changes - 2019-12-19 23:25

Assignee

Igor Babaev [ igor ]

Varun Gupta [ varun ]

Varun Gupta (Inactive) added a comment - 2019-12-31 05:31

Introduced val_*_result functions for Item_direct_view_ref to make sure the value is taken from the item it refers to.

Varun Gupta (Inactive) made changes - 2020-01-02 04:21

Fix Version/s		10.1.44 [ 23912 ]
Fix Version/s		10.2.31 [ 24017 ]
Fix Version/s		10.3.22 [ 24018 ]
Fix Version/s		10.4.12 [ 24019 ]
Fix Version/s		10.5.1 [ 24029 ]
Fix Version/s	10.2 [ 14601 ]
Fix Version/s	10.1 [ 16100 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]
Resolution		Fixed [ 1 ]
Status	In Review [ 10002 ]	Closed [ 6 ]

Oleksandr Byelkin added a comment - 2020-01-03 17:05

The test is not stable because it orders by the same value; please fix it.

Oleksandr Byelkin made changes - 2020-01-03 17:05

Resolution	Fixed [ 1 ]
Status	Closed [ 6 ]	Stalled [ 10000 ]

Varun Gupta (Inactive) added a comment - 2020-01-07 05:59

Made the test stable in this commit https://github.com/MariaDB/server/commit/1adc559370cc53ca69e225739a942287eba1b974
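
For context, an ORDER BY on a value that several rows share leaves the relative order of those rows unspecified, which is what makes a result-comparing test unstable. A minimal sketch of one common remedy, using the reproduction query from the description, is to add the grouping column as a tie-breaker. This is an illustration only and not necessarily what the linked commit does.

{code:sql}
SELECT
    anon.month_and_year,
    (SUM(anon.value) / COUNT(DISTINCT anon.id)) AS average_revenue
FROM (
    SELECT
        id, value,
        concat(CAST(month AS CHAR(2)), '-', CAST(year AS CHAR(4))) AS month_and_year
    FROM revenue
) as anon
GROUP BY anon.month_and_year
-- The second sort key is unique per group, so rows with equal averages
-- always come back in the same order.
ORDER BY average_revenue, anon.month_and_year;
{code}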

Varun Gupta (Inactive) made changes - 2020-01-07 05:59

Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Varun Gupta (Inactive) made changes - 2020-01-26 04:15

Link

This issue relates to MDEV-21565 [ MDEV-21565 ]

Sergei Golubchik made changes - 2021-12-06 21:50

Workflow

MariaDB v3 [ 100689 ]

MariaDB v4 [ 156914 ]
