[MDEV-10253] select field group by field with rollup Created: 2016-06-19  Updated: 2022-01-25

Status: Open
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.1.14, 5.5, 10.0, 10.1, 10.2
Fix Version/s: 10.2

Type: Bug Priority: Minor
Reporter: Philip orleans Assignee: Oleksandr Byelkin
Resolution: Unresolved Votes: 0
Labels: upstream
Environment:

Linux



 Description   

I tested this issue with MsSQL server, on identical data and table. In MsSQL it works correctly.

Suppose you have a table with field1, field2, etc.
if you do a query like

select field1, sum(field2)
group by field1
with rollup

in MariaDB the result is correct and looks like this, where x and y are the contents of columns field1 and field2

x, sum(y)
x1,sum(y)
null, sum(y)

That is: the null on the lower left indicates that this is the grouping field.

However, suppose that this column is a string, and then we are grouping on a part of the column. Here is a concrete example.

CREATE TABLE demo(field1 varchar(100) PRIMARY KEY,field2 int(11));
INSERT demo(field1,field2) VALUES('192.168.1.20-Servere1',100),('192.168.1.3-Server3',200),('192.168.1.20-Server1',100),('192.168.1.3-Server2',400);
 select * from demo;

    SELECT SUBSTRING(field1,1,instr(field1, '-')-1) AS Name, SUM(field2)
    FROM demo
    GROUP BY SUBSTRING(field1,1,instr(field1, '-')-1)
    WITH rollup

  • RESULT IS INCORRECT, THE LOWER LEFT NULL IS NOT PRESENT

By contrast, in MsSQL, this identical query is correct:

  SELECT SUBSTRING(field1,1, charINDEX('-',field1, 0)-1) AS Mycol, SUM(field2)
    FROM demo
    GROUP BY SUBSTRING(field1,1, charINDEX('-',field1, 0)-1)
    WITH ROLLUP



 Comments   
Comment by Elena Stepanova [ 2016-06-19 ]

The result is reproducible on all versions of MySQL and MariaDB available to me.
I'm not sure however it is a bug, even though the behavior is certainly not intuitive.

Here is the MySQL manual page about ROLLUP. Importantly, it says:

For any column in the result set with a name that is a lexical match to any of those names, its value is set to NULL. (If you specify grouping columns by column number, the server identifies which columns to set to NULL by number.)

Now, if we look at the query in question and its result set,

SELECT SUBSTRING(field1,1,instr(field1, '-')-1) AS Name, SUM(field2)
    FROM demo
    GROUP BY SUBSTRING(field1,1,instr(field1, '-')-1)
    WITH rollup;
 
+--------------+-------------+
| Name         | SUM(field2) |
+--------------+-------------+
| 192.168.1.20 |         200 |
| 192.168.1.3  |         600 |
| 192.168.1.3  |         800 |
+--------------+-------------+
3 rows in set (0.01 sec)

the column in the result set is Name, because it is called so in the SELECT list. However, GROUP BY refers to SUBSTRING(field1,1,instr(field1, '-')-1), which is certainly not a lexical match to Name.

If we modify the query like to use Name in GROUP BY, we get the expected result:

  SELECT SUBSTRING(field1,1,instr(field1, '-')-1) AS Name, SUM(field2)
    FROM demo
    GROUP BY Name
    WITH rollup;
 
+--------------+-------------+
| Name         | SUM(field2) |
+--------------+-------------+
| 192.168.1.20 |         200 |
| 192.168.1.3  |         600 |
| NULL         |         800 |
+--------------+-------------+
3 rows in set (0.01 sec)

So, if a column nickname is used, it needs to be used consistently.

Further, if we remove the column nickname from the SELECT list in the initial query, we still don't get the result we expect:

  SELECT SUBSTRING(field1,1,instr(field1, '-')-1), SUM(field2)
    FROM demo
    GROUP BY SUBSTRING(field1,1,instr(field1, '-')-1)
    WITH rollup;
 
+------------------------------------------+-------------+
| SUBSTRING(field1,1,instr(field1, '-')-1) | SUM(field2) |
+------------------------------------------+-------------+
| 192.168.1.20                             |         200 |
| 192.168.1.3                              |         600 |
| 192.168.1.3                              |         800 |
+------------------------------------------+-------------+
3 rows in set (0.00 sec)

I assume it's because with all those non-alphanumeric symbols in SUBSTRING(field1,1,instr(field1, '-')-1), the match just does not work.*
However, if we quote the name in GROUP BY, it works:

  SELECT SUBSTRING(field1,1,instr(field1, '-')-1), SUM(field2)
    FROM demo
    GROUP BY `SUBSTRING(field1,1,instr(field1, '-')-1)`
    WITH rollup;
 
+------------------------------------------+-------------+
| SUBSTRING(field1,1,instr(field1, '-')-1) | SUM(field2) |
+------------------------------------------+-------------+
| 192.168.1.20                             |         200 |
| 192.168.1.3                              |         600 |
| NULL                                     |         800 |
+------------------------------------------+-------------+
3 rows in set (0.00 sec)


* maybe there is something to fix in there


bar, could you please take a look and see whether, from your point of view, there is anything to fix here (or to convert to a feature request)? If not, please feel free to close. If you are unsure, please reassign to whomever you think it should belong to.
If you decide to fix something, feel free to modify the 'Fix version/s' any way you want.

Comment by Philip orleans [ 2016-06-19 ]

the alternative does not really work
SELECT ifnull(SUBSTRING(field1,1,instr(field1, '-')-1),'Total') AS Name, SUM(field2)
FROM demo
GROUP BY Name
WITH rollup;

If this worked, I would get "Total" on the lower left, but I do not.
In MsSQL it works fine.
The whole idea is to catch the nulls and show a meaningful word, like "Total".

No matter how you do it, in MariaDB there is no way to get to the right result.
If Itry the original
SELECT ifnull(SUBSTRING(field1,1,instr(field1, '-')-1),'Total') AS Name, SUM(field2)
FROM demo
GROUP BY SUBSTRING(field1,1,instr(field1, '-')-1)
WITH rollup;

it does not work either, because there is never a null in the lower left.
I emphasize that we need to offer identical results to identical data and query, compared to paid enterprise databases. Otherwise the business will never be able to dump those monsters.

Comment by Elena Stepanova [ 2016-06-19 ]

the alternative does not really work.
...
If this worked, I would get "Total" on the lower left, but I do not.

Sorry, it's hard to keep up, because your requirements change on the fly. In the initial description you said

RESULT IS INCORRECT, THE LOWER LEFT NULL IS NOT PRESENT

So, with the consistent use of the column alias it does work, the lower left NULL is present. If you want to use this NULL in the query, it's a totally different story. Once again, I recommend reading documentation, it really helps. From the same page:

Because the NULL values in the super-aggregate rows are placed into the result set at such a late stage in query processing, you cannot test them as NULL values within the query itself. For example, you cannot add HAVING product IS NULL to the query to eliminate from the output all but the super-aggregate rows.

Regarding

I emphasize that we need to offer identical results to identical data and query, compared to paid enterprise databases.

This is simply impossible, paid enterprise databases themselves don't offer results identical to each other. I'm pretty sure if you try Oracle, you'll get something different from MSSQL.

Best we can try to do is to conform to the standard. If you find that the described behavior is different from the standard, it might be a valid feature request.

Comment by Philip orleans [ 2016-06-19 ]

You cannot show the word NULL to a final user in a report. The NULL is only useful if it will be replaced by a meaningful word. I wrote a system whith hundreds of reports, and went straight from Select ..group by with rollup to the customer, only because that NULL can be detected in MsSQL. In MsSQL, they have a function called grouping(fieldname) as boolean, that tells you when your are in the intersection of a group, and then you may show what that subtotal means. I think that we need to allow the coder a similar functionality. It is not me, it is something that the industry finds clearly lacking in comparison with MsSQL. We need to be better than MsSQL, not worse. As you know, MsSQL is being written for Linux. If we act with your frame of mind, this database will disappear from the industry without trace. As Donal Trump says, "believe me, I know"

Comment by Philip orleans [ 2021-01-07 ]

I wonder if we are ever going to do this.

Generated at Thu Feb 08 07:40:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.