[MDEV-22011] DISTINCT with JSON_ARRAYAGG gives wrong results Created: 2020-03-23  Updated: 2020-06-16  Resolved: 2020-06-12

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.5
Fix Version/s: 10.5.4

Type: Bug Priority: Major
Reporter: Varun Gupta (Inactive) Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-11563 GROUP_CONCAT(DISTINCT ...) may produ... Closed
relates to MDEV-16620 Add support for JSON_ARRAYAGG and JSO... Closed
relates to MDEV-22840 JSON_ARRAYAGG gives wrong results wit... Closed

 Description   

As JSON_ARRAYAGG is implemented as a wrapper over Item_func_group_concat, MDEV-11563 is going to affect JSON_ARRAYAGG too.

The problem with DISTINCT in JSON_ARRAYAGG is that it expects the values read from the table for each record. It does not use the key but instead calls val_str() for all the arguments in the GROUP_CONCAT function.



 Comments   
Comment by Varun Gupta (Inactive) [ 2020-03-24 ]

Patch
http://lists.askmonty.org/pipermail/commits/2020-March/014221.html

Comment by Sergei Petrunia [ 2020-06-07 ]

Just an observation: DISTINCT modifier inside JSON_ARRAYAGG is not part of the standard. And MySQL 8 doesn't support this, either.

Comment by Sergei Petrunia [ 2020-06-07 ]

... and the patch breaks something in ORDER BY handling for JSON_ARRAYAGG.

Testcase:

cat main/_a1.test
create table t6(a varchar(255));
insert into t6 values ('red'),('blue');
select json_arrayagg(a) from t6;
select json_arrayagg(a order by a) from t6;
drop table t6;

Output:

create table t6(a varchar(255));
insert into t6 values ('red'),('blue');
select json_arrayagg(a) from t6;
json_arrayagg(a)
["red","blue"]
select json_arrayagg(a order by a) from t6;
json_arrayagg(a order by a)
["",null]
drop table t6;
main._a1                                 [ pass ]      5

and in the current 10.5, I get the correct output:

json_arrayagg(a order by a)
["blue","red"]

Comment by Sergei Petrunia [ 2020-06-07 ]

The above is a showstopper for this bug.

I personally would not mind (actually I would prefer this) if the issue is resolved by
disallowing DISTINCT with JSON_ARRAYAGG. Please check this with holyfoot.

Comment by Alexey Botchkov [ 2020-06-08 ]

I belive the DISTINCT can be helpful to some users in JSON_ARRAYAGG.
Besides the Oracle server supports it.
So I don't like removing the DISTINCT unless it's really problemmatic to fix.

Comment by Varun Gupta (Inactive) [ 2020-06-08 ]

I checked against 95da2113a050ad739fdaf60ee871329468a01554, we get incorrect results with ORDER BY in JSON_ARRAYAGG function.

MariaDB [test]> create table t6(a varchar(255));
Query OK, 0 rows affected (0.003 sec)
 
MariaDB [test]> insert into t6 values ('red'),('bl');
Query OK, 2 rows affected (0.012 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select json_arrayagg(a order by a) from t6;
+-----------------------------+
| json_arrayagg(a order by a) |
+-----------------------------+
| ["bl","bl"]                 |
+-----------------------------+
1 row in set (0.002 sec)

This was fixed by MDEV-21914

Comment by Varun Gupta (Inactive) [ 2020-06-08 ]

So here is a case which does not work with ORDER BY

MariaDB [test]> select json_arrayagg(a order by a) from t1;
+-----------------------------+
| json_arrayagg(a order by a) |
+-----------------------------+
| ["","blue","red"]           |
+-----------------------------+
1 row in set (0.002 sec)
 
MariaDB [test]> select * from t1;
+------+
| a    |
+------+
| red  |
| blue |
| NULL |
+------+
3 rows in set (0.001 sec)

Well handling of NULL columns cannot work with the current implementation of JSON_ARRAYAGG

Comment by Varun Gupta (Inactive) [ 2020-06-09 ]

So lets report a proper case here, the problem that exists in current 10.5 with distinct

create table t1 (
a VARCHAR(1000), b int);
insert into t1 values
( 0 , 1 ),( 1 , 2 ),( 2 , 3 ),( 3 , 4 ),( 4 , 5 ), ( 5 , 6 ), ( 0 , 11 ), ( 5 , 16 );

Now lets force temp files be created (making sure data does not fit in memory)

MariaDB [test]> set tmp_table_size=1024;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]> select JSON_ARRAYAGG(distinct a) from t1;
+---------------------------+
| JSON_ARRAYAGG(distinct a) |
+---------------------------+
| ["0"]                     |
+---------------------------+
1 row in set (0.002 sec)

This is the same problem that is reported in MDEV-11563 for GROUP_CONCAT.

Comment by Varun Gupta (Inactive) [ 2020-06-09 ]

Now let us port the patch for MDEV-11563 on 10.5 (it is in branch 10.5-varun). The commit is https://github.com/MariaDB/server/commit/f968feed977bf6d684347308a36b7083175d6f55

and now running the above test

MariaDB [test]> set tmp_table_size=1024;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [test]> select JSON_ARRAYAGG(distinct a) from t1;
+---------------------------+
| JSON_ARRAYAGG(distinct a) |
+---------------------------+
| ["0","1","2","3","4","5"] |
+---------------------------+
1 row in set (0.012 sec)

I get the correct output.

Comment by Varun Gupta (Inactive) [ 2020-06-09 ]

But even with the above port, the NULLS are still not handled with DISTINCT

CREATE TABLE t1(a INT, b INT);
INSERT INTO t1 VALUES (1,1), (2,2), (3,3);
INSERT INTO t1 VALUES (1,1), (2,2), (3,3);
INSERT INTO t1 VALUES (NULL,NULL), (NULL,NULL);

MariaDB [test]> SELECT JSON_ARRAYAGG(a) FROM t1;
+-------------------------+
| JSON_ARRAYAGG(a)        |
+-------------------------+
| [1,2,3,1,2,3,null,null] |
+-------------------------+

This is correct, now let try DISTINCT

MariaDB [test]> SELECT JSON_ARRAYAGG(DISTINCT a) FROM t1;
+---------------------------+
| JSON_ARRAYAGG(DISTINCT a) |
+---------------------------+
| [null,null,null,null]     |
+---------------------------+
1 row in set (0.001 sec)

So we get incorrect results with NULL value, even after the patch.
The code according to me cannot handle NULL values right now for JSON_ARRAYAGG function

Comment by Varun Gupta (Inactive) [ 2020-06-09 ]

Patches are pushed to the branch 10.5-varun2.
It contains the patch for both MDEV-22011 and MDEV-22840.

Comment by Varun Gupta (Inactive) [ 2020-06-09 ]

Patch
https://github.com/MariaDB/server/commit/c127f5021ccc7cebc09aebfdbe841e630993f12d

Comment by Sergei Petrunia [ 2020-06-11 ]

Review input:
https://lists.launchpad.net/maria-developers/msg12275.html

Ok to push after addressed.

Generated at Thu Feb 08 09:11:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.