[MDEV-12789] JSON_KEYS returns duplicate keys twice Created: 2017-05-12  Updated: 2017-08-10  Resolved: 2017-08-08

Status: Closed
Project: MariaDB Server
Component/s: JSON
Affects Version/s: 10.2.5
Fix Version/s: 10.2.8

Type: Bug Priority: Minor
Reporter: Anders Karlsson Assignee: Alexey Botchkov
Resolution: Fixed Votes: 0
Labels: None
Environment:

Linux CentOS 6.5



 Description   

When there are multiple attributes in a JSON object with the same name, which is allowed but discouraged in JSON, JSON_KEYS returns one entry for each duplicate:

MySQL [test]> select JSON_KEYS('{"c1": "value 1", "c1": "value 2"}');
+-------------------------------------------------+
| JSON_KEYS('{"c1": "value 1", "c1": "value 2"}') |
+-------------------------------------------------+
| ["c1", "c1"]                                    |
+-------------------------------------------------+

MySQL 5.7 returns just the single key:

MySQL [test]> select JSON_KEYS('{"c1": "value 1", "c1": "value 2"}');
+-------------------------------------------------+
| JSON_KEYS('{"c1": "value 1", "c1": "value 2"}') |
+-------------------------------------------------+
| ["c1"]                                          |
+-------------------------------------------------+

Multiple values for a single key are discouraged, but I think MySQL has the correct behavior as the JSON_KEYS function should return an array of the keys, and although "c1" in the example above exists twice in the object, it is the same key.



 Comments   
Comment by Elena Stepanova [ 2017-05-13 ]

But should JSON_KEYS return keys or unique keys, according to the standard? If it's just keys, and while duplicate keys are only discouraged but not prohibited, MariaDB behavior seems justified – there are indeed two keys "c1", so it returns two.

Comment by Anders Karlsson [ 2017-05-15 ]

That is a valid point, but I'm not sure there is a good standard. MySQL works differently returning just 1 key. Now, my reasoning for voting for the MySQL behavior is that even though there are two keys, there is actually just 1 value, i.e. you can not access the second or following values with the same key. So if there is just 1 value there is in essence only one key/value pair. I tested how Jansson and PHP treats this, and PHP ignores the duplicates although stick with the last, instead of the first, value.

Comment by Elena Stepanova [ 2017-05-15 ]

I have no opinion on the subject, so I'm assigning it to holyfoot to consult with serg and decide what's best.

Just please note that if you decide to change it, some other functions need to be changed accordingly, so they provide consistent results. For example,

MariaDB

MariaDB [test]> select json_length('{"a":1,"a":2,"b":3}');
+------------------------------------+
| json_length('{"a":1,"a":2,"b":3}') |
+------------------------------------+
|                                  3 |
+------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> select @@version;
+----------------------+
| @@version            |
+----------------------+
| 10.2.6-MariaDB-debug |
+----------------------+
1 row in set (0.00 sec)

MySQL

MySQL [test]> select json_length('{"a":1,"a":2,"b":3}');
+------------------------------------+
| json_length('{"a":1,"a":2,"b":3}') |
+------------------------------------+
|                                  2 |
+------------------------------------+
1 row in set (0.01 sec)
 
MySQL [test]> select @@version;
+--------------+
| @@version    |
+--------------+
| 5.7.17-debug |
+--------------+
1 row in set (0.00 sec)

Comment by michiv [ 2017-06-22 ]

That's not a bug.
Mariadb made a good decision to allow duplicate keys, that's natural fact in daily life.
For example, students last name in a class, maybe have 1+ students named "John".
As for this good feature, lot's of user prefer to Mariadb instead of Mysql.

Comment by Anders Karlsson [ 2017-06-22 ]

michiv: In your example, John is not a key but a value. Multiple values are fine. For example, if we have a class of students, this is valid and is not in contradiction to what I am looking for here:

{"classname": "JSON with MariaDB",
  "students": [{"firstname": "John", "Lastname": "Doe"},{"Firstname": "John", "Lastname": "Foo"}]
}

The following is the situation I am looking at:

{"classname": "JSON with MariaDB",
  "students": {"name": "John Doe", "name": "John Foo"}
}

In the last example, which on the two "name" attributes, when I ask for it, do we want? The first, the second or both.

Comment by Alexey Botchkov [ 2017-08-08 ]

http://lists.askmonty.org/pipermail/commits/2017-August/011360.html

Generated at Thu Feb 08 08:00:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.