[MDEV-12805] inconsistency in NULL handling, JSON type, CONNECT engine Created: 2017-05-15 Updated: 2017-07-23 Resolved: 2017-07-23 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - Connect |
| Affects Version/s: | 10.1.23 |
| Fix Version/s: | 10.1.26, 10.0.32, 10.2.8 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alice Sherepa | Assignee: | Olivier Bertrand |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
ubuntu 16.04 |
||
| Description |
|
Inconsistency in NULL handling JSON file:
Null value is substituted by string "???"
When adding 'level=1' in option_list:
When using [" and "] Null is substituted by "". If null is in last position, then it is not represented
|
| Comments |
| Comment by Olivier Bertrand [ 2017-05-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
All this work as designed. The first case is the "text" of the object meaning all key values separated by blanks. The problem is how to represent a null value? NULL would seem nice but prevents to make the difference between a null value and a "NULL" string. "???" is perhaps a bad choice. Suggestions welcome. The second example is OK. As for the first one, only the first object of the k array is displayed. The third one is a "calculated array". Null values are replaced by CONNECT by "pseudo" null values (void string or 0 for numeric types) This is clearly necessary for numeric calculations such as SUM. Here again, how to represent nulls when concatenating strings? My feeling is that these formats are not meant for objects that can contain nulls anyway. For instance, to have a complete representation of this file, your table could be created as:
Then the query:
will return:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2017-05-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I understand why it was done like that. Sql operations with null result as null, so concat will be null and it is not, what we want her. Also about sum function, if there was null as result, we get null or 0 (if not null field), it is confusing with 0, maybe there should be warning about null value. And also I wondered why in the t3 table the result is not like " and sugar and ", that is why I did not expanded that field) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2017-05-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
If you prefer "---" to "???" I'll do it. No problem. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2017-05-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Maybe just <null>? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2017-05-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
<null> is Ok. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2017-05-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
sorry, I didn't mean "---", it was "---" some string, that user will decide, how he wants to see null appeared. And maybe to set default to <null> | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2017-05-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This is an interresting topic. One question is: what are null values in a json document? For CONNECT, the problem it not when a table column directly access a value, which is represented by NULL or <null> by client programs, but when representing object "text" or concatenate or calculated array values. Both are specific to CONNECT, there are no such things in the standard JSON handling by other products. For example, supposing we have the following pres.json file:
One user can wish to have a simple view on it creating a table such as:
However, when executing:
The result currently obtained is:
Even replacing "???" by "<null>" will not be satisfactory for him and the pet count is clearly wrong. The proper answer would be something such as:
This means that in these cases, the best handling of NULL values is probably to consider that they are equivalent to no value at all (like for Ronald Reagan). If someone really wants to see NULL values, this is probably that he is more interrested by the file contain than by its values, then he can use the "*" format to directly see the JSON representation of it. What do you think of it? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alice Sherepa [ 2017-05-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
https://en.wikipedia.org/wiki/Null_(SQL)#Null_propagation so the real question is why input file has NULL values there. Maybe it was some kind of corruption? Or maybe we know that there was some pet, just it is unknown which exactly? Or maybe he had an elephant, which is formally not a pet - but it worth mentioning? Or maybe we were not able to find any information about that, but this work is expected to be performed soon? RDBMS must not guess what exactly NULL may mean. Interpretation of NULL is up to users of RDBMS, but RDBMS must follow standard way of handling NULL values. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2017-05-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I agree that the pres.json file is sort of artificial. But nulls could be the result of DELETE that could not be done and replaced by UPDATE to null. But it also shows that it would be much better if JSON could not contain NULLs. Now, according to your first site, NULL is somewhat ambiguous. After saying that nulls indicate missing values, the article describes what to do with nulls as having unknown values. What MariaDB does with nulls is:
Applied to my example, if CONNECT apply this, the result would be:
This position seems to me rather dogmatic. However, I could add an option enabling the user to specify how he wants nulls to me handled. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2017-05-25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In regard to the example from Olivier's comment : as a regular user who doesn't care about internals and standards, I would actually expect the output from that example to look like this:
(or whatever visible representation of NULL we use instead of ???, it's not the point). That is, I agree with Alice in the part that the presence of NULL both in "second" field and in "pets" array most likely indicates that something went wrong. For the second name, if a person doesn't have it at all, it should probably be absent, like it is for Ronald Reagan; while NULL means that it's either unknown or failed to be stored, and should be taken care of. For pets it's even more obvious, there should be no NULL or empty values, they are meaningless; if they're there, something is definitely wrong. At the same time (again, as a user who doesn't care as much about standards as about common sense), I agree with Olivier that producing NULL as a result string for all those fileds wouldn't do me any good at all, it's investigation nightmare, even though it's a standard. Luckily, I think for now Olivier has an excuse for not following, because we can say that it is not really functions, it is just some internal magic of CONNECT JSON, so Olivier has some leeway here to decide how it works "by design". All in all, I suggest to leave it for now, and maybe revisit it later when we have complaints or opinions from real users – after all, that's what our community testing is for. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2017-07-23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have added a new string session variable connect_json_null that can be set to the representation of nulls in object text or array list. Its default value is <null>. It can also be set to NULL, indicating to ignore nulls in these case. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2017-07-23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Concerning your example, it will look like:
With the default value of connect_json_null and as:
when ignoring nulls. Note that the nb column will still count null values. However, when making calculation on arrays or when retrieving min or max values, nulls will be always ignored. |