[MDEV-16443] JSON can not be implemented as a alias for longtext Created: 2018-06-08  Updated: 2021-08-14  Resolved: 2021-08-14

Status: Closed
Project: MariaDB Server
Component/s: Data types, JSON
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Dave Pullin Assignee: Alexey Botchkov
Resolution: Fixed Votes: 0
Labels: json

Issue Links:
Relates
relates to MDEV-17832 Protocol: extensions for Pluggable ty... Closed
relates to MDEV-4912 Data type plugin API version 1 Closed
Epic Link: New data types

 Description   

JSON support.

1. json_object('name', json_column )
2. json_object('name',longtext_column)

These two SHOULD give different results, because the longtext_column is a string so it gets quoted and quotes inside it get escaped, whereas the json_column is SEMANTICALLY a not necessarily a string and should not get quoted when it is an json object or json array.

Mariadb's support for JSON uses 'json' as a alias for longtext and therefore produces the same result for 1 & 2.

I know you are going to close this as a duplicate! BUT every discussion I have started or seen with MariaDB folk gets the answer "Our tests show that there is no performance difference between storing json internally as binary or a string", which has NOTHING to do with the problem. [I'd prefer you remain file-compatible with MySQL, but the internal representation is INTERNAL]. The problem here is that the JSON SEMANTICS.

Isn't it possible for you to implement the semantics correctly? (independently of your decision on internal representation). It would require functions like json_object to be sensitive to whether a column or expression is typed "json".

(I have a ton of procedures and functions that use JSON on MySQL5.7 and a ton of servers that I'd like to get to MariaDB), but I cant because if this problem. You have so much more good stuff I want to be able to use!)



 Comments   
Comment by Sergei Golubchik [ 2018-06-08 ]

You are right, "no performance difference" has nothing to do with this issue. And we never said that we didn't create JSON type because there was "no performance difference". The internal representation is plain text because there was no performance difference, but I agree, internal representation is internal.

The reason why we don't have a true JSON data type is because SQL Standard explicitly forbids it. In the Part 2 "Foundation" of SQL:2016, in the Section 4.46 "JSON data handling in SQL" it says unambiguously:

NOTE 116 — There is no SQL <data type> whose value space is SQL/JSON items or SQL/JSON sequences.

Note that Oracle, DB2, and SQL Server follow the standard here. For example, here's the manual for the JSON_QUERY function:
Oracle:

For expr, specify an expression that evaluates to a text literal. If expr is a column, then the column must be of data type VARCHAR2, CLOB, or BLOB. If expr is not a text literal of well-formed JSON data using strict or lax syntax, then the function returns null by default.

DB2:

An expression that returns a value that is a built-in string data type. If a character or graphic value is returned, it must contain correctly formatted JSON data.

SQLServer:

An expression. Typically the name of a variable or a column that contains JSON text. If JSON_QUERY finds JSON that is not valid in expression before it finds the value identified by path, the function returns an error.

SQL Server manual doesn't say it must be a text literal, but, like others, it says what happens when the expression is not a valid JSON, which implies it's a text literal, and not a specialized type


So, we've made a conscious decision to follow the SQL Standard (like Oracle, DB2, SQL Server, and others).

But at the moment not everything from the SQL Standard is implemented yet. The Standard specifies that one can optionally specify FORMAT JSON after the value, to specify that it is JSON, not an arbitrary text. In your example it would've been json_object('name',json_column FORMAT JSON) vs json_object('name', longtext_column).

We did not implement it yet, but exactly the same functionality can be achieved with our JSON formatting functions. Compare:

MariaDB [test]> select json_object('foo', '{"bar": 5}') as_text;
+-------------------------+
| as_text                 |
+-------------------------+
| {"foo": "{\"bar\": 5}"} |
+-------------------------+
1 row in set (0.001 sec)
 
MariaDB [test]> select json_object('foo',json_compact('{"bar": 5}')) as_json;
+---------------------+
| as_json             |
+---------------------+
| {"foo": {"bar": 5}} |
+---------------------+
1 row in set (0.001 sec)

Comment by Dave Pullin [ 2018-06-09 ]

Thanks for the explanation. [I might argue that the standard is dumb and/or the mariadb needs to upwards compatible with mysql, but that's not getting me anywhere!]

This means that everywhere a column/expression is, or might be, json it must be written by json_compact( expr ); [which is tedious at best, and wrong for some expr types, eg strings that are not json.]

It also means the return type of json_compact is unique ... it is not a string. It is a string that json_object knows not to quote. (it's a JSON datatype! A string with special treatments).

So how do I create a function that returns the datatype that json_compact returns?

>select json_object("z",json_compact("string")) as result\G
result:

{"z": string}

> select json_object("z","string") as result\G
result:

{"z": "string"}

> select json_object("z",json_compact("{}")) as result\G
result: {"z": {}}

> select json_object("z","{}") as result\G
result: {"z": "{}"}

> create or replace function json_test() returns json return json_object("thisa","atest");
> create or replace function json_testc() returns json return json_compact(json_object("thisa","atest"));

>select json_object("x",json_testc()) as result\G
result: {"x": "

{\"thisa\":\"atest\"}

"}

> select json_object("x",json_test()) as result\G
result: {"x": "

{\"thisa\": \"atest\"}

"}

Comment by Sergei Golubchik [ 2018-06-10 ]

I personally think that was not the smartest part of the standard. Or the standard committee had some good reasons that I don't know about, and I'd like to know them...

Anyway, I cannot run any tests on the standard, obviously, so I tried that in Oracle instead (Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production Version 18.3.0.0.0). It seems that in Oracle JSON_OBJECT also returns something special, which is not a string:

select json_object('x' is '{"foo" : "bar"}') from dual;
→ {"x" : "{"foo" : "bar"}"}
select json_object('x' is json_object('foo' is 'bar')) from dual;
→ {"x" : {"foo" : "bar"}}
select json_object('x' is '[' || json_object('foo' is 'bar') || ']') from dual
→ {"x" : "[{"foo":"bar"}]"}

See how the string in the first statement is used as a string inside JSON object (with incorrect quoting even!), but a json object in the second statement is used as a json object. And using string concatenation on it makes it a normal string again.

And a stored function doesn't return anything special in Oracle:

create function foo return varchar as
  a varchar2(100);
begin
  select json_object('foo' is 'bar') into a from dual;
  return a;
end;
 
select json_object('x' is foo) from dual;
→ {"x" : "{"foo":"bar"}"}

Of course, saying json_object('x' is foo format json) works as expected, but so far I didn't find how to return it from a stored function.

Still looking...

Comment by Dave Pullin [ 2018-06-10 ]

It would seem to a be a legitimate extension to the standard to implement "format json" as a character column/expression qualifier

create table .. ( my_json longtext format json )

create function x() returns longtext format json ...

and then to retroactively redefine 'json' as a [non standard] alias for "longtext format json"

Comment by Sergei Golubchik [ 2018-07-09 ]

Yes, it's an interesting idea. It practically creates a new type, but it kind of doesn't create a new type... Nice.

Comment by Sergei Golubchik [ 2021-08-14 ]

I think this was meanwhile implemented:

MariaDB [test]> create table t1 (a text, b json);
Query OK, 0 rows affected (0.025 sec)
 
MariaDB [test]> insert t1 values ('{"a":1}', '{"a":1}');
Query OK, 1 row affected (0.003 sec)
 
MariaDB [test]> select * from t1;
+---------+---------+
| a       | b       |
+---------+---------+
| {"a":1} | {"a":1} |
+---------+---------+
1 row in set (0.001 sec)
 
MariaDB [test]> select json_object('name', a), json_object('name', b) from t1;
+------------------------+------------------------+
| json_object('name', a) | json_object('name', b) |
+------------------------+------------------------+
| {"name": "{\"a\":1}"}  | {"name": {"a":1}}      |
+------------------------+------------------------+
1 row in set (0.002 sec)

Generated at Thu Feb 08 08:28:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.