[MDEV-17399] Add support for JSON_TABLE Created: 2018-10-08 Updated: 2023-07-04 Resolved: 2021-04-26 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | JSON |
| Fix Version/s: | 10.6.0 |
| Type: | Task | Priority: | Critical |
| Reporter: | Diego Dupin | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 31 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sub-Tasks: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Epic Link: | Implement JSON functions available in MySQL 8.0 but not MariaDB | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
MySQL 8.0 has JSON_TABLE. These is also available in Oracle, DB2. (PostgreSQL has a patch under development but it seems it is not in the 12th or 13th release)
1. SyntaxSQL Standard defines a lot of features for JSON_TABLE. This task will implement a practically-important subset (roughly the same what MySQL-8 supports) The syntax to be implemented is:
Here, JSON_TABLE(...) is a table function and is allowed wherever a table reference is allowed, except for the context which imply that the referred table is to be modified. 1.1 Standard complianceThe above is a subset of the syntax allowed by the SQL Standard, with exception of name type EXISTS PATH path_str which is a non-standard extension in (Oracle database and Oracle MySQL?) 2. JSON_TABLE as table functionJSON_TABLE is the first table function - a table whose contents depends on its arguments. The arguments may be non-constant (e.g. columns of other tables), which causes the JSON_TABLE table to have LATERAL-like dependencies on its arguments. 3. Implementing a table function3.1 DDL considerationsA temporary table [definition] will be created with appropriate set of columns with appropriate attributes. This will allow for references to the columns of JSON_TABLE(...) to be resolved using the regular name resolution process. TODO: At which point should this temporary table be created? It is apparent that this should happen before any Item::fix_fields() calls for the ON/WHERE/etc clauses. It should also happen before Item::fix_fields() calls made for the first parameter of any JSON_TABLE(...). 3.1.2 Name resolution for JSON_TABLE argumentThe name resolution context should be ... "the same as for the ON expression" ? 3.2 Producing the rowsIn order to produce rows, a Storage Engine will be implemented, ha_json_table. That is, the temporary table will use the ha_json_table engine. The engine will only support full table scans (rnd_init/rnd_next). rnd_init() call will implicitly evaluate the JSON_TABLE's parameters and set the storage engine to produce the rows that come from the parameters. 3.3 Interfacing with the optimizer3.3.1 EstimatesThe optimizer will need to be aware that the table representing the output of JSON_TABLE(...) does not have any contents during the optimization phase. We will still need to provide some numbers for expected #rows and read_time. JSON documents are typically small, so a hard-coded constant describing a reasonably-sized JSON document would be sufficient. 3.3.2 JSON_TABLE must be after its dependenciesSecond, the join optimizer will need to be aware that 3.3.3 Can't use Join bufferingJoin buffering will be disabled for table JSON_TABLE(...). 4. Producing JSON_TABLE contents4.1 FOR ORDINALITY columnsThe standard says:
The numbering seems to be "global", that is, it doesn't restart across table re-reads. 4.2 NESTED contentsThe code has a "primary" JSON parser which scans through the JSON document and locates nodes that represent the rows to be produced by JSON_TABLE. 5. Links
|
| Comments |
| Comment by Francisco Dueñas [ 2019-05-15 ] | ||||||||||||||||||||||||||||||||||
|
This function will be great to implement | ||||||||||||||||||||||||||||||||||
| Comment by Alexey Botchkov [ 2019-12-09 ] | ||||||||||||||||||||||||||||||||||
|
The proper implementation for this task requires adding the new useful thing to MariaDB - table functions. So we'll be able to have functional expressions after the FROM clause. | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-12-11 ] | ||||||||||||||||||||||||||||||||||
|
Take-aways from discussion: A practically-important use case is when JSON_TABLE has arguments that depend on
This puts some restrictions on the query plans:
The property also affects the SE API:
| ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-12-11 ] | ||||||||||||||||||||||||||||||||||
|
On limitations: | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2019-12-11 ] | ||||||||||||||||||||||||||||||||||
|
Another observation: MySQL seems to A) use a temporary table and B) support "ref access" on it:
I'm not sure how big is the benefit from ref access here (they have to parse the whole JSON text anyway... after that, they can determine that a given row is not a match faster... but still they'll need to read all of the JSON data?) | ||||||||||||||||||||||||||||||||||
| Comment by Alexey Botchkov [ 2020-03-29 ] | ||||||||||||||||||||||||||||||||||
|
https://github.com/MariaDB/server/commit/654fdfee33e3eafe3b7f25d7e213717c22ea1e18 | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-04-03 ] | ||||||||||||||||||||||||||||||||||
|
The patch implements this syntax:
#note "AS alias" while in other dbs it's just 'alias'?
Type is a type definition like in the table DDL.
| ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-04-03 ] | ||||||||||||||||||||||||||||||||||
|
Review for the patch: https://lists.launchpad.net/maria-developers/msg12162.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-04-12 ] | ||||||||||||||||||||||||||||||||||
|
Second part of the review: https://lists.launchpad.net/maria-developers/msg12172.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-05-11 ] | ||||||||||||||||||||||||||||||||||
|
Review part #4: https://lists.launchpad.net/maria-developers/msg12225.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-05-11 ] | ||||||||||||||||||||||||||||||||||
|
Notes to self (for next review iteration)
| ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-05-26 ] | ||||||||||||||||||||||||||||||||||
|
Interesting, this query
produces an error in MySQL 8:
But runs successfully in Oracle 18c: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=af4084b9b8abc04281bc5effa6d10249 . It runs in MariaDB, too. | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-05-26 ] | ||||||||||||||||||||||||||||||||||
|
Another observation is that Oracle allows DEFAULT's value-expression to be, for example, integer:
while MySQL 8 does not. I'm reading the part of the standard this MDEV links to, and it has this:
but I don't see anything about what type <value expression> should have. Interesting that NULL is mentioned separately, this means <value expression> cannot be NULL? | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-05-26 ] | ||||||||||||||||||||||||||||||||||
|
there's this:
So,
| ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-05-29 ] | ||||||||||||||||||||||||||||||||||
|
Review input part #5 : https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg11827.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-06-22 ] | ||||||||||||||||||||||||||||||||||
|
Review input part #6: https://lists.launchpad.net/maria-developers/msg12295.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-06-22 ] | ||||||||||||||||||||||||||||||||||
|
Also, a question: https://lists.launchpad.net/maria-developers/msg12296.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-07-30 ] | ||||||||||||||||||||||||||||||||||
|
Review input part #7: https://lists.launchpad.net/maria-developers/msg12334.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-08-03 ] | ||||||||||||||||||||||||||||||||||
|
Review input part #9: https://lists.launchpad.net/maria-developers/msg12344.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-08-05 ] | ||||||||||||||||||||||||||||||||||
|
Followup to optimizer call on Tuesday: SQL Standard, section 7.6 <table reference>:
Section 7.11 <JSON table>:
That is, JSON_TABLE and TABLE (...) are two different productions that have nothing in common. | ||||||||||||||||||||||||||||||||||
| Comment by Robert Dyas [ 2020-09-22 ] | ||||||||||||||||||||||||||||||||||
|
I would really like to see this implemented with json_text being able to be an S3 bucket reference as follows:
This would make it super easy to stage lots of large json log files in an S3 bucket and then injest them into column store with a command like CREATE TABLE blah AS SELECT * FROM JSON_TABLE('s3://my-bucket/path', ....). | ||||||||||||||||||||||||||||||||||
| Comment by Robert Dyas [ 2020-09-22 ] | ||||||||||||||||||||||||||||||||||
|
Also, while we are at it, it would be very nice to have a CSV_TABLE function with similar syntax such that CREATE TABLE t1 AS SELECT * FROM CSV_TABLE('s3://bucket/mycsvfile.csv', ...) With the path COLUMNS(columns_list) argument taking a list of columns in the CSV file. Great way of injecting large CSV files from S3 into column store. | ||||||||||||||||||||||||||||||||||
| Comment by Francisco Dueñas [ 2020-09-23 ] | ||||||||||||||||||||||||||||||||||
|
The functionality to get data from a CSV file already exists using CONNECT engine. Check the CONNECT engine documentation for more info. | ||||||||||||||||||||||||||||||||||
| Comment by Robert Dyas [ 2020-09-23 ] | ||||||||||||||||||||||||||||||||||
|
from an s3 bucket efficiently ? lots of experience with CONNECT engine and On Tue, Sep 22, 2020, 10:19 PM Francisco Dueñas (Jira) <jira@mariadb.org> | ||||||||||||||||||||||||||||||||||
| Comment by Alexey Botchkov [ 2020-09-27 ] | ||||||||||||||||||||||||||||||||||
|
https://github.com/MariaDB/server/commit/fbcd1908ebb9278e2a073341c18f4e896fda2e1d | ||||||||||||||||||||||||||||||||||
| Comment by Justin Swanhart [ 2020-10-29 ] | ||||||||||||||||||||||||||||||||||
|
If table functions are going to be added, making the interface generic such that UDF table functions and/or using stored routines as table functions (since they can return resultsets) would be really nice. Perhaps that should be a prequisite MDEV for this feature? | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-15 ] | ||||||||||||||||||||||||||||||||||
|
greenlion, one problem with developing generic interfaces is answering the question what should be supported. Taking Stored Routines as an example - we won't be able have an "opaque" table function that calls SPs in it, due to the process called "prelocking". The big decision that this MDEV code has made is that the table function also uses a table handler to provide rows. This allows for interesting opportunities like pushing conditions down into table functions, and/or table functions' tables providing indexes. Compare that to MySQL's implementation - they just fill a temporary table and then read from there (an example stack trace : https://gist.github.com/spetrunia/1c38564a7ead92562fd984cbf48c0655) . OTOH, using a temporary may make some implementation simpler - one doesn't need to support ::position or ::rnd_pos.) | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-15 ] | ||||||||||||||||||||||||||||||||||
|
On some previous optimizer call, I was asked if MySQL's way of making Table Function's columns available for Name Resolution is any different from MariaDB's. Did some debugging and the answer is NO. Both use the same approach: create a temporary table (incl. a TABLE object), and then the TABLE_LIST representing the table function has table_list->table pointing to the created table, which allows Name Resolution functions to work as usual. A Table Function call is just another base table for it. | ||||||||||||||||||||||||||||||||||
| Comment by Justin Swanhart [ 2020-11-15 ] | ||||||||||||||||||||||||||||||||||
|
@sergeyp This is just a suggestion. I think a stored proc interface could be created like:
Then for select * from TABLE_FUNC(x,y,z) , some_table the query execution engine does:
Just a thought, but that seems like a reasonably flexible interface that will support arbitrary table functions.}} as for "prelocking" any tables used in the init func, clear func, or the func itself, as well as the temporary table being used, can be locked just like normal stored functions. all the table names are known at parse time in this case... I think that satisfies the requirements for prelocking. Please correct me if I'm wrong. | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-16 ] | ||||||||||||||||||||||||||||||||||
|
holyfoot, review input: https://lists.launchpad.net/maria-developers/msg12465.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-18 ] | ||||||||||||||||||||||||||||||||||
|
More: https://lists.launchpad.net/maria-developers/msg12467.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-18 ] | ||||||||||||||||||||||||||||||||||
|
Interesting, MySQL has this limitation: https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html
This frees them from the need to detect circular dependencies:
I cannot find any mention of such limitation in the SQL Standard, though. AFAIU, something like this could be supported:
but it is not. | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-18 ] | ||||||||||||||||||||||||||||||||||
|
Note: name resolution does not seem to be "like in an ON expression".
referring to t10.a as an argument to JSON_TABLE is accepted, but if one un-comments t10.a is not null, that produces an error. | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-01-15 ] | ||||||||||||||||||||||||||||||||||
|
... well, Oracle 18C is a bad example, as it also allows cases that we didn't want to allow:
| ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-02-06 ] | ||||||||||||||||||||||||||||||||||
|
Review input for the last version of the patch: https://lists.launchpad.net/maria-developers/msg12521.html | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-02-08 ] | ||||||||||||||||||||||||||||||||||
|
As for name resolution, the closest thing the SQL standard has to say is section 7.6 <table reference>,
That is, Table Reference is visible in every <lateral derived table> that follows it. Note that the above list does NOT include JSON_TABLE:
| ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-02-09 ] | ||||||||||||||||||||||||||||||||||
|
More review input: https://lists.launchpad.net/maria-developers/msg12531.html I don't have any other input for now, need to take another look after things reported so far have been addressed. | ||||||||||||||||||||||||||||||||||
| Comment by Alexey Botchkov [ 2021-02-14 ] | ||||||||||||||||||||||||||||||||||
|
https://github.com/MariaDB/server/commit/4d2f5a23de4304d756263d2f170889c586a48680 | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-02-22 ] | ||||||||||||||||||||||||||||||||||
|
The latest patch was: https://github.com/MariaDB/server/commit/85757ecbef44484270e385a8d52998c67f72d11a | ||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2021-03-15 ] | ||||||||||||||||||||||||||||||||||
|
Pushed the fix for the last issue, and another code cleanup fix. The result is at: https://github.com/mariadb/server/tree/bb-10.6-mdev17399-psergey2 | ||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2021-04-19 ] | ||||||||||||||||||||||||||||||||||
|
The last round of tests on https://github.com/MariaDB/server/commit/c3cb41111c4bf27c5b0f378a4c5283956d34ed52 didn't reveal any significant issues related to the scope of the task. In my opinion it can be released with a beta-version of the server. | ||||||||||||||||||||||||||||||||||
| Comment by Justin Swanhart [ 2021-04-27 ] | ||||||||||||||||||||||||||||||||||
|
This is a great feature. Will make a lot of people happy. On Mon, Apr 26, 2021, 12:57 PM Sergei Petrunia (Jira) <jira@mariadb.org> |