MySQL 8.0 has JSON_TABLE. These is also available in Oracle, DB2. (PostgreSQL has a patch under development but it seems it is not in the 12th or 13th release)
SQL Standard defines a lot of features for JSON_TABLE. This task will implement a practically-important subset (roughly the same what MySQL-8 supports)
The syntax to be implemented is:
Here, JSON_TABLE(...) is a table function and is allowed wherever a table reference is allowed, except for the context which imply that the referred table is to be modified.
The above is a subset of the syntax allowed by the SQL Standard, with exception of name type EXISTS PATH path_str which is a non-standard extension in (Oracle database and Oracle MySQL?)
JSON_TABLE is the first table function - a table whose contents depends on its arguments.
The arguments may be non-constant (e.g. columns of other tables), which causes the JSON_TABLE table to have LATERAL-like dependencies on its arguments.
A temporary table [definition] will be created with appropriate set of columns with appropriate attributes. This will allow for references to the columns of JSON_TABLE(...) to be resolved using the regular name resolution process.
TODO: At which point should this temporary table be created? It is apparent that this should happen before any Item::fix_fields() calls for the ON/WHERE/etc clauses. It should also happen before Item::fix_fields() calls made for the first parameter of any JSON_TABLE(...).
The name resolution context should be ... "the same as for the ON expression" ?
In order to produce rows, a Storage Engine will be implemented, ha_json_table. That is, the temporary table will use the ha_json_table engine.
(This could be generalized to a generic ha_table_function Storage Engine for table function, but it was decided not to do that until we have other examples of table functions).
The engine will only support full table scans (rnd_init/rnd_next). rnd_init() call will implicitly evaluate the JSON_TABLE's parameters and set the storage engine to produce the rows that come from the parameters.
The optimizer will need to be aware that the table representing the output of JSON_TABLE(...) does not have any contents during the optimization phase. We will still need to provide some numbers for expected #rows and read_time. JSON documents are typically small, so a hard-coded constant describing a reasonably-sized JSON document would be sufficient.
Second, the join optimizer will need to be aware that
JSON_TABLE(tbl1.column_x, ... ) can only be accessed when the current row for table tbl1 is available.
This will be done as follows:
Table dependencies will be set such that JSON_TABLE(...) depends on tbl1.
Join buffering will be disabled for table JSON_TABLE(...).
The standard says:
An ordinality column provides a sequential numbering of rows. Row numbering is 1-based.
The numbering seems to be "global", that is, it doesn't restart across table re-reads.
The code has a "primary" JSON parser which scans through the JSON document and locates nodes that represent the rows to be produced by JSON_TABLE.
In order to locate contents of the NESTED construct, we create another instance of JSON parser, point it at the current node, and then let it locate the pointed by the path. It's obvious that this can handle as many NESTED constructs as needed, and the "primary" parser state is not affected by them.
- JSON_TABLE in MySQL-8: https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html
- SQL Standard: This page https://modern-sql.com/blog/2017-06/whats-new-in-sql-2016
has a link to http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip
which in section 5.3.4 has a specification for JSON_TABLE function.
- SQL:2016, as well as SQL Standard drafts from as early as 2014-11-10 cover JSON_TABLE.
- PostgreSQL were working on this, see e.g. https://pgconf.ru/media/2019/02/20/postgresql-12-pgconfru-2019-v2.pdf , but currently they don't seem to have this.