Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17399

Add support for JSON_TABLE

    XMLWordPrintable

Details

    • Task
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.6.0
    • JSON
    • None

    Description

      MySQL 8.0 has JSON_TABLE. These is also available in Oracle, DB2. (PostgreSQL has a patch under development but it seems it is not in the 12th or 13th release)

      1. Syntax
      1.1 Standard compliance
       
      2. JSON_TABLE as table function
      3. Implementing a table function
      3.1 DDL considerations
      3.2 Producing the rows
      3.3 Interfacing with the optimizer
      3.3.1 Estimates 
      3.3.2 JSON_TABLE must be after its dependencies
      3.3.3 Can't use Join buffering 
      5. Links
      

      1. Syntax

      SQL Standard defines a lot of features for JSON_TABLE. This task will implement a practically-important subset (roughly the same what MySQL-8 supports)

      The syntax to be implemented is:

        JSON_TABLE(
          json_text,
          path COLUMNS(columns_list) 
        ) [AS] alias
      

      columns_list:  
        column [, column[, ... ]]
      

      column:
          name FOR ORDINALITY 
        | name type PATH path_str [on_empty_on_error]
        | name type EXISTS PATH path_str [on_empty_on_error]
        | NESTED PATH path COLUMNS (columns_list)
      

      on_empty_on_error:
        [ behavior ON EMPTY ] [ behavior ON ERROR ]
      

      behavior:
         ERROR | NULL | DEFAULT <value_expression>
      

      Here, JSON_TABLE(...) is a table function and is allowed wherever a table reference is allowed, except for the context which imply that the referred table is to be modified.

      1.1 Standard compliance

      The above is a subset of the syntax allowed by the SQL Standard, with exception of name type EXISTS PATH path_str which is a non-standard extension in (Oracle database and Oracle MySQL?)

      2. JSON_TABLE as table function

      JSON_TABLE is the first table function - a table whose contents depends on its arguments.

      The arguments may be non-constant (e.g. columns of other tables), which causes the JSON_TABLE table to have LATERAL-like dependencies on its arguments.

      3. Implementing a table function

      3.1 DDL considerations

      A temporary table [definition] will be created with appropriate set of columns with appropriate attributes. This will allow for references to the columns of JSON_TABLE(...) to be resolved using the regular name resolution process.

      TODO: At which point should this temporary table be created? It is apparent that this should happen before any Item::fix_fields() calls for the ON/WHERE/etc clauses. It should also happen before Item::fix_fields() calls made for the first parameter of any JSON_TABLE(...).

      3.1.2 Name resolution for JSON_TABLE argument

      The name resolution context should be ... "the same as for the ON expression" ?

      3.2 Producing the rows

      In order to produce rows, a Storage Engine will be implemented, ha_json_table. That is, the temporary table will use the ha_json_table engine.
      (This could be generalized to a generic ha_table_function Storage Engine for table function, but it was decided not to do that until we have other examples of table functions).

      The engine will only support full table scans (rnd_init/rnd_next). rnd_init() call will implicitly evaluate the JSON_TABLE's parameters and set the storage engine to produce the rows that come from the parameters.

      3.3 Interfacing with the optimizer

      3.3.1 Estimates

      The optimizer will need to be aware that the table representing the output of JSON_TABLE(...) does not have any contents during the optimization phase. We will still need to provide some numbers for expected #rows and read_time. JSON documents are typically small, so a hard-coded constant describing a reasonably-sized JSON document would be sufficient.

      3.3.2 JSON_TABLE must be after its dependencies

      Second, the join optimizer will need to be aware that
      JSON_TABLE(tbl1.column_x, ... ) can only be accessed when the current row for table tbl1 is available.
      This will be done as follows:
      Table dependencies will be set such that JSON_TABLE(...) depends on tbl1.

      3.3.3 Can't use Join buffering

      Join buffering will be disabled for table JSON_TABLE(...).

      4. Producing JSON_TABLE contents

      4.1 FOR ORDINALITY columns

      The standard says:

      An ordinality column provides a sequential numbering of rows. Row numbering is 1-based.

      The numbering seems to be "global", that is, it doesn't restart across table re-reads.

      4.2 NESTED contents

      The code has a "primary" JSON parser which scans through the JSON document and locates nodes that represent the rows to be produced by JSON_TABLE.
      In order to locate contents of the NESTED construct, we create another instance of JSON parser, point it at the current node, and then let it locate the pointed by the path. It's obvious that this can handle as many NESTED constructs as needed, and the "primary" parser state is not affected by them.

      5. Links

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              diego dupin Diego Dupin
              Votes:
              31 Vote for this issue
              Watchers:
              32 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.