Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30623

JSON_TABLE in subquery not correctly marked as correlated

Details

    Description

      CREATE TEMPORARY TABLE t1 (t1_id int, t1_json text);
      INSERT t1 VALUES (1, '[[10,20]]'), (2, '[[100,200]]'), (3, '[[1000,2000]]');
       
      SELECT
      	t1_id,
      	t2_json,
      	(SELECT SUM(x2) 
      	FROM 
      	  JSON_TABLE(t2_json, '$[*]' COLUMNS ( x2 int PATH '$' )) t3
      	) t2_json_sum
      FROM 
        t1
        CROSS JOIN 
        JSON_TABLE(t1_json, '$[*]' COLUMNS (t2_json json PATH '$' )) t2;
      

      Gives the result:

      +-------+-------------+-------------+
      | t1_id | t2_json     | t2_json_sum |
      +-------+-------------+-------------+
      |     1 | [10,20]     |          30 |
      |     2 | [100,200]   |          30 |
      |     3 | [1000,2000] |          30 |
      +-------+-------------+-------------+
      
      

      In the subquery the t2_json column from first row is used on every row.

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment -

            Thank you for the report!
            I repeated on 10.6-10.11:

            MariaDB [test]> create table t1(x json);
            Query OK, 0 rows affected (0,039 sec)
             
            MariaDB [test]> insert into t1 values ('[{"x":"1"},{"x":"2"}]'),('[{"x":"10"},{"x":"20"}]'),('[{"x":"100"},{"x":"200"}]');
            Query OK, 3 rows affected (0,003 sec)
            Records: 3  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> select *, (SELECT sum(x) FROM json_table(x, '$[*]' columns( x int path '$.x')) AS jt) from t1;
            +---------------------------+-----------------------------------------------------------------------------+
            | x                         | (SELECT sum(x) FROM json_table(x, '$[*]' columns( x int path '$.x')) AS jt) |
            +---------------------------+-----------------------------------------------------------------------------+
            | [{"x":"1"},{"x":"2"}]     |                                                                           3 |
            | [{"x":"10"},{"x":"20"}]   |                                                                           3 |
            | [{"x":"100"},{"x":"200"}] |                                                                           3 |
            +---------------------------+-----------------------------------------------------------------------------+
            3 rows in set (0,000 sec)
            MariaDB [test]> explain extended select *, (SELECT sum(x) FROM json_table(x, '$[*]' columns( x int path '$.x')) AS jt) from t1;
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                      |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------+
            |    1 | PRIMARY     | t1    | ALL  | NULL          | NULL | NULL    | NULL | 3    |   100.00 |                            |
            |    2 | SUBQUERY    | jt    | ALL  | NULL          | NULL | NULL    | NULL | 40   |   100.00 | Table function: json_table |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------+
            2 rows in set, 2 warnings (0,000 sec)
             
            Note (Code 1276): Field or reference 'test.t1.x' of SELECT #2 was resolved in SELECT #1
            Note (Code 1003): /* select#1 */ select `test`.`t1`.`x` AS `x`,(/* select#2 */ select sum(`jt`.`x`) from JSON_TABLE(`test`.`t1`.`x`, '$[*]' COLUMNS (`x` int(11) PATH '$.x')) `jt`) AS `(SELECT sum(x) FROM json_table(x, '$[*]' columns( x int path '$.x')) AS jt)` from `test`.`t1`
            
            

            alice Alice Sherepa added a comment - Thank you for the report! I repeated on 10.6-10.11: MariaDB [test]> create table t1(x json); Query OK, 0 rows affected (0,039 sec)   MariaDB [test]> insert into t1 values ('[{"x":"1"},{"x":"2"}]'),('[{"x":"10"},{"x":"20"}]'),('[{"x":"100"},{"x":"200"}]'); Query OK, 3 rows affected (0,003 sec) Records: 3 Duplicates: 0 Warnings: 0   MariaDB [test]> select *, (SELECT sum(x) FROM json_table(x, '$[*]' columns( x int path '$.x')) AS jt) from t1; +---------------------------+-----------------------------------------------------------------------------+ | x | (SELECT sum(x) FROM json_table(x, '$[*]' columns( x int path '$.x')) AS jt) | +---------------------------+-----------------------------------------------------------------------------+ | [{"x":"1"},{"x":"2"}] | 3 | | [{"x":"10"},{"x":"20"}] | 3 | | [{"x":"100"},{"x":"200"}] | 3 | +---------------------------+-----------------------------------------------------------------------------+ 3 rows in set (0,000 sec) MariaDB [test]> explain extended select *, (SELECT sum(x) FROM json_table(x, '$[*]' columns( x int path '$.x')) AS jt) from t1; +------+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | | | 2 | SUBQUERY | jt | ALL | NULL | NULL | NULL | NULL | 40 | 100.00 | Table function: json_table | +------+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------+ 2 rows in set, 2 warnings (0,000 sec)   Note (Code 1276): Field or reference 'test.t1.x' of SELECT #2 was resolved in SELECT #1 Note (Code 1003): /* select#1 */ select `test`.`t1`.`x` AS `x`,(/* select#2 */ select sum(`jt`.`x`) from JSON_TABLE(`test`.`t1`.`x`, '$[*]' COLUMNS (`x` int(11) PATH '$.x')) `jt`) AS `(SELECT sum(x) FROM json_table(x, '$[*]' columns( x int path '$.x')) AS jt)` from `test`.`t1`
            clint-wylie clint chenery added a comment -

            I have the same issue as this with MariaDb 10.6, i have worked around it by creating a custom function containing the json_table command that is used in place of the subquery and then passing the JSON column to the function which returns some value

            clint-wylie clint chenery added a comment - I have the same issue as this with MariaDb 10.6, i have worked around it by creating a custom function containing the json_table command that is used in place of the subquery and then passing the JSON column to the function which returns some value
            psergei Sergei Petrunia added a comment - - edited

            The subquery uses JSON_TABLE and its first parameter JSON_TABLE(t2_json is a reference to outside the subquery:

            SELECT ...
            	t2_json,
            	(SELECT SUM(x2) 
            	FROM 
            	  JSON_TABLE(t2_json, '$[*]' COLUMNS ( x2 int PATH '$' )) t3
            	) t2_json_sum
            

            But EXPLAIN output in the first comment shows type=SUBQUERY, which means the optimizer thinks the subquery is not correlated.

            Could the cause be that the code that collects subquery's outside references doesn't check table function's parameters? And the fix would be to check them?

            psergei Sergei Petrunia added a comment - - edited The subquery uses JSON_TABLE and its first parameter JSON_TABLE(t2_json is a reference to outside the subquery: SELECT ... t2_json, ( SELECT SUM (x2) FROM JSON_TABLE(t2_json, '$[*]' COLUMNS ( x2 int PATH '$' )) t3 ) t2_json_sum But EXPLAIN output in the first comment shows type=SUBQUERY , which means the optimizer thinks the subquery is not correlated. Could the cause be that the code that collects subquery's outside references doesn't check table function's parameters? And the fix would be to check them?
            danblack Daniel Black added a comment -

            From https://stackoverflow.com/questions/77360787/mariadb-query-exists-and-between

            create table events (event_id int, event_title varchar(30), event_distances json);
            insert into events values (1, 'event1', '[{"order":0,"value":21}]'),
               (2, 'event2', '[{"order":0,"value":8}]'),
               (3, 'event3', '[{"order":0,"value":10}]'),
               (4, 'event4', '[{"order":0,"value":5},{"order":1,"value":10},{"order":2,"value":21}]');
             
            SELECT events.event_title, event_distances 
            FROM events 
            WHERE 
            EXISTS ( 
                SELECT 1 
                FROM JSON_TABLE(event_distances, '$[*]' COLUMNS (distance_value FLOAT PATH '$.value')) AS distances 
                WHERE distances.distance_value BETWEEN 9 AND 22
            ) 
            ORDER BY events.event_id ASC;
            +-------------+-----------------------------------------------------------------------+
            | event_title | event_distances                                                       |
            +-------------+-----------------------------------------------------------------------+
            | event1      | [{"order":0,"value":21}]                                              |
            | event2      | [{"order":0,"value":8}]                                               |
            | event3      | [{"order":0,"value":10}]                                              |
            | event4      | [{"order":0,"value":5},{"order":1,"value":10},{"order":2,"value":21}] |
            +-------------+-----------------------------------------------------------------------+
            4 rows in set (0.001 sec)
             
            MariaDB [test]> SELECT events.event_title, event_distances  FROM events  WHERE  EXISTS (      SELECT 1      FROM JSON_TABLE(event_distances, '$[*]' COLUMNS (distance_value FLOAT PATH '$.value')) AS distances      WHERE distances.distance_value BETWEEN 20 AND 22 )  ORDER BY events.event_id ASC;
            +-------------+-----------------------------------------------------------------------+
            | event_title | event_distances                                                       |
            +-------------+-----------------------------------------------------------------------+
            | event1      | [{"order":0,"value":21}]                                              |
            | event2      | [{"order":0,"value":8}]                                               |
            | event3      | [{"order":0,"value":10}]                                              |
            | event4      | [{"order":0,"value":5},{"order":1,"value":10},{"order":2,"value":21}] |
            +-------------+-----------------------------------------------------------------------+
            

            danblack Daniel Black added a comment - From https://stackoverflow.com/questions/77360787/mariadb-query-exists-and-between create table events (event_id int, event_title varchar(30), event_distances json); insert into events values (1, 'event1', '[{"order":0,"value":21}]'), (2, 'event2', '[{"order":0,"value":8}]'), (3, 'event3', '[{"order":0,"value":10}]'), (4, 'event4', '[{"order":0,"value":5},{"order":1,"value":10},{"order":2,"value":21}]');   SELECT events.event_title, event_distances FROM events WHERE EXISTS ( SELECT 1 FROM JSON_TABLE(event_distances, '$[*]' COLUMNS (distance_value FLOAT PATH '$.value')) AS distances WHERE distances.distance_value BETWEEN 9 AND 22 ) ORDER BY events.event_id ASC; +-------------+-----------------------------------------------------------------------+ | event_title | event_distances | +-------------+-----------------------------------------------------------------------+ | event1 | [{"order":0,"value":21}] | | event2 | [{"order":0,"value":8}] | | event3 | [{"order":0,"value":10}] | | event4 | [{"order":0,"value":5},{"order":1,"value":10},{"order":2,"value":21}] | +-------------+-----------------------------------------------------------------------+ 4 rows in set (0.001 sec)   MariaDB [test]> SELECT events.event_title, event_distances FROM events WHERE EXISTS ( SELECT 1 FROM JSON_TABLE(event_distances, '$[*]' COLUMNS (distance_value FLOAT PATH '$.value')) AS distances WHERE distances.distance_value BETWEEN 20 AND 22 ) ORDER BY events.event_id ASC; +-------------+-----------------------------------------------------------------------+ | event_title | event_distances | +-------------+-----------------------------------------------------------------------+ | event1 | [{"order":0,"value":21}] | | event2 | [{"order":0,"value":8}] | | event3 | [{"order":0,"value":10}] | | event4 | [{"order":0,"value":5},{"order":1,"value":10},{"order":2,"value":21}] | +-------------+-----------------------------------------------------------------------+
            mokraemer Marc added a comment -

            I've got a very similar (same) issue:
            if you want to "join" JSON-Array's this is not very handy, but should work as follows:

            CREATE TABLE `tbl` (
              `ID` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
             `Tankkarten` json DEFAULT NULL
            );
             
            SELECT t.ID, t.js, 
            (SELECT group_concat(jt.item  SEPARATOR "\n" )
            FROM JSON_TABLE(t.js, "$[*]" COLUMNS (item tinytext PATH "$")) jt ) AS tk
            FROM tbl t
            LIMIT 50
            
            

            but the first value is repeated for every row

            mokraemer Marc added a comment - I've got a very similar (same) issue: if you want to "join" JSON-Array's this is not very handy, but should work as follows: CREATE TABLE `tbl` ( `ID` smallint (5) unsigned NOT NULL AUTO_INCREMENT, `Tankkarten` json DEFAULT NULL );   SELECT t.ID, t.js, ( SELECT group_concat(jt.item SEPARATOR "\n" ) FROM JSON_TABLE(t.js, "$[*]" COLUMNS (item tinytext PATH "$" )) jt ) AS tk FROM tbl t LIMIT 50 but the first value is repeated for every row
            alice Alice Sherepa added a comment -

            from MDEV-34284

            MariaDB [test]> select a, b, (
                ->   select json_arrayagg(i)
                ->   from json_table(json_array(a, b), '$[*]' columns (i int path '$')) t
                -> ) 
                -> from (select 1 a, 2 b union all select 3 a, 4 b) as t
                -> order by a;
            +---+---+------------------------------------------------------------------------------------------------------+
            | a | b | (
              select json_arrayagg(i)
              from json_table(json_array(a, b), '$[*]' columns (i int path '$')) t
            ) |
            +---+---+------------------------------------------------------------------------------------------------------+
            | 1 | 2 | [1,2]                                                                                                |
            | 3 | 4 | [1,2]                                                                                                |
            +---+---+------------------------------------------------------------------------------------------------------+
            2 rows in set (0,001 sec)
             
            MariaDB [test]> explain extended select a, b, (   select json_arrayagg(i)   from json_table(json_array(a, b), '$[*]' columns (i int path '$')) t )  from (select 1 a, 2 b union all select 3 a, 4 b) as t order by a;
            +------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------+
            | id   | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                      |
            +------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------+
            |    1 | PRIMARY     | <derived3> | ALL  | NULL          | NULL | NULL    | NULL | 2    |   100.00 | Using filesort             |
            |    3 | DERIVED     | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used             |
            |    4 | UNION       | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used             |
            |    2 | SUBQUERY    | t          | ALL  | NULL          | NULL | NULL    | NULL | 40   |   100.00 | Table function: json_table |
            +------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------+
            4 rows in set, 3 warnings (0,001 sec)
             
            Note (Code 1276): Field or reference 't.a' of SELECT #2 was resolved in SELECT #1
            Note (Code 1276): Field or reference 't.b' of SELECT #2 was resolved in SELECT #1
            Note (Code 1003): /* select#1 */ select `t`.`a` AS `a`,`t`.`b` AS `b`,(/* select#2 */ select json_arrayagg(`t`.`i`) from JSON_TABLE(json_array(`t`.`a`,`t`.`b`), '$[*]' COLUMNS (`i` int(11) PATH '$')) `t`) AS `(   select json_arrayagg(i)   from json_table(json_array(a, b), '$[*]' columns (i int path '$')) t )` from (/* select#3 */ select 1 AS `a`,2 AS `b` union all /* select#4 */ select 3 AS `a`,4 AS `b`) `t` order by `t`.`a`
             
            MariaDB [test]> select a, b, (   select json_arrayagg(i)   from json_table(json_array(a, b), '$[*]' columns (i int path '$') ) t where rand() is not null )  from (select 1 a, 2 b union all select 3 a, 4 b) as t order by a;
            +---+---+--------------------------------------------------------------------------------------------------------------------------------+
            | a | b | (   select json_arrayagg(i)   from json_table(json_array(a, b), '$[*]' columns (i int path '$') ) t where rand() is not null ) |
            +---+---+--------------------------------------------------------------------------------------------------------------------------------+
            | 1 | 2 | [1,2]                                                                                                                          |
            | 3 | 4 | [3,4]                                                                                                                          |
            +---+---+--------------------------------------------------------------------------------------------------------------------------------+
            2 rows in set (0,001 sec)
            

            alice Alice Sherepa added a comment - from MDEV-34284 MariaDB [test]> select a, b, ( -> select json_arrayagg(i) -> from json_table(json_array(a, b), '$[*]' columns (i int path '$')) t -> ) -> from (select 1 a, 2 b union all select 3 a, 4 b) as t -> order by a; +---+---+------------------------------------------------------------------------------------------------------+ | a | b | ( select json_arrayagg(i) from json_table(json_array(a, b), '$[*]' columns (i int path '$')) t ) | +---+---+------------------------------------------------------------------------------------------------------+ | 1 | 2 | [1,2] | | 3 | 4 | [1,2] | +---+---+------------------------------------------------------------------------------------------------------+ 2 rows in set (0,001 sec)   MariaDB [test]> explain extended select a, b, ( select json_arrayagg(i) from json_table(json_array(a, b), '$[*]' columns (i int path '$')) t ) from (select 1 a, 2 b union all select 3 a, 4 b) as t order by a; +------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------+ | 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using filesort | | 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 4 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 2 | SUBQUERY | t | ALL | NULL | NULL | NULL | NULL | 40 | 100.00 | Table function: json_table | +------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------+ 4 rows in set, 3 warnings (0,001 sec)   Note (Code 1276): Field or reference 't.a' of SELECT #2 was resolved in SELECT #1 Note (Code 1276): Field or reference 't.b' of SELECT #2 was resolved in SELECT #1 Note (Code 1003): /* select#1 */ select `t`.`a` AS `a`,`t`.`b` AS `b`,(/* select#2 */ select json_arrayagg(`t`.`i`) from JSON_TABLE(json_array(`t`.`a`,`t`.`b`), '$[*]' COLUMNS (`i` int(11) PATH '$')) `t`) AS `( select json_arrayagg(i) from json_table(json_array(a, b), '$[*]' columns (i int path '$')) t )` from (/* select#3 */ select 1 AS `a`,2 AS `b` union all /* select#4 */ select 3 AS `a`,4 AS `b`) `t` order by `t`.`a`   MariaDB [test]> select a, b, ( select json_arrayagg(i) from json_table(json_array(a, b), '$[*]' columns (i int path '$') ) t where rand() is not null ) from (select 1 a, 2 b union all select 3 a, 4 b) as t order by a; +---+---+--------------------------------------------------------------------------------------------------------------------------------+ | a | b | ( select json_arrayagg(i) from json_table(json_array(a, b), '$[*]' columns (i int path '$') ) t where rand() is not null ) | +---+---+--------------------------------------------------------------------------------------------------------------------------------+ | 1 | 2 | [1,2] | | 3 | 4 | [3,4] | +---+---+--------------------------------------------------------------------------------------------------------------------------------+ 2 rows in set (0,001 sec)
            Johnston Rex Johnston added a comment - - edited

            psergei was almost right. Outside references are correctly gathered, but then the attribute st_select_lex::is_correlated is reset from the correct value (true, set during name resolution) back to false during the optimization phase by st_select_lex::update_correlated_cache().

            The fix is to update st_select_lex::update_correlated_cache() to look for a json_table function in it's TABLE_LIST and set the appropriate flag if it is correlated.

            Johnston Rex Johnston added a comment - - edited psergei was almost right. Outside references are correctly gathered, but then the attribute st_select_lex::is_correlated is reset from the correct value (true, set during name resolution) back to false during the optimization phase by st_select_lex::update_correlated_cache() . The fix is to update st_select_lex::update_correlated_cache() to look for a json_table function in it's TABLE_LIST and set the appropriate flag if it is correlated.

            Hi Johnston,
            Ok to push.

            psergei Sergei Petrunia added a comment - Hi Johnston , Ok to push.
            Johnston Rex Johnston added a comment -

            for the release notes: usage of columns from outer table references within json_table() will likely cause incorrect results.

            Johnston Rex Johnston added a comment - for the release notes: usage of columns from outer table references within json_table() will likely cause incorrect results.

            People

              Johnston Rex Johnston
              tgj1970 Thomas G. Jensen
              Votes:
              3 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.