Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32583

UUID() should be treated as stochastic for the purposes of forcing query materialization

Details

    Description

      When compiling/running relatively simple queries that construct JSON objects it seems like the query engine is re-computing function calls instead of using the actual value when extracting JSON data.

      For example, given the query:

      WITH RECURSIVE iter AS
      (
      	SELECT 1 AS 'i'
      	UNION 
      	SELECT i + 1
      	FROM iter 
      	WHERE i < 10
      ),
      json_data AS 
      (
      	SELECT
      		JSON_OBJECT(
      			'UUID', UUID()
      		) AS 'Data'
      	FROM iter 
      )
      SELECT
      	JSON_EXTRACT(jd.`Data`, '$.UUID') AS 'Extracted',
      	jd.`Data`
      FROM json_data jd
      

      you would expect the `Extracted` value to be the same as the value that was computed in the `Data` JSON_OBJECT, but it is actually different.

      With this query I was able to get the same results for iter generating numbers up to 1,000,000, although in another query that consisted of more joins then I did not see this behavior; in that case the extracted UUID was the same as the originally computed UUID.

      Attachments

        Issue Links

          Activity

            edyck Ezra Dyck created issue -
            edyck Ezra Dyck made changes -
            Field Original Value New Value
            Attachment Screenshot 2023-10-25 130737.png [ 72360 ]
            edyck Ezra Dyck made changes -
            Attachment Screenshot 2023-10-25 130737.png [ 72361 ]
            Johnston Rex Johnston added a comment -

            The issue here has nothing to do with JSON_EXTRACT(), rather query materialization.

            If you force the optimizer to materialize the cte, rather than merge it, you will get consistent results. For example

            WITH 
            json_data AS 
            (
            	SELECT
            		JSON_OBJECT(
            			'UUID', UUID()
            		) AS 'Data'
            	FROM seq_1_to_10 where rand() > 0
            )
            SELECT 
            	JSON_EXTRACT(jd.`Data`, '$.UUID') AS 'Extracted',
            	jd.`Data`
            FROM json_data jd
            

            will produce consistent UUIDs, but if the CTE can be merged into the parent query, then you will get 2 calls to UUID() rather than one, which is expected to produce 2 different values.

            I'm not sure this is a bug.

            Johnston Rex Johnston added a comment - The issue here has nothing to do with JSON_EXTRACT(), rather query materialization. If you force the optimizer to materialize the cte, rather than merge it, you will get consistent results. For example WITH json_data AS ( SELECT JSON_OBJECT( 'UUID' , UUID() ) AS 'Data' FROM seq_1_to_10 where rand() > 0 ) SELECT JSON_EXTRACT(jd.`Data`, '$.UUID' ) AS 'Extracted' , jd.`Data` FROM json_data jd will produce consistent UUIDs, but if the CTE can be merged into the parent query, then you will get 2 calls to UUID() rather than one, which is expected to produce 2 different values. I'm not sure this is a bug.
            edyck Ezra Dyck added a comment -

            Very interesting.

            I agree that this doesn't appear to be a bug with JSON.

            One could argue that a call to UUID should cause the CTE to be materialized instead of merging it because that function is non-deterministic.

            edyck Ezra Dyck added a comment - Very interesting. I agree that this doesn't appear to be a bug with JSON. One could argue that a call to UUID should cause the CTE to be materialized instead of merging it because that function is non-deterministic.
            Johnston Rex Johnston added a comment -

            A perfectly valid argument.

            Johnston Rex Johnston added a comment - A perfectly valid argument.
            Johnston Rex Johnston made changes -
            Assignee Rex Johnston [ JIRAUSER52533 ]
            Johnston Rex Johnston added a comment - - edited

            UUID() should be treated as stochastic (the same as RAND()) for the purposes of forcing query materialization.

            For example, the following query

            WITH 
            cte AS 
            (
            	SELECT UUID() as r
            	FROM seq_1_to_10
            )
            SELECT 
            	r as r1,
            	r
            FROM cte
            

            produces data where r1 != r

            whereas

            WITH 
            cte AS 
            (
            	SELECT RAND() as r
            	FROM seq_1_to_10
            )
            SELECT 
            	r as r1,
            	r
            FROM cte
            

            produces data where r1==r

            Johnston Rex Johnston added a comment - - edited UUID() should be treated as stochastic (the same as RAND()) for the purposes of forcing query materialization. For example, the following query WITH cte AS ( SELECT UUID() as r FROM seq_1_to_10 ) SELECT r as r1, r FROM cte produces data where r1 != r whereas WITH cte AS ( SELECT RAND() as r FROM seq_1_to_10 ) SELECT r as r1, r FROM cte produces data where r1==r
            Johnston Rex Johnston made changes -
            Summary JSON_EXTRACT recomputes value for small data sets UUID() should be treated as stochastic
            Johnston Rex Johnston made changes -
            Summary UUID() should be treated as stochastic UUID() should be treated as stochastic for the purposes of forcing query materialization
            Johnston Rex Johnston made changes -
            Fix Version/s 10.4 [ 22408 ]
            Johnston Rex Johnston made changes -
            Labels CTE JSON CTE optimizer
            Johnston Rex Johnston made changes -
            Fix Version/s 10.4.32 [ 29300 ]
            Fix Version/s 10.4 [ 22408 ]
            edyck Ezra Dyck made changes -
            Priority Major [ 3 ] Blocker [ 1 ]
            serg Sergei Golubchik made changes -
            Priority Blocker [ 1 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 10.4.32 [ 29300 ]
            serg Sergei Golubchik made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.10 [ 27530 ]
            Johnston Rex Johnston made changes -
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.4 [ 29301 ]
            Johnston Rex Johnston made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            Johnston Rex Johnston made changes -
            Fix Version/s 10.4 [ 22408 ]
            Johnston Rex Johnston made changes -
            Fix Version/s 11.5 [ 29506 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.3 [ 28565 ]
            danblack Daniel Black made changes -
            Johnston Rex Johnston added a comment - - edited

            Helper functions Create_func_uuid::create_builder() and Create_func_uuid_short::create_builder() *currently set *lex::safe_to_cache_query to false (this is used to en/disable the main query result cache), but this is not enough to disable selects with item types like this from being merged into parents. lex::uncacheable() should be used to indicate that lex::current_select is both uncacheable and unmergeable.

            Johnston Rex Johnston added a comment - - edited Helper functions Create_func_uuid::create_builder() and Create_func_uuid_short::create_builder() *currently set *lex::safe_to_cache_query to false (this is used to en/disable the main query result cache), but this is not enough to disable selects with item types like this from being merged into parents. lex::uncacheable() should be used to indicate that lex::current_select is both uncacheable and unmergeable.
            Johnston Rex Johnston made changes -
            Assignee Rex Johnston [ JIRAUSER52533 ] Oleg Smirnov [ JIRAUSER50405 ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            oleg.smirnov Oleg Smirnov added a comment -

            Thanks for addressing the questions. The fix seems legitimate to me. I don't think I'm allowed to approve the merge so please someone else for approval.

            oleg.smirnov Oleg Smirnov added a comment - Thanks for addressing the questions. The fix seems legitimate to me. I don't think I'm allowed to approve the merge so please someone else for approval.
            oleg.smirnov Oleg Smirnov made changes -
            Assignee Oleg Smirnov [ JIRAUSER50405 ] Rex Johnston [ JIRAUSER52533 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            serg Sergei Golubchik made changes -

            5f628add4562172371f88679053e913748686e8a is ok to push

            although you can remove

            set @@rand_seed1=31415926,@@rand_seed2=2718281828;
            

            from the test, as it doesn't do anything for UUID()

            serg Sergei Golubchik added a comment - 5f628add4562172371f88679053e913748686e8a is ok to push although you can remove set @@rand_seed1=31415926,@@rand_seed2=2718281828; from the test, as it doesn't do anything for UUID()
            Johnston Rex Johnston made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            Johnston Rex Johnston added a comment - - edited

            commit without rand_seed setting is 9e800eda8602a09212f6c35b6f8b32c48c8df8e8

            Johnston Rex Johnston added a comment - - edited commit without rand_seed setting is 9e800eda8602a09212f6c35b6f8b32c48c8df8e8
            Johnston Rex Johnston made changes -
            Fix Version/s 10.5.26 [ 29832 ]
            Fix Version/s 10.6.19 [ 29833 ]
            Fix Version/s 10.11.9 [ 29834 ]
            Fix Version/s 11.1.6 [ 29835 ]
            Fix Version/s 11.2.5 [ 29836 ]
            Fix Version/s 11.4.3 [ 29837 ]
            Fix Version/s 11.5.2 [ 29838 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.5 [ 29506 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]

            For the release notes: The optimizer must not merge CTEs or derived tables that use the UUID() function. Failing to honor this limitation caused wrong query results.

            psergei Sergei Petrunia added a comment - For the release notes: The optimizer must not merge CTEs or derived tables that use the UUID() function. Failing to honor this limitation caused wrong query results.

            People

              Johnston Rex Johnston
              edyck Ezra Dyck
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.