[MDEV-32583] UUID() should be treated as stochastic for the purposes of forcing query materialization Created: 2023-10-25  Updated: 2023-12-07

Status: Confirmed
Project: MariaDB Server
Component/s: JSON, Optimizer - CTE
Affects Version/s: 10.3.32
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4

Type: Bug Priority: Critical
Reporter: Ezra Dyck Assignee: Rex Johnston
Resolution: Unresolved Votes: 0
Labels: CTE, optimizer
Environment:

Windows 11
MariaDB 10.3.32


Attachments: PNG File Screenshot 2023-10-25 130737.png    

 Description   

When compiling/running relatively simple queries that construct JSON objects it seems like the query engine is re-computing function calls instead of using the actual value when extracting JSON data.

For example, given the query:

WITH RECURSIVE iter AS
(
	SELECT 1 AS 'i'
	UNION 
	SELECT i + 1
	FROM iter 
	WHERE i < 10
),
json_data AS 
(
	SELECT
		JSON_OBJECT(
			'UUID', UUID()
		) AS 'Data'
	FROM iter 
)
SELECT
	JSON_EXTRACT(jd.`Data`, '$.UUID') AS 'Extracted',
	jd.`Data`
FROM json_data jd

you would expect the `Extracted` value to be the same as the value that was computed in the `Data` JSON_OBJECT, but it is actually different.

With this query I was able to get the same results for iter generating numbers up to 1,000,000, although in another query that consisted of more joins then I did not see this behavior; in that case the extracted UUID was the same as the originally computed UUID.



 Comments   
Comment by Rex Johnston [ 2023-10-25 ]

The issue here has nothing to do with JSON_EXTRACT(), rather query materialization.

If you force the optimizer to materialize the cte, rather than merge it, you will get consistent results. For example

WITH 
json_data AS 
(
	SELECT
		JSON_OBJECT(
			'UUID', UUID()
		) AS 'Data'
	FROM seq_1_to_10 where rand() > 0
)
SELECT 
	JSON_EXTRACT(jd.`Data`, '$.UUID') AS 'Extracted',
	jd.`Data`
FROM json_data jd

will produce consistent UUIDs, but if the CTE can be merged into the parent query, then you will get 2 calls to UUID() rather than one, which is expected to produce 2 different values.

I'm not sure this is a bug.

Comment by Ezra Dyck [ 2023-10-25 ]

Very interesting.

I agree that this doesn't appear to be a bug with JSON.

One could argue that a call to UUID should cause the CTE to be materialized instead of merging it because that function is non-deterministic.

Comment by Rex Johnston [ 2023-10-25 ]

A perfectly valid argument.

Comment by Rex Johnston [ 2023-10-25 ]

UUID() should be treated as stochastic (the same as RAND()) for the purposes of forcing query materialization.

For example, the following query

WITH 
cte AS 
(
	SELECT UUID() as r
	FROM seq_1_to_10
)
SELECT 
	r as r1,
	r
FROM cte

produces data where r1 != r

whereas

WITH 
cte AS 
(
	SELECT RAND() as r
	FROM seq_1_to_10
)
SELECT 
	r as r1,
	r
FROM cte

produces data where r1==r

Generated at Thu Feb 08 10:32:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.