[MDEV-6357] QUERY CACHE - Query Cache And Limit Clause Created: 2014-06-18  Updated: 2014-07-08  Due: 2014-07-02  Resolved: 2014-07-07

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.10
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: BELUGABEHR Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: querycache


 Description   

In reading about the query cache, I came across an undocumented situation.

https://mariadb.com/kb/en/query-cache/

Basically, the cache should consume queries that always return the same results. With that said, I have noticed that the query cache will cache a query with a LIMIT clause without an ORDER BY clause. This seems incorrect because the LIMIT could return different results for different queries.

For completeness sake, I would suggest not caching queries that use a LIMIT without an ORDER BY clause.



 Comments   
Comment by Daniel Black [ 2014-06-18 ]

Just because a LIMIT without an ORDER BY could return a different result doesn't mean that the result will be returned from the cache is incorrect.

If the output needs a deterministic order then a ORDER BY should be used.

If no caching is needed then SQL_NO_CACHE can be used.

I'd be of the impression failing to cache LIMIT-full and ORDER BY-less queries would be an unnecessary change in behaviour but that's just my opinion.

What aspect of this is undocumented? There's nothing special about ORDER BY or LIMIT so its not documented as such.

Comment by Elena Stepanova [ 2014-06-18 ]

I agree with Daniel's comment, or maybe I don't quite understand the nature of complaint.

Could you please describe a situation where the result would actually be wrong because of using the query cache?
I don't expect you to create a test case, just words and pseudo-code would be enough, as long as it's clear what goes wrong at the end.

Comment by Elena Stepanova [ 2014-07-07 ]

Closing as incomplete for now, if you have any additional information, please comment to re-open.

Comment by BELUGABEHR [ 2014-07-08 ]

Elena,

Thank you for following up on this trivial remark. I ask that you don't sink too much time into thinking about this. I am probably thinking about this too hard myself.

"It also means [the] query has to be exactly the same and deterministic."
http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/

It's the word "deterministic" that I find interesting in the description. I honestly cannot imagine a true business scenario, but a limit clause without a order by is not deterministic and the result set could change row order between queries. No? To store an un-ordered result list does not seem true to form.

Within the documentation for replication, it states:

"Statement-based replication of LIMIT clauses in DELETE, UPDATE, and INSERT ... SELECT statements is unsafe since the order of the rows affected is not defined."

So, it does seem to me that caching a result set that has not been first ordered may not be the desired behavior.

Comment by Daniel Black [ 2014-07-08 ]

ah. the select without a order by isn't deterministic in the sense that exactly which rows are returned isn't predictable. With no data changes to the tables (or global config changes) however two sequential queries will return the same result (there's no chance the query plan will change). Because this is the case it may as well go in the query cache. Data changes will invalidate the cache anyway.

Comment by Elena Stepanova [ 2014-07-08 ]

Yes, using the word "deterministic" may be not completely accurate in this case; but then again, it is not official documentation, and besides later in the same post Peter introduces the definition of "deterministic" query, which actually fits fine a query with LIMIT without ORDER BY:

Query must be deterministic – Query might provide same result no matter how many times it is run, if data remains the same. So if query works with current data, uses non-deterministic functions such as UUID(), RAND(), CONNECTION_ID() etc it will not be cached.

This query indeed might provide same result – even though there is no obligation to (unlike a query with CONNECTION_ID(), which might not produce the same result if it's run from different connections).

As far as I can see, the official MySQL manual avoids claiming that a query must be deterministic, and instead just provides a list of exceptions explicitly.

Generated at Thu Feb 08 07:11:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.