Details
-
Bug
-
Status: Confirmed (View Workflow)
-
Major
-
Resolution: Unresolved
-
11.3.2
-
Using the official docker image with default configuration (other than credentials)
Description
For my work I am using an absolutely massive query. This query has been expanded over time, but lately the query is becoming extremely slow when I try to join the result on another table (one-to-one relation) to the point of not finishing in 1 hour+. Without that join, the query is running in ~2.5 minutes. Interestingly, the query has similar performance if instead of joining I add the column from the table that I need via a correlated subquery. This points to the optimizer using an inefficient query plan, and using a correlated subquery forces a much better plan.
So I did some digging, and it looks like the row estimate is off by a LOT. I have ensured the table statistics are updated, so that part can be ruled out. The bug that I am seeing seems to be present in multiple different CTEs used within our query, and this issue is compounding whenever I try to join the result of a CTE on something else, as each CTE has a row estimate that is way too high. In the end, our final query has a row estimate of quadrillions of rows, while in reality it has a much more modest ~150k rows.
I have extracted a specific very simple CTE that exhibits the issue, together with the dataset that triggers the issue. Nothing interesting is happening at all in this query because I have simplified it down as much as possible while still exhibiting the issue. Obviously the real query is using more fields and the CTE is used usefully instead of simply SELECTing from the CTE.
```
ANALYZE
WITH demo AS
(
SELECT
u.id
FROM users u
LEFT JOIN activity_users au
ON au.user_id = u.id
LEFT JOIN activity_user_signoffs aus
ON aus.activity_user_id = au.id
GROUP BY u.id
)
SELECT * FROM demo
```
Running this query, I get the result shown on the "bad_estimate" image. It estimates 544k rows, but in reality it's only 23k. Interestingly, when I remove the outer SELECT, the estimate is accurate:
```
ANALYZE
SELECT
u.id
FROM users u
LEFT JOIN activity_users au
ON au.user_id = u.id
LEFT JOIN activity_user_signoffs aus
ON aus.activity_user_id = au.id
GROUP BY u.id
```
The interesting second observation, is that I can keep the CTE and get an accurate estimate if I remove the GROUP BY statement in the CTE:
```
ANALYZE
WITH demo AS
(
SELECT
u.id
FROM users u
LEFT JOIN activity_users au
ON au.user_id = u.id
LEFT JOIN activity_user_signoffs aus
ON aus.activity_user_id = au.id
)
SELECT * FROM demo
```
Obviously this is just a toy example, and the bad estimates don't actually slow anything down as the query plan is identical. However, when you were to use this CTE to join it with other CTEs and/or tables, then this wrong estimate can make the optimizer pick horribly inefficient query plans. Attached you will find the .sql database dump that can be used to reproduce this issue.
Please let me know if there is anything else I can provide to help