Details
-
Task
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Do
-
1.2.5
-
None
-
ubuntu 18 LTS. 1UM, 2 PMs.
Description
Greg & I are diagnosing remaining scalability issues with the denormalized 50gb dbt3 data set, ran into a bug, where the max() window function + a group by of 500k keys will cause an infinite loop in a single thread. It is based query 15 in the dbt3 set. We've simplified it (and made it nonsensical) to make it clear.
I assume it is independent of the size of the data or the fact it is denormalized, so it is likely reproducible with the standard dbt3 load. Replace the denorm table with supplier.
A version of the query that works:
select * from (
select
rank() over (order by sum(l_extendedprice)) rank_total_revenue
from denorm
where
l_shipdate >= '1994-11-01'
and l_shipdate < date_add('1994-11-01', interval 90 day)
group by s_suppkey
) dq
;
Version that doesn't work
select * from (
select
max(sum(l_extendedprice)) over() as max_total_revenue
from denorm
where
l_shipdate >= '1994-11-01'
and l_shipdate < date_add('1994-11-01', interval 90 day)
group by s_suppkey
) dq
;