[MCOL-4564] Performance issues for DBT3 queries #17, #18 Created: 2021-02-26  Updated: 2021-06-28  Resolved: 2021-03-03

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 5.5.2
Fix Version/s: 5.5.2

Type: Bug Priority: Blocker
Reporter: Daniel Lee (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
includes MCOL-4759 DISTINCT and UNION performance degrad... Closed
Relates
relates to MCOL-4575 Hash table performance for collation ... Open
Sprint: 2021-4

 Description   

Build tested: 5.5.2 (Drone build #1718)

Performed 10gb DBT3 performance on the build and noticed couple issues. #18 took 56 times longer to complete.

Query execution time:

5.5.1-1 Disk run
#17 30.388 sec
#18 14.922 sec

5.5.1-1 Cache run
#17 28.71 sec
#18 14.915 sec

5.5.2-1 Disk run
#17 50.439 sec
#18 833.523

5.5.2-1 Cache run
#17 50.436 sec
#18 834.642

Query #17
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem lo,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#15'
and p_container = 'LG BOX'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = lo.l_partkey
);

Query #18
select
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
from
customer,
orders,
lineitem
where
o_orderkey in (
select
l_orderkey
from
lineitem
group by
l_orderkey having
sum(l_quantity) > 313
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
order by
o_totalprice desc,
o_orderdate
LIMIT 100;



 Comments   
Comment by Roman [ 2021-03-01 ]

Plz review.

Comment by Daniel Lee (Inactive) [ 2021-03-01 ]

Build verified: 5.5.2 (Drone #1776)

New elapsed time

5.5.2-1 Disk run
#17 36.556 sec
#18 18.867 sec

5.5.2-1 Cache run
#17 37.045 sec
#18 17.946 sec

Comment by Roman [ 2021-03-01 ]

perf top output comparison b/w two MCS versions implicitly confirmed that the number of hash collisions is a way higher for 5.5.2 then for 5.5.1. AggregateComparator::operator() (this symbol is used by the hashmap used in GROUP BY processing) overhead was a way higher for 5.5.2 (5.5.2 uses CHARSET_INFO::hash_sort whilst 5.5.1 uses homebrew Murmur3 implementation) This tells me that hash values distributions produced by CHARSET_INFO::hash_sort are far from uniform distributions comparing with distrubutions produced by Murmur3 hash function.

Comment by Daniel Lee (Inactive) [ 2021-03-03 ]

Build verified: 5.5.2 (Drone #1793)

Verified again with the latest cron build.

Generated at Thu Feb 08 02:51:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.