[MCOL-4543] Locate and Remove CPU waste in PM1 on nested queries with aggregates Created: 2021-02-18  Updated: 2021-04-27  Resolved: 2021-04-16

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 5.6.1, 6.1.1

Type: Bug Priority: Critical
Reporter: Gregory Dorman (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: performance

Attachments: Microsoft Word Summary.xlsx    
Issue Links:
Issue split
split to MCOL-4569 Queries with UNION ALL perform dispro... Closed
Epic Link: ColumnStore Performance Improvements
Sprint: 2021-3, 2021-4, 2021-5, 2021-6

 Description   

There appears to be a significant and logically unexplainable overhead in queries which involve aggregates on top of a subquery with wide projection. Top examples:

On a table of 40 million rows, select count(c) from (select * from t) q; we see roughly double elapsed time (and CPU consumed by ExeMgr on PM1 from which the query is issued) over select count(c) from (select c from t);

Plans and I/O statistics are identical.
++++++++++++++++++++++++++++++++++++++ Below part was split off into a separate ticket 4569
It gets much worse if a UNION ALL is involved.
If you do

select count(c) from (select * from t1) q;
select count(c) from (select * from t2) q;

and compare it to
select count(c) from (select * from t1 union all select * from t2) q; the difference in elapsed and CPU is around 10x (second query is 10x worse than the sum of the first two).

Again, plans and I/O stats are identical (second is the same as sum of the first two).

++++++++++++++++++++++++++++++++++++++++
Some comments:
1. CPU consumed was obtained by running TOP on the PM1 node, and taking TIME+ value before and immediately after the query
2. There are CPU values from other nodes, but they are not significant (PrimProcs are doing something, but not very much).
3. This was tested on 3 node cluster only. It needs to be first examined on a single node.
4. Please view attached Excel workbook, which shows all the metrics obtained for all query variants.



 Comments   
Comment by Gagan Goel (Inactive) [ 2021-03-02 ]

Patch to address query performance deterioration of query Q1 of the form

SELECT count(c1) FROM (SELECT * FROM t1)q;

vs query Q2 of the form

SELECT count(c1) FROM (SELECT c1 FROM t1)q;

is ready for review.

Assuming t1 is a table consisting of 10 columns c1, c2, ... , c10. We were building an intermediate row in ExeMgr for Q1 of the form (c1_value1, 1, 1, 1, 1, 1, 1, 1, 1, 1). This was wasteful both from memory as well as cpu perspective, since every row processing in ExeMgr involved iterating over all 10 columns (wasted cpu cycles), and the intermediate row is redundantly large (wasted memory).

The patch currently has a limitation: If query Q1 is replaced by query Q3 of the form

SELECT count(c2) FROM (SELECT * FROM t1)q;

In this case, we are building an intermediate row in ExeMgr of the form (1, c2_value1), i.e. non-referenced columns (non-reference meaning the subquery column in not referenced in the outer select, in the case of query Q3, column c1) that appear to the left of a referenced column (column c2) are not eliminated. This is due to the outer query remembering the column indices of the subquery referenced columns. So if we were to also remove c1 from the intermediate row in the case of query Q3, c2's index needs to be updated from 1 to 0. This change needs to be propagated to the outer query, which is non-trivial. This limitation will be addressed in future.

So, if the outer query is referencing subquery columns with lower index values (such as query Q1 or Q3), there will be noticeable gains in terms of query execution times (execution time will be close to that of query Q2) compared to the case when higher index column is referenced in the outer query, such as

SELECT count(c10) FROM (SELECT * FROM t1)q;

Comment by Daniel Lee (Inactive) [ 2021-04-16 ]

Builds verified: 5.6.1 (Drone #2191), 6.1.1 (Drone #2193)

Did performance check on releases 5.5.2, 5.6.1 and 6.1.1 and compared results

ExeMgr CPU utilization

For 5.5.2, query 1 is 37.4% more than query 2
For 5.6.1 and 6.1.1, both queries are virtually identical

For the UNION ALL performance

5.6.1 is 5 times faster than 5.5.2
6.1.1 is 4 times faster than 5.5.2

All tests were done on VM. some performance variance is expected.
Please see details below.

5.5.2

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.003 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (5.533 sec)

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.004 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select l_orderkey from lineitem) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (5.118 sec)

ExeMgr CPU utilization for query 1 is 37.4 more than query 2

6:40.03
6:26.68
-------
13.36

6:49.75
6:40.03
-------
9.72

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.010 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (5.994 sec)

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.003 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem1) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (6.304 sec)

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.010 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem
-> union all
-> select * from lineitem1) q;
-------------------

count(l_orderkey)

-------------------

119972104

-------------------
1 row in set (1 min 8.500 sec)

5.6.1

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.002 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (5.252 sec)

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.003 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select l_orderkey from lineitem) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (4.800 sec)

ExeMgr CPU utilization for both queries virtually identical

1:30.07
1:20:39
-------
9.68

1:40.11
1:30.07
-------
10.04

MariaDB [tpch10]> #
MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.004 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (4.843 sec)

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.004 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem1) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (5.865 sec)

MariaDB [tpch10]>
MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.004 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem
-> union all
-> select * from lineitem1) q;
-------------------

count(l_orderkey)

-------------------

119972104

-------------------
1 row in set (13.587 sec)

6.6.1

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.003 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (5.811 sec)

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.003 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select l_orderkey from lineitem) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (4.929 sec)

ExeMgr CPU utilization for both queries virtually identical

1:48.67
1:38.81
-------
9.86

1:58.39
1:48.67
-------
9.72

MariaDB [tpch10]> #
MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.004 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (5.072 sec)

MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.004 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem1) q;
-------------------

count(l_orderkey)

-------------------

59986052

-------------------
1 row in set (5.673 sec)

MariaDB [tpch10]>
MariaDB [tpch10]> select calflushcache();
-----------------

calflushcache()

-----------------

0

-----------------
1 row in set (0.004 sec)

MariaDB [tpch10]> select count(l_orderkey) from (select * from lineitem
-> union all
-> select * from lineitem1) q;
-------------------

count(l_orderkey)

-------------------

119972104

-------------------
1 row in set (16.454 sec)

Generated at Thu Feb 08 02:51:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.