Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4543

Locate and Remove CPU waste in PM1 on nested queries with aggregates




      There appears to be a significant and logically unexplainable overhead in queries which involve aggregates on top of a subquery with wide projection. Top examples:

      On a table of 40 million rows, select count(c) from (select * from t) q; we see roughly double elapsed time (and CPU consumed by ExeMgr on PM1 from which the query is issued) over select count(c) from (select c from t);

      Plans and I/O statistics are identical.
      ++++++++++++++++++++++++++++++++++++++ Below part was split off into a separate ticket 4569
      It gets much worse if a UNION ALL is involved.
      If you do

      select count(c) from (select * from t1) q;
      select count(c) from (select * from t2) q;

      and compare it to
      select count(c) from (select * from t1 union all select * from t2) q; the difference in elapsed and CPU is around 10x (second query is 10x worse than the sum of the first two).

      Again, plans and I/O stats are identical (second is the same as sum of the first two).

      Some comments:
      1. CPU consumed was obtained by running TOP on the PM1 node, and taking TIME+ value before and immediately after the query
      2. There are CPU values from other nodes, but they are not significant (PrimProcs are doing something, but not very much).
      3. This was tested on 3 node cluster only. It needs to be first examined on a single node.
      4. Please view attached Excel workbook, which shows all the metrics obtained for all query variants.


          Issue Links



              dleeyh Daniel Lee
              gdorman Gregory Dorman
              0 Vote for this issue
              6 Start watching this issue



                  Git Integration