[MCOL-4857] PERCENTILE() on MCS tables works slower then on InnoDB Created: 2021-09-05  Updated: 2021-09-12

Status: Open
Project: MariaDB ColumnStore
Component/s: ExeMgr, PrimProc
Affects Version/s: 1.5.3, 6.1.1
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Roman Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Epic/Theme: Performance

 Description   

Here I copy/paste the message sent to our Google group.

While this didn't always appear to be the case, it appears the performance of percentile functions against larger sets of data has deteriorated significantly where it is pretty much unusable. I tried to do some searching and could find much so I wanted to raise it here.
A query like such on a row count of 10k might take 5 secs, maybe better

MariaDB [hop_2021]> SELECT DISTINCT '2021-05-25' as dateval, t.*, count FROM (
-> SELECT
-> PERCENTILE_CONT(.10) WITHIN GROUP (ORDER BY host_latency) OVER () as p1,
-> PERCENTILE_CONT(.5) WITHIN GROUP (ORDER BY host_latency) OVER () as median,
-> PERCENTILE_CONT(.90) WITHIN GROUP (ORDER BY host_latency) OVER () as p90,
-> PERCENTILE_CONT(.99) WITHIN GROUP (ORDER BY host_latency) OVER () as p99
-> FROM tblpivot_apps
-> WHERE host_latency > 0
-> AND capture_date = '2021-05-25'
-> ) t;

When reaching into the ~500k range it seems to hang indefinitely.

  • Resources on the host appear fine, however increased load noticed.
  • Columnstore logs are clean
  • Other window functions like quartiles work just fine

I then converted the table to innodb and re-ran the query on the larger set and it returned in seconds. I am happy to go that route, but just wanted to bring closure as to why.

I was also tempted to run some regression to see if it was tied to a version, but I don't quite have the time to do so, haha. Any insight on this topic would be greatly appreciated! Is it expected that CS is not great at handling these types of functions?


This server is running ->
mariadb-server: 10.5.8 w/ columnstore plugin


Generated at Thu Feb 08 02:53:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.