[MCOL-3471] Investigate & fix dbt3 scalability problem Created: 2019-09-03 Updated: 2019-11-01 Resolved: 2019-11-01 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ? |
| Affects Version/s: | None |
| Fix Version/s: | Icebox |
| Type: | Task | Priority: | Major |
| Reporter: | Patrick LeBlanc (Inactive) | Assignee: | Patrick LeBlanc (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Sub-Tasks: |
|
| Comments |
| Comment by Patrick LeBlanc (Inactive) [ 2019-09-03 ] |
|
Tracking current work in progress |
| Comment by Patrick LeBlanc (Inactive) [ 2019-09-03 ] |
|
Did a round of benchmarks & experimentation with the 10GB data set. 10GB is not big enough to show what is going on for about half of the queries (possibly because of extent elimination). For a couple of queries, increasing the size of the PM join and letting the UM flood the PMs with jobs made them scale. For the remaining queries, I suspect the time is being spent building the now-large hash tables on the PM. Will add that to the list of things to work on. |
| Comment by Patrick LeBlanc (Inactive) [ 2019-11-01 ] |
|
Completely forgot I made a ticket for this. I estimate that I spent 2 weeks on this after I got back from our DBS visit. The key to getting very good scaling on this data & these queries was treating CS like a typical data warehouse: 1) configuration changes to allow more in-flight operations & larger distributed joins, and 2) denormalizing most tables and querying the denormalized tables. Greg is continuing to experiment with the few remaining queries that do not scale as well as we would like. There is more we can do to shorten the initialization phase of big distributed joins, but we'll make other tickets for that. |