[MCOL-874] performance regression with dictionary columns Created: 2017-08-12 Updated: 2017-09-06 Resolved: 2017-09-06 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr, PrimProc |
| Affects Version/s: | 1.1.0 |
| Fix Version/s: | 1.1.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | David Thompson (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Attachments: |
|
| Sprint: | 2017-16, 2017-17, 2017-18 |
| Description |
|
An aggregate query whose GROUP BY column is a char or varchar with dictionary storage appears to be about twice as slow as in 1.0. Normal columns such as numbers or inline strings (e.g. char(8)) are not affected. For example:
if col2 is, say, char(255) |
| Comments |
| Comment by David Thompson (Inactive) [ 2017-08-12 ] |
|
The attachment includes test scripts and my results. To repro: gen.py is set to generate 100M rows. It can be seen that the col2 and col4 query variants are approximately 2x slower. These are both dictionary column cases. |
| Comment by Andrew Hutchings (Inactive) [ 2017-08-12 ] |
|
Very likely because deserialisation of the string store is now done per string instead of per block. I have an idea how to improve this. |
| Comment by Andrew Hutchings (Inactive) [ 2017-08-14 ] |
|
Reverted StringStore to the original small-string-optimised version and added an additional vector to store longer strings. Performance should be on par with 1.0 now. |
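The layout described in this comment can be illustrated with a minimal sketch. This is not the ColumnStore StringStore code: the class name `SketchStringStore`, the 64 KiB block size, the handle encoding, and the length-prefix layout are all assumptions made for illustration. It only shows the general idea of packing short strings into shared blocks (so that a block could be copied or serialised as a unit rather than string by string) while keeping longer strings in a separate vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical illustration only; not the actual ColumnStore class.
class SketchStringStore {
public:
    // Opaque handle. The top bit marks entries held in the long-string vector;
    // otherwise the upper half is a block index and the lower half a byte offset.
    using Handle = std::uint64_t;

    Handle add(const std::string& s) {
        const std::size_t needed = sizeof(std::uint32_t) + s.size();
        if (needed > kBlockSize) {
            // Too large for a block: store it on its own in the side vector.
            longStrings_.push_back(s);
            return kLongBit | static_cast<Handle>(longStrings_.size() - 1);
        }
        if (blocks_.empty() || blocks_.back().size() + needed > kBlockSize) {
            // Current block is full (or none exists yet): start a new one.
            blocks_.emplace_back();
            blocks_.back().reserve(kBlockSize);
        }
        std::string& block = blocks_.back();
        const std::uint32_t offset = static_cast<std::uint32_t>(block.size());
        const std::uint32_t len = static_cast<std::uint32_t>(s.size());
        // Each entry is a 4-byte length prefix followed by the raw bytes.
        block.append(reinterpret_cast<const char*>(&len), sizeof(len));
        block.append(s);
        return (static_cast<Handle>(blocks_.size() - 1) << 32) | offset;
    }

    std::string get(Handle h) const {
        if (h & kLongBit) {
            return longStrings_[h & ~kLongBit];
        }
        const std::string& block = blocks_[h >> 32];
        const std::size_t offset = h & 0xFFFFFFFFu;
        assert(offset + sizeof(std::uint32_t) <= block.size());
        std::uint32_t len = 0;
        std::memcpy(&len, block.data() + offset, sizeof(len));
        return block.substr(offset + sizeof(len), len);
    }

    std::size_t blockCount() const { return blocks_.size(); }
    std::size_t longCount() const { return longStrings_.size(); }

private:
    static constexpr std::size_t kBlockSize = 64 * 1024;  // assumed size
    static constexpr Handle kLongBit = Handle(1) << 63;

    std::vector<std::string> blocks_;       // packed short strings
    std::vector<std::string> longStrings_;  // one slot per long string
};
```

In this sketch, `add` returns a handle that `get` later resolves, so a caller never holds a pointer into a block. Whether the real fix uses comparable handles or raw pointers is not stated in this ticket.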
| Comment by Daniel Lee (Inactive) [ 2017-09-05 ] |
|
Build verified: 1.1.0 Github source
/root/columnstore/mariadb-columnstore-server
/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
Tested on singlet-server, 10g dbt3 database. 1.1 is still a lot slower than 1.0:
One thing I noticed is that the values for sum(l_quantity) are right-justified in 1.0 but left-justified in 1.1. Is the sum() being processed as strings somewhere in the process?
1.1.0 |
| Comment by Andrew Hutchings (Inactive) [ 2017-09-06 ] |
|
Confirmed using InnoDB that the left justification of the sum() result is a 10.2-specific thing rather than a ColumnStore thing. Looking into the performance... |
| Comment by Andrew Hutchings (Inactive) [ 2017-09-06 ] |
|
Cannot reproduce what Daniel observed. With DBT3 10GB 1.0:
Same data/machine with 1.1 develop:
Data reloaded into 1.1:
The left/right difference on the sum column is a MariaDB 10.2 change. |
| Comment by Daniel Lee (Inactive) [ 2017-09-06 ] |
|
Build verified: 1.1.0 Github source
[root@localhost ~]# cat mariadb-columnstore-1.1.0-1-centos7.x86_64.bin.tar.txt
Merge pull request #68 from mariadb-corporation/
/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
Merge pull request #247 from mariadb-corporation/
Add compiler flag checks and hardening flags
Not sure why, but I built both 1.1.0 and 1.0.10 and no longer see the performance difference.
|