[MCOL-874] performance regression with dictionary columns Created: 2017-08-12  Updated: 2017-09-06  Resolved: 2017-09-06

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr, PrimProc
Affects Version/s: 1.1.0
Fix Version/s: 1.1.0

Type: Bug Priority: Critical
Reporter: David Thompson (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mcol874.tgz    
Sprint: 2017-16, 2017-17, 2017-18

 Description   

An aggregate query where the group by column is a char or varchar with dictionary storage appears to be about twice as slow as in 1.0. Normal columns like numbers or inline strings e.g. char(8) are not affected. For example:

select col2, sum(val), count(*) from sample1 group by 1 order by 1;

if col2 is say char(255)



 Comments   
Comment by David Thompson (Inactive) [ 2017-08-12 ]

attachment includes test scripts and my results. To repro:
mcsmysql test < schema.sql
./load.sh
mcsmysql -vvv test < queries.sql | tee 100m.out
grep -v "rows in" 100m.out

gen.py is set to generate 100M rows.

It can be seen that the col2 and col4 query variants are approx 2 slower. These are both dictionary column cases.

Comment by Andrew Hutchings (Inactive) [ 2017-08-12 ]

Very likely due to deserialisation of the string store is now done per string instead of per block. I have an idea how to improve this.

Comment by Andrew Hutchings (Inactive) [ 2017-08-14 ]

Reverted StringStore to the small-string optimised original version and added an additional vector to store longer strings. Performance should be on-par with 1.0 now.

Comment by Daniel Lee (Inactive) [ 2017-09-05 ]

Build verified: 1.1.0 Github source

/root/columnstore/mariadb-columnstore-server
commit 9e855a6415e0edd6771c449a6591c21c3915bfec
Merge: 6ed33d1 c206e51
Author: David.Hall <david.hall@mariadb.com>
Date: Tue Sep 5 09:43:29 2017 -0500

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 90353b9b908e1c9ee241c4a156a2a377c53cc807
Author: david hill <david.hill@mariadb.com>
Date: Fri Sep 1 14:46:07 2017 -0500

Tested on singlet-server, 10g dbt3 database. 1.1 is still a lot slower then 1.0:

One thing I noticed is that the values for sum(l_quantity) is right justified in 1.0 but left justified in 1.1. Is the sum() being processed as strings somewhere in the process?

1.1.0

MariaDB [tpch10]> select l_shipmode, sum(l_quantity), count(*) from lineitem group by 1 order by 1;
+------------+-----------------+----------+
| l_shipmode | sum(l_quantity) | count(*) |
+------------+-----------------+----------+
| AIR        | 218525124.00    |  8566164 |
| FOB        | 218469194.00    |  8569760 |
| MAIL       | 218541962.00    |  8569053 |
| RAIL       | 218553230.00    |  8571844 |
| REG AIR    | 218617001.00    |  8570280 |
| SHIP       | 218576988.00    |  8571402 |
| TRUCK      | 218454537.00    |  8567549 |
+------------+-----------------+----------+
7 rows in set (7.17 sec)
 
 
MariaDB [tpch10]> select l_shipmode, sum(l_quantity), count(*) from lineitem group by 1 order by 1;
+------------+-----------------+----------+
| l_shipmode | sum(l_quantity) | count(*) |
+------------+-----------------+----------+
| AIR        |    218525124.00 |  8566164 |
| FOB        |    218469194.00 |  8569760 |
| MAIL       |    218541962.00 |  8569053 |
| RAIL       |    218553230.00 |  8571844 |
| REG AIR    |    218617001.00 |  8570280 |
| SHIP       |    218576988.00 |  8571402 |
| TRUCK      |    218454537.00 |  8567549 |
+------------+-----------------+----------+
7 rows in set (4.55 sec)

Comment by Andrew Hutchings (Inactive) [ 2017-09-06 ]

Confirmed using InnoDB that the left justification of sum() result is a 10.2 specific thing rather than a ColumnStore thing. Looking into the performance...

Comment by Andrew Hutchings (Inactive) [ 2017-09-06 ]

Cannot reproduce what Daniel observed.

With DBT3 10GB 1.0:

MariaDB [test]> select l_shipmode, sum(l_quantity), count(*) from lineitem group by 1 order by 1;
+------------+-----------------+----------+
| l_shipmode | sum(l_quantity) | count(*) |
+------------+-----------------+----------+
| AIR        |    218525124.00 |  8566164 |
| FOB        |    218469194.00 |  8569760 |
| MAIL       |    218541962.00 |  8569053 |
| RAIL       |    218553230.00 |  8571844 |
| REG AIR    |    218617001.00 |  8570280 |
| SHIP       |    218576988.00 |  8571402 |
| TRUCK      |    218454537.00 |  8567549 |
+------------+-----------------+----------+
7 rows in set, 1 warning (1.39 sec)
 
MariaDB [test]> select calgettrace()\G
*************************** 1. row ***************************
calgettrace(): 
Desc Mode Table    TableOID ReferencedColumns       PIO LIO    PBE Elapsed Rows  
BPS  PM   lineitem 3075     (l_quantity,l_shipmode) 0   124493 0   1.375   51261 
TAS  UM   -        -        -                       -   -      -   1.348   7     
 
1 row in set (0.00 sec)

Same data/machine with 1.1 develop:

MariaDB [test]> select l_shipmode, sum(l_quantity), count(*) from lineitem group by 1 order by 1;
+------------+-----------------+----------+
| l_shipmode | sum(l_quantity) | count(*) |
+------------+-----------------+----------+
| AIR        | 218525124.00    |  8566164 |
| FOB        | 218469194.00    |  8569760 |
| MAIL       | 218541962.00    |  8569053 |
| RAIL       | 218553230.00    |  8571844 |
| REG AIR    | 218617001.00    |  8570280 |
| SHIP       | 218576988.00    |  8571402 |
| TRUCK      | 218454537.00    |  8567549 |
+------------+-----------------+----------+
7 rows in set, 1 warning (1.42 sec)
 
MariaDB [test]> select calgettrace()\G                                          *************************** 1. row ***************************
calgettrace(): 
Desc Mode Table    TableOID ReferencedColumns       PIO LIO    PBE Elapsed Rows  
BPS  PM   lineitem 3075     (l_quantity,l_shipmode) 0   124493 0   1.409   51261 
TAS  UM   -        -        -                       -   -      -   1.388   7     
 
1 row in set (0.00 sec)

Data reloaded into 1.1:

MariaDB [test]> select l_shipmode, sum(l_quantity), count(*) from lineitem group by 1 order by 1;
+------------+-----------------+----------+
| l_shipmode | sum(l_quantity) | count(*) |
+------------+-----------------+----------+
| AIR        | 218525124.00    |  8566164 |
| FOB        | 218469194.00    |  8569760 |
| MAIL       | 218541962.00    |  8569053 |
| RAIL       | 218553230.00    |  8571844 |
| REG AIR    | 218617001.00    |  8570280 |
| SHIP       | 218576988.00    |  8571402 |
| TRUCK      | 218454537.00    |  8567549 |
+------------+-----------------+----------+
7 rows in set, 1 warning (1.28 sec)
 
MariaDB [test]> select calgettrace()\G                                          *************************** 1. row ***************************
calgettrace(): 
Desc Mode Table    TableOID ReferencedColumns       PIO LIO    PBE Elapsed Rows 
BPS  PM   lineitem 3075     (l_quantity,l_shipmode) 0   183123 0   1.274   854  
TAS  UM   -        -        -                       -   -      -   1.249   7    
 
1 row in set (0.00 sec)

The left/right different on the sum column is a MariaDB 10.2 change.

Comment by Daniel Lee (Inactive) [ 2017-09-06 ]

Build verified: 1.1.0 Github source

[root@localhost ~]# cat mariadb-columnstore-1.1.0-1-centos7.x86_64.bin.tar.txt
/root/columnstore/mariadb-columnstore-server
commit 9e855a6415e0edd6771c449a6591c21c3915bfec
Merge: 6ed33d1 c206e51
Author: David.Hall <david.hall@mariadb.com>
Date: Tue Sep 5 09:43:29 2017 -0500

Merge pull request #68 from mariadb-corporation/MCOL-887

MCOL-887 Merge MariaDB 10.2.8

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 4441206050b07986f31402652c3299d36007d78d
Merge: 90353b9 230d013
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Tue Sep 5 20:36:02 2017 +0100

Merge pull request #247 from mariadb-corporation/MCOL-579

Add compiler flag checks and hardening flags

Not sure why, I built both 1.1.0 and 1.0.10 and did not see the performance difference now.

1.1.0
 
MariaDB [tpch10]> select l_shipmode, sum(l_quantity), count(*) from lineitem group by 1 order by 1;
+------------+-----------------+----------+
| l_shipmode | sum(l_quantity) | count(*) |
+------------+-----------------+----------+
| AIR        | 218525124.00    |  8566164 |
| FOB        | 218469194.00    |  8569760 |
| MAIL       | 218541962.00    |  8569053 |
| RAIL       | 218553230.00    |  8571844 |
| REG AIR    | 218617001.00    |  8570280 |
| SHIP       | 218576988.00    |  8571402 |
| TRUCK      | 218454537.00    |  8567549 |
+------------+-----------------+----------+
7 rows in set (3.59 sec)
 
MariaDB [tpch10]> select l_shipmode, sum(l_quantity), count(*) from lineitem group by 1 order by 1;
+------------+-----------------+----------+
| l_shipmode | sum(l_quantity) | count(*) |
+------------+-----------------+----------+
| AIR        | 218525124.00    |  8566164 |
| FOB        | 218469194.00    |  8569760 |
| MAIL       | 218541962.00    |  8569053 |
| RAIL       | 218553230.00    |  8571844 |
| REG AIR    | 218617001.00    |  8570280 |
| SHIP       | 218576988.00    |  8571402 |
| TRUCK      | 218454537.00    |  8567549 |
+------------+-----------------+----------+
7 rows in set (2.33 sec)
 
1.0.0
 
MariaDB [tpch10]> select l_shipmode, sum(l_quantity), count(*) from lineitem group by 1 order by 1;
+------------+-----------------+----------+
| l_shipmode | sum(l_quantity) | count(*) |
+------------+-----------------+----------+
| AIR        |    218525124.00 |  8566164 |
| FOB        |    218469194.00 |  8569760 |
| MAIL       |    218541962.00 |  8569053 |
| RAIL       |    218553230.00 |  8571844 |
| REG AIR    |    218617001.00 |  8570280 |
| SHIP       |    218576988.00 |  8571402 |
| TRUCK      |    218454537.00 |  8567549 |
+------------+-----------------+----------+
7 rows in set (3.90 sec)
 
MariaDB [tpch10]> select l_shipmode, sum(l_quantity), count(*) from lineitem group by 1 order by 1;
+------------+-----------------+----------+
| l_shipmode | sum(l_quantity) | count(*) |
+------------+-----------------+----------+
| AIR        |    218525124.00 |  8566164 |
| FOB        |    218469194.00 |  8569760 |
| MAIL       |    218541962.00 |  8569053 |
| RAIL       |    218553230.00 |  8571844 |
| REG AIR    |    218617001.00 |  8570280 |
| SHIP       |    218576988.00 |  8571402 |
| TRUCK      |    218454537.00 |  8567549 |
+------------+-----------------+----------+
7 rows in set (2.83 sec)

Generated at Thu Feb 08 02:24:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.