[MCOL-4590] Normalization process in TupleUnion step is slow Created: 2021-03-08 Updated: 2023-02-28 Resolved: 2023-02-28 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr |
| Affects Version/s: | None |
| Fix Version/s: | 23.02.1 |
| Type: | Task | Priority: | Major |
| Reporter: | Gagan Goel (Inactive) | Assignee: | Jigao Luo |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | beginner-friendly, gsoc22 | ||
| Epic Link: | ColumnStore Performance Improvements |
| Sprint: | 2021-17, 2022-22, 2022-23, 2023-4 |
| Assigned for Review: | |
| Description |
|
Assume query Q1 is:
And assume query Q2 is:
After the optimization performed in |
| Comments |
| Comment by Jigao Luo [ 2022-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The Preimage: Before The Performance ImprovementEnviormentInstance type: c5.4xlarge && Debian 11 && 30GiB EBS SSD Codebaseserver codebase: 10.8 branch Datasethttps://github.com/mariadb-corporation/mariadb-columnstore-samples/
In this work, the table flights is focused, which has 32 columns and over 38M tuples. Query With UNION ALL: Q1
I run this SQL Query Q1 100 times in the release build: the average runtime is 3.49s.
Query Without UNION ALL: Q2
I run this SQL Query Q2 10 times in the release build: the average runtime is 0.65s. The Comparison Between Q1 and Q2 From Flight DatasetQ1 AVG Runtime: 3.49s The conclusion is there exists a runtime overhead due to the computation of the UNION ALL operator inside of the columnstore engine. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jigao Luo [ 2022-06-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Simple Approach on switch AND change on addToOutputIn this commit: The flight table is specially fixed with switch cases: https://github.com/mariadb-corporation/mariadb-columnstore-engine/commit/17f9ffe11a1398f8a32b5dfb7980e3666c4a1fb9 Query With UNION ALL: Q1
I run this SQL Query Q1 100 times in the release build: the average runtime is 3.44s.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jigao Luo [ 2022-07-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Simple Approach WITH change on addToOutputIn this commit: https://github.com/mariadb-corporation/mariadb-columnstore-engine/commit/75987029c277aba42c9920508e23cb1c6480bf2c Query With UNION ALL: Q1`admin@ip-172-31-28-183:~/cs-docker-tools/slapit$ sudo ./sysbench_wrapper.sh 1 100 slapunion.lua bts`
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jigao Luo [ 2022-07-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Baseline WITH change on addToOutputIn this commit: https://github.com/cakebytheoceanLuo/mariadb-columnstore-engine/commits/MCOL-4590-baseline Query With UNION ALL: Q1`admin@ip-172-31-28-183:~/cs-docker-tools/slapit$ sudo ./sysbench_wrapper.sh 1 100 slapunion.lua bts`
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jigao Luo [ 2022-07-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
MidtermIn this commit: https://github.com/cakebytheoceanLuo/mariadb-columnstore-engine/commit/f92831d990ed4c18768ec1add3db970bad411ac2 Query With UNION ALL: Q1`admin@ip-172-31-28-183:~/cs-docker-tools/slapit$ sudo ./sysbench_wrapper.sh 1 100 slapunion.lua bts`
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jigao Luo [ 2022-09-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Performance Testing of The PR Fixing This IssueYou can find the same report following at this GitHub PR page: https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/2528#issuecomment-1242706499 Experiment EnvironmentThe experiments are run on the following hardware configuration:
DatasetThe benchmark dataset is provided by the community: https://github.com/mariadb-corporation/mariadb-columnstore-samples/ SchemaThere are details of the table flights:
Benchmark Query Q1The following Query Q1 is the benchmark query in our experiments and the query to be optimized.
Benchmark Query Q2The following Query Q2 has no `UNION` statement. Ideally, Q1 should be close to 2x runtime of Q2.
Benchmark ToolThe benchmark tool & script are provided by the community: https://github.com/drrtuy/cs-docker-tools Here is how I run the Q1: `~/cs-docker-tools/slapit$ sudo ./sysbench_wrapper.sh 1 100 slapunion.lua bts`. The `slapunion.lua bts` is loaded only with Q1. Q2 PerformanceThe average runtime of Q2 is *632.42ms*.
Q1 Performance Without This PRI benchmark with this commit https://github.com/mariadb-corporation/mariadb-columnstore-engine/commits/develop, which is the last commit and this PR is based on. The average runtime of Q1 without the optimization of this PR is *3229.35ms*.
Q1 Performance With This PRThe average runtime of Q1 with the optimization of this PR is *1312.31ms*.
SummaryQ2 AVG Runtime: 0.63s The Runtime Slowdown Ratio of Q1 and Q2 is ~5x which means the Q1 has more than 5 times the runtime of Q2. Ideally, the Runtime Slowdown Ratio should be close to 2. The current `develop` branch has a very inefficient UNION processing logic with overhead. Applying this patch, the runtime of Q1 is optimized to 1.31s, resulting in the Runtime Slowdown Ratio of 2.07. This ratio is very close to the ideal ratio. Moreover, the theoretical minimum is 2, which makes it impossible to optimize this ratio under 2. In summary, I have optimized the UNION processing in ColumnStore. The performance improvement is satisfying and close to a theoretical limit. |