[MCOL-4590] Normalization process in TupleUnion step is slow Created: 2021-03-08  Updated: 2023-02-28  Resolved: 2023-02-28

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: None
Fix Version/s: 23.02.1

Type: Task Priority: Major
Reporter: Gagan Goel (Inactive) Assignee: Jigao Luo
Resolution: Fixed Votes: 1
Labels: beginner-friendly, gsoc22

Epic Link: ColumnStore Performance Improvements
Sprint: 2021-17, 2022-22, 2022-23, 2023-4
Assigned for Review: Gagan Goel Gagan Goel (Inactive)

 Description   

Assume query Q1 is:

select count(c1) from (select * from t1 union all select * from t1)q;

And assume query Q2 is:

select count(c1) from (select * from t1)q;

After the optimization performed in MCOL-4589, Q1's execution time is still higher than 2x compared to Q2 (for a table with 100 columns and 1million records, Q1 takes 0.180s while Q2 takes 0.065s, i.e. Q1 is 2.77x compared to Q2. Ideally, it should be close to 2x). This is primarily due to the TupleUnion::normalize() call that runs in ExeMgr which has an inefficient implementation. The objective of this task is to come up with an effective implementation of TupleUnion::normalize().



 Comments   
Comment by Jigao Luo [ 2022-06-09 ]

The Preimage: Before The Performance Improvement

Enviorment

Instance type: c5.4xlarge && Debian 11 && 30GiB EBS SSD

Codebase

server codebase: 10.8 branch
columnstore codebase: https://github.com/cakebytheoceanLuo/mariadb-columnstore-engine/commit/94ba91c687ec5b293bc5b71de6e2225c2cc379ff

Dataset

https://github.com/mariadb-corporation/mariadb-columnstore-samples/

MariaDB [bts]> describe flights;
+---------------------+-------------+------+-----+---------+-------+
| Field               | Type        | Null | Key | Default | Extra |
+---------------------+-------------+------+-----+---------+-------+
| year                | smallint(6) | YES  |     | NULL    |       |
| month               | tinyint(4)  | YES  |     | NULL    |       |
| day                 | tinyint(4)  | YES  |     | NULL    |       |
| day_of_week         | tinyint(4)  | YES  |     | NULL    |       |
| fl_date             | date        | YES  |     | NULL    |       |
| carrier             | varchar(2)  | YES  |     | NULL    |       |
| tail_num            | varchar(6)  | YES  |     | NULL    |       |
| fl_num              | smallint(6) | YES  |     | NULL    |       |
| origin              | varchar(5)  | YES  |     | NULL    |       |
| dest                | varchar(5)  | YES  |     | NULL    |       |
| crs_dep_time        | varchar(4)  | YES  |     | NULL    |       |
| dep_time            | varchar(4)  | YES  |     | NULL    |       |
| dep_delay           | smallint(6) | YES  |     | NULL    |       |
| taxi_out            | smallint(6) | YES  |     | NULL    |       |
| wheels_off          | varchar(4)  | YES  |     | NULL    |       |
| wheels_on           | varchar(4)  | YES  |     | NULL    |       |
| taxi_in             | smallint(6) | YES  |     | NULL    |       |
| crs_arr_time        | varchar(4)  | YES  |     | NULL    |       |
| arr_time            | varchar(4)  | YES  |     | NULL    |       |
| arr_delay           | smallint(6) | YES  |     | NULL    |       |
| cancelled           | smallint(6) | YES  |     | NULL    |       |
| cancellation_code   | smallint(6) | YES  |     | NULL    |       |
| diverted            | smallint(6) | YES  |     | NULL    |       |
| crs_elapsed_time    | smallint(6) | YES  |     | NULL    |       |
| actual_elapsed_time | smallint(6) | YES  |     | NULL    |       |
| air_time            | smallint(6) | YES  |     | NULL    |       |
| distance            | smallint(6) | YES  |     | NULL    |       |
| carrier_delay       | smallint(6) | YES  |     | NULL    |       |
| weather_delay       | smallint(6) | YES  |     | NULL    |       |
| nas_delay           | smallint(6) | YES  |     | NULL    |       |
| security_delay      | smallint(6) | YES  |     | NULL    |       |
| late_aircraft_delay | smallint(6) | YES  |     | NULL    |       |
+---------------------+-------------+------+-----+---------+-------+
32 rows in set (0.000 sec)
 
MariaDB [bts]> select count(*) from INFORMATION_SCHEMA.COLUMNS where table_name = "flights";
+----------+
| count(*) |
+----------+
|       32 |
+----------+
1 row in set (0.003 sec)
MariaDB [bts]> select count(*) from flights;
+----------+
| count(*) |
+----------+
| 38083735 |
+----------+
1 row in set (0.166 sec)

In this work, the table flights is focused, which has 32 columns and over 38M tuples.

Query With UNION ALL: Q1

MariaDB [bts]> select count(*) from (select * from flights union all select * from flights) as Q1;
+----------+
| count(*) |
+----------+
| 76167470 |
+----------+
1 row in set (3.011 sec)

I run this SQL Query Q1 100 times in the release build: the average runtime is 3.49s.

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
    3151.619 |*****                                    2
    3208.883 |*******                                  3
    3267.187 |************************                 10
    3326.551 |*********************                    9
    3386.993 |****************                         7
    3448.533 |*********************************        14
    3511.192 |*********************************        14
    3574.989 |**************************************** 17
    3639.945 |************************                 10
    3706.081 |**************                           6
    3773.420 |*******                                  3
    3841.981 |*******                                  3
    3911.789 |*****                                    2
 
SQL statistics:
    queries performed:
        read:                            100
        write:                           0
        other:                           0
        total:                           100
    transactions:                        100    (0.29 per sec.)
    queries:                             100    (0.29 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          349.7696s
    total number of events:              100
 
Latency (ms):
         min:                                 3135.96
         avg:                                 3497.67
         max:                                 3928.22
         95th percentile:                     3773.42
         sum:                               349767.47
 
Threads fairness:
    events (avg/stddev):           100.0000/0.00
    execution time (avg/stddev):   349.7675/0.00

Query Without UNION ALL: Q2

MariaDB [bts]> select count(*) from (select * from flights) as Q2;
+----------+
| count(*) |
+----------+
| 38083735 |
+----------+
1 row in set (1.049 sec)

I run this SQL Query Q2 10 times in the release build: the average runtime is 0.65s.

The Comparison Between Q1 and Q2 From Flight Dataset

Q1 AVG Runtime: 3.49s
Q2 AVG Runtime: 0.65s
The Runtime Slowdown Ratio: 4.92x
But the size of Q1 is doubled from the size of Q2.

The conclusion is there exists a runtime overhead due to the computation of the UNION ALL operator inside of the columnstore engine.

Comment by Jigao Luo [ 2022-06-10 ]

Simple Approach on switch AND change on addToOutput

In this commit:

The flight table is specially fixed with switch cases: https://github.com/mariadb-corporation/mariadb-columnstore-engine/commit/17f9ffe11a1398f8a32b5dfb7980e3666c4a1fb9

Query With UNION ALL: Q1

MariaDB [bts]> 
select count(*) from (select * from flights union all select * from flights) as Q1;
+----------+
| count(*) |
+----------+
| 76167470 |
+----------+
1 row in set (3.945 sec)

I run this SQL Query Q1 100 times in the release build: the average runtime is 3.44s.
It is basically a minor performance improvement over the baseline.

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
    1304.208 |***                                      2
    1327.905 |**************                           9
    1352.033 |***********************                  15
    1376.599 |********************************         21
    1401.611 |**************************************** 26
    1427.078 |******************                       12
    1453.007 |*********                                6
    1479.408 |******                                   4
    1506.288 |***                                      2
    1533.657 |**                                       1
    1589.895 |**                                       1
    1618.783 |**                                       1
 
SQL statistics:
    queries performed:
        read:                            100
        write:                           0
        other:                           0
        total:                           100
    transactions:                        100    (0.72 per sec.)
    queries:                             100    (0.72 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          139.6992s
    total number of events:              100
 
Latency (ms):
         min:                                 1306.24
         avg:                                 1396.97
         max:                                 1609.73
         95th percentile:                     1479.41
         sum:                               139697.21
 
Threads fairness:
    events (avg/stddev):           100.0000/0.00
    execution time (avg/stddev):   139.6972/0.00

Comment by Jigao Luo [ 2022-07-09 ]

Simple Approach WITH change on addToOutput

In this commit: https://github.com/mariadb-corporation/mariadb-columnstore-engine/commit/75987029c277aba42c9920508e23cb1c6480bf2c

Query With UNION ALL: Q1

`admin@ip-172-31-28-183:~/cs-docker-tools/slapit$ sudo ./sysbench_wrapper.sh 1 100 slapunion.lua bts`

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
    1376.599 |*                                        1
    1401.611 |*********                                7
    1427.078 |********************                     16
    1453.007 |**************************************** 32
    1479.408 |******************************           24
    1506.288 |***************                          12
    1533.657 |******                                   5
    1561.523 |***                                      2
    1618.783 |*                                        1
 
SQL statistics:
    queries performed:
        read:                            100
        write:                           0
        other:                           0
        total:                           100
    transactions:                        100    (0.68 per sec.)
    queries:                             100    (0.68 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          146.5432s
    total number of events:              100
 
Latency (ms):
         min:                                 1388.30
         avg:                                 1465.41
         max:                                 1612.34
         95th percentile:                     1533.66
         sum:                               146541.11
 
Threads fairness:
    events (avg/stddev):           100.0000/0.00
    execution time (avg/stddev):   146.5411/0.00

Comment by Jigao Luo [ 2022-07-09 ]

Baseline WITH change on addToOutput

In this commit: https://github.com/cakebytheoceanLuo/mariadb-columnstore-engine/commits/MCOL-4590-baseline

Query With UNION ALL: Q1

`admin@ip-172-31-28-183:~/cs-docker-tools/slapit$ sudo ./sysbench_wrapper.sh 1 100 slapunion.lua bts`

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
    1304.208 |********                                 5
    1327.905 |*********************                    13
    1352.033 |**************************************** 25
    1376.599 |******************************           19
    1401.611 |***************************              17
    1427.078 |****************                         10
    1453.007 |**                                       1
    1479.408 |***********                              7
    1506.288 |***                                      2
    1533.657 |**                                       1
 
SQL statistics:
    queries performed:
        read:                            100
        write:                           0
        other:                           0
        total:                           100
    transactions:                        100    (0.72 per sec.)
    queries:                             100    (0.72 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          138.1056s
    total number of events:              100
 
Latency (ms):
         min:                                 1299.39
         avg:                                 1381.04
         max:                                 1521.33
         95th percentile:                     1479.41
         sum:                               138103.60
 
Threads fairness:
    events (avg/stddev):           100.0000/0.00
    execution time (avg/stddev):   138.1036/0.00

Comment by Jigao Luo [ 2022-07-28 ]

Midterm

In this commit: https://github.com/cakebytheoceanLuo/mariadb-columnstore-engine/commit/f92831d990ed4c18768ec1add3db970bad411ac2

Query With UNION ALL: Q1

`admin@ip-172-31-28-183:~/cs-docker-tools/slapit$ sudo ./sysbench_wrapper.sh 1 100 slapunion.lua bts`

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
    1304.208 |*                                        1
    1327.905 |*********                                6
    1352.033 |************                             8
    1376.599 |**************************************** 27
    1401.611 |***************************              18
    1427.078 |*******************************          21
    1453.007 |*************                            9
    1479.408 |******                                   4
    1506.288 |***                                      2
    1533.657 |*                                        1
    1561.523 |*                                        1
    1589.895 |*                                        1
    1678.143 |*                                        1
 
SQL statistics:
    queries performed:
        read:                            100
        write:                           0
        other:                           0
        total:                           100
    transactions:                        100    (0.71 per sec.)
    queries:                             100    (0.71 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          140.8248s
    total number of events:              100
 
Latency (ms):
         min:                                 1305.74
         avg:                                 1408.23
         max:                                 1677.00
         95th percentile:                     1506.29
         sum:                               140822.79
 
Threads fairness:
    events (avg/stddev):           100.0000/0.00
    execution time (avg/stddev):   140.8228/0.00

Comment by Jigao Luo [ 2022-09-15 ]

Performance Testing of The PR Fixing This Issue

You can find the same report following at this GitHub PR page: https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/2528#issuecomment-1242706499

Experiment Environment

The experiments are run on the following hardware configuration:

  • AWS Instance type: c5.4xlarge
  • Debian 11
  • 30GiB EBS SSD

Dataset

The benchmark dataset is provided by the community: https://github.com/mariadb-corporation/mariadb-columnstore-samples/
In this work, the table flights is focused, which has 32 columns and over 38M tuples.

Schema

There are details of the table flights:

MariaDB [bts]> describe flights;
--
+---------------------+-------------+------+-----+---------+-------+
\| Field               \| Type        \| Null \| Key \| Default \| Extra \|
+---------------------+-------------+------+-----+---------+-------+
\| year                \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| month               \| tinyint(4)  \| YES  \|     \| NULL    \|       \|
\| day                 \| tinyint(4)  \| YES  \|     \| NULL    \|       \|
\| day_of_week         \| tinyint(4)  \| YES  \|     \| NULL    \|       \|
\| fl_date             \| date        \| YES  \|     \| NULL    \|       \|
\| carrier             \| varchar(2)  \| YES  \|     \| NULL    \|       \|
\| tail_num            \| varchar(6)  \| YES  \|     \| NULL    \|       \|
\| fl_num              \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| origin              \| varchar(5)  \| YES  \|     \| NULL    \|       \|
\| dest                \| varchar(5)  \| YES  \|     \| NULL    \|       \|
\| crs_dep_time        \| varchar(4)  \| YES  \|     \| NULL    \|       \|
\| dep_time            \| varchar(4)  \| YES  \|     \| NULL    \|       \|
\| dep_delay           \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| taxi_out            \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| wheels_off          \| varchar(4)  \| YES  \|     \| NULL    \|       \|
\| wheels_on           \| varchar(4)  \| YES  \|     \| NULL    \|       \|
\| taxi_in             \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| crs_arr_time        \| varchar(4)  \| YES  \|     \| NULL    \|       \|
\| arr_time            \| varchar(4)  \| YES  \|     \| NULL    \|       \|
\| arr_delay           \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| cancelled           \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| cancellation_code   \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| diverted            \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| crs_elapsed_time    \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| actual_elapsed_time \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| air_time            \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| distance            \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| carrier_delay       \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| weather_delay       \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| nas_delay           \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| security_delay      \| smallint(6) \| YES  \|     \| NULL    \|       \|
\| late_aircraft_delay \| smallint(6) \| YES  \|     \| NULL    \|       \|
+---------------------+-------------+------+-----+---------+-------+
32 rows in set (0.000 sec)
 
MariaDB [bts]> select count(*) from INFORMATION_SCHEMA.COLUMNS where table_name = "flights";
+----------+
\| count(*) \|
+----------+
\|       32 \|
+----------+
1 row in set (0.003 sec)
MariaDB [bts]> select count(*) from flights;
+----------+
\| count(*) \|
+----------+
\| 38083735 \|
+----------+
1 row in set (0.166 sec)

Benchmark Query Q1

The following Query Q1 is the benchmark query in our experiments and the query to be optimized.

MariaDB [bts]> select count(*) from (select * from flights union all select * from flights) as Q1;
--
+----------+
\| count(*) \|
+----------+
\| 76167470 \|
+----------+
1 row in set (3.011 sec)

Benchmark Query Q2

The following Query Q2 has no `UNION` statement. Ideally, Q1 should be close to 2x runtime of Q2.

MariaDB [bts]> select count(*) from (select * from flights) as Q2;
--
+----------+
\| count(*) \|
+----------+
\| 38083735 \|
+----------+
1 row in set (1.049 sec)

Benchmark Tool

The benchmark tool & script are provided by the community: https://github.com/drrtuy/cs-docker-tools

Here is how I run the Q1: `~/cs-docker-tools/slapit$ sudo ./sysbench_wrapper.sh 1 100 slapunion.lua bts`. The `slapunion.lua bts` is loaded only with Q1.
The way to benchmark Q2 is similar.

Q2 Performance

The average runtime of Q2 is *632.42ms*.

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
     623.335 |**************************************** 63
     634.661 |*************                            21
     646.192 |******                                   10
     657.933 |**                                       3
     694.452 |**                                       3
 
SQL statistics:
    queries performed:
        read:                            100
        write:                           0
        other:                           0
        total:                           100
    transactions:                        100    (1.58 per sec.)
    queries:                             100    (1.58 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          63.2444s
    total number of events:              100
 
Latency (ms):
         min:                                  623.54
         avg:                                  632.42
         max:                                  698.28
         95th percentile:                      657.93
         sum:                                63242.44
 
Threads fairness:
    events (avg/stddev):           100.0000/0.00
    execution time (avg/stddev):   63.2424/0.00

Q1 Performance Without This PR

I benchmark with this commit https://github.com/mariadb-corporation/mariadb-columnstore-engine/commits/develop, which is the last commit and this PR is based on.

The average runtime of Q1 without the optimization of this PR is *3229.35ms*.

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
    2828.869 |***                                      1
    2880.269 |******                                   2
    2932.602 |***********                              4
    2985.887 |**************                           5
    3040.139 |*******************************          11
    3095.377 |***********************                  8
    3151.619 |***********************                  8
    3208.883 |*******************************          11
    3267.187 |*************************************    13
    3326.551 |**************************************** 14
    3386.993 |**************************               9
    3448.533 |*****************                        6
    3511.192 |*********                                3
    3574.989 |***                                      1
    3639.945 |*********                                3
    3706.081 |***                                      1
 
SQL statistics:
    queries performed:
        read:                            100
        write:                           0
        other:                           0
        total:                           100
    transactions:                        100    (0.31 per sec.)
    queries:                             100    (0.31 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          322.9365s
    total number of events:              100
 
Latency (ms):
         min:                                 2831.68
         avg:                                 3229.35
         max:                                 3707.17
         95th percentile:                     3511.19
         sum:                               322934.52
 
Threads fairness:
    events (avg/stddev):           100.0000/0.00
    execution time (avg/stddev):   322.9345/0.00

Q1 Performance With This PR

The average runtime of Q1 with the optimization of this PR is *1312.31ms*.

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
    1280.934 |*************************                30
    1304.208 |**************************************** 48
    1327.905 |*********                                11
    1352.033 |****                                     5
    1376.599 |**                                       2
    1401.611 |*                                        1
    1427.078 |*                                        1
    1453.007 |*                                        1
    1771.289 |*                                        1
 
SQL statistics:
    queries performed:
        read:                            100
        write:                           0
        other:                           0
        total:                           100
    transactions:                        100    (0.76 per sec.)
    queries:                             100    (0.76 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
 
General statistics:
    total time:                          131.2332s
    total number of events:              100
 
Latency (ms):
         min:                                 1285.18
         avg:                                 1312.31
         max:                                 1768.78
         95th percentile:                     1376.60
         sum:                               131231.23
 
Threads fairness:
    events (avg/stddev):           100.0000/0.00
    execution time (avg/stddev):   131.2312/0.00

Summary

Q2 AVG Runtime: 0.63s
Q1 AVG Runtime: 3.23s
Optimized Q1 AVG Runtime: 1.31s

The Runtime Slowdown Ratio of Q1 and Q2 is ~5x which means the Q1 has more than 5 times the runtime of Q2. Ideally, the Runtime Slowdown Ratio should be close to 2. The current `develop` branch has a very inefficient UNION processing logic with overhead.

Applying this patch, the runtime of Q1 is optimized to 1.31s, resulting in the Runtime Slowdown Ratio of 2.07. This ratio is very close to the ideal ratio. Moreover, the theoretical minimum is 2, which makes it impossible to optimize this ratio under 2.

In summary, I have optimized the UNION processing in ColumnStore. The performance improvement is satisfying and close to a theoretical limit.

Generated at Thu Feb 08 02:51:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.