[MCOL-4809] Vectorize column scanning/filtering Created: 2021-07-09  Updated: 2022-06-02  Resolved: 2022-02-27

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 6.1.1
Fix Version/s: 6.3.1

Type: New Feature Priority: Critical
Reporter: Roman Assignee: Alexey Antipovsky (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
includes MCOL-4815 Refactor ColumnCommand to have multip... Closed
Relates
relates to MCOL-4876 Separate values and RID vectors sent ... Closed
relates to MCOL-4818 Vectorize in-memory data representation Closed
Epic Link: ColumnStore Performance Improvements
Sprint: 2021-9, 2021-10, 2021-11, 2021-12, 2021-13, 2021-14, 2021-15, 2021-16, 2021-17
Epic/Theme: Performance

 Description   

Currently, the loops of the scanning/filtering code that resides in primitives/linux-port/column.* cannot be vectorized.

The basic logic is that, for a given column, this code traverses a block of values:

  • skipping empty values
  • filtering the values with the relevant filters from the SQL statement
  • saving the values that satisfy the filters into the output buffer

Optionally, the code traverses the column block and touches only the values with specific RIDs sent from the upper layers.
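
The traversal described above can be sketched as a scalar loop. This is only an illustration, not the code from primitives/linux-port/column.*: the empty-slot marker `kEmpty`, the `ScanResult` struct, the function name `scanBlockScalar` and the single "less than" filter are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical empty-slot marker; the real marker depends on the column type.
constexpr int32_t kEmpty = std::numeric_limits<int32_t>::min();

// Hypothetical output buffer: the matching values and their RIDs.
struct ScanResult
{
  std::vector<int32_t> values;
  std::vector<uint16_t> rids;
};

// Scans one block. If rids is non-null, only the listed positions are touched;
// otherwise every position in the block is visited.
ScanResult scanBlockScalar(const std::vector<int32_t>& block, int32_t upperBound,
                           const std::vector<uint16_t>* rids = nullptr)
{
  ScanResult out;
  const size_t n = rids ? rids->size() : block.size();
  for (size_t i = 0; i < n; ++i)
  {
    const size_t rid = rids ? (*rids)[i] : i;
    if (rid >= block.size())
      continue;  // ignore RIDs that point outside the block
    const int32_t v = block[rid];
    if (v == kEmpty)
      continue;  // skip empty values
    if (!(v < upperBound))
      continue;  // apply the filter (assumed here: value < upperBound)
    out.values.push_back(v);  // save the value that satisfies the filter
    out.rids.push_back(static_cast<uint16_t>(rid));
  }
  return out;
}
```

Every iteration goes through several data-dependent branches, which is what makes a loop of this shape hard for a compiler to vectorize automatically.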

The data processing here is scalar, with many conditional branches that slow down execution.
The suggested approach is to refactor the code to leverage data prefetching and batch processing using SIMD instructions. The available CPU instruction set should be detected at runtime, on PrimProc (PP) startup or at least once per column block.
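
A minimal sketch of this suggestion, assuming 4-byte signed values, a single "less than" filter and an x86-64 build with GCC or Clang. All names here (`kEmptyMarker`, `filterLessScalar`, `filterLessAvx2`, `selectFilter`) are hypothetical and are not taken from the ColumnStore sources. The SIMD path compares eight values per iteration and builds a bit mask of matching lanes; `selectFilter` picks the implementation once, based on what the CPU reports at runtime.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define SKETCH_HAS_AVX2_PATH 1
#else
#define SKETCH_HAS_AVX2_PATH 0
#endif

// Hypothetical empty-slot marker; the real marker depends on the column type.
constexpr int32_t kEmptyMarker = std::numeric_limits<int32_t>::min();

using FilterFn = size_t (*)(const int32_t*, size_t, int32_t, int32_t*, uint16_t*);

// Scalar fallback: keeps non-empty values strictly below upperBound.
size_t filterLessScalar(const int32_t* block, size_t count, int32_t upperBound,
                        int32_t* outValues, uint16_t* outRids)
{
  size_t written = 0;
  for (size_t i = 0; i < count; ++i)
  {
    if (block[i] != kEmptyMarker && block[i] < upperBound)
    {
      outValues[written] = block[i];
      outRids[written] = static_cast<uint16_t>(i);
      ++written;
    }
  }
  return written;
}

#if SKETCH_HAS_AVX2_PATH
// AVX2 path: the target attribute lets this one function use AVX2
// without compiling the whole translation unit with -mavx2.
__attribute__((target("avx2")))
size_t filterLessAvx2(const int32_t* block, size_t count, int32_t upperBound,
                      int32_t* outValues, uint16_t* outRids)
{
  const __m256i emptyVec = _mm256_set1_epi32(kEmptyMarker);
  const __m256i boundVec = _mm256_set1_epi32(upperBound);
  size_t written = 0;
  size_t i = 0;
  for (; i + 8 <= count; i += 8)
  {
    const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(block + i));
    const __m256i isEmpty = _mm256_cmpeq_epi32(v, emptyVec);
    const __m256i below = _mm256_cmpgt_epi32(boundVec, v);  // upperBound > v
    // andnot yields (~isEmpty) & below; movemask packs the 8 lane sign bits.
    const int mask =
        _mm256_movemask_ps(_mm256_castsi256_ps(_mm256_andnot_si256(isEmpty, below)));
    for (int lane = 0; lane < 8; ++lane)
    {
      if (mask & (1 << lane))
      {
        outValues[written] = block[i + lane];
        outRids[written] = static_cast<uint16_t>(i + lane);
        ++written;
      }
    }
  }
  for (; i < count; ++i)  // scalar tail for the last count % 8 values
  {
    if (block[i] != kEmptyMarker && block[i] < upperBound)
    {
      outValues[written] = block[i];
      outRids[written] = static_cast<uint16_t>(i);
      ++written;
    }
  }
  return written;
}
#endif

// Runtime dispatch: call once (for example at process startup) and reuse the pointer.
FilterFn selectFilter()
{
#if SKETCH_HAS_AVX2_PATH
  if (__builtin_cpu_supports("avx2"))
    return filterLessAvx2;
#endif
  return filterLessScalar;
}
```

The output buffers must have room for `count` entries. Selecting the function pointer once keeps the feature check out of the hot loop; doing it per column block would also work, at the cost of one extra check per block.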



 Comments   
Comment by Roman [ 2021-10-25 ]

Here is a microbenchmark run comparing the vectorized code against the legacy code. JFYI, the vectorized and templated tests actually follow the same path.

root@f6c7f6d6c651:/git/mdb-server/storage/columnstore/columnstore# ./bin/primitives_scan_bench 
2021-10-25 18:30:56
Running ./bin/primitives_scan_bench
Run on (12 X 5000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
Load Average: 0.53, 0.64, 0.92
-----------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------
FilterBenchFixture/BM_ColumnScan1ByteLegacyCode                151645 ns       151542 ns         4827
FilterBenchFixture/BM_ColumnScan1Byte1FilterLegacyCode         150197 ns       150135 ns         4687
FilterBenchFixture/BM_ColumnScan1ByteTemplatedCode             117699 ns       117634 ns         5948
FilterBenchFixture/BM_ColumnScan1Byte1FilterTemplatedCode      116762 ns       116678 ns         5991
FilterBenchFixture/BM_ColumnScan1ByteVectorizedCode             12960 ns        12954 ns        53812
FilterBenchFixture/BM_ColumnScan1Byte1FilterVectorizedCode      12992 ns        12985 ns        53878
FilterBenchFixture/BM_ColumnScan2ByteLegacyCode                 30773 ns        30763 ns        22778
FilterBenchFixture/BM_ColumnScan2Byte1FilterLegacyCode          30927 ns        30911 ns        22648
FilterBenchFixture/BM_ColumnScan2ByteTemplatedCode               7596 ns         7592 ns        91764
FilterBenchFixture/BM_ColumnScan2Byte1FilterTemplatedCode        7025 ns         7021 ns        99417
FilterBenchFixture/BM_ColumnScan2Byte1FilterVectorizedCode       7034 ns         7029 ns        99361
FilterBenchFixture/BM_ColumnScan4ByteLegacyCode                 17081 ns        17093 ns        40779
FilterBenchFixture/BM_ColumnScan4Byte1FilterLegacyCode          17055 ns        17063 ns        40836
FilterBenchFixture/BM_ColumnScan4ByteTemplatedCode               4757 ns         4757 ns       147177
FilterBenchFixture/BM_ColumnScan4ByteVectorizedCode              4733 ns         4733 ns       147717
FilterBenchFixture/BM_ColumnScan8ByteLegacyCode                  8225 ns         8236 ns        84080
FilterBenchFixture/BM_ColumnScan8Byte1FilterLegacyCode           8204 ns         8216 ns        84243
FilterBenchFixture/BM_ColumnScan8ByteTemplatedCode               3535 ns         3535 ns       196507
FilterBenchFixture/BM_ColumnScan8Byte1FilterTemplatedCode        3552 ns         3552 ns       196760
FilterBenchFixture/BM_ColumnScan8ByteVectorizedCode              3575 ns         3575 ns       196497
FilterBenchFixture/BM_ColumnScan8Byte1FilterVectorizedCode       3531 ns         3530 ns       198363

Comment by David Hall (Inactive) [ 2021-11-17 ]

Some of this work may not be finished for 6.2.2. We may release with a partial implementation: each phase only brings a performance gain, so the phases can be implemented separately.

Comment by Roman [ 2022-02-01 ]

Please review.
