Currently, the loops of the scanning/filtering code in primitives/linux-port/column.* cannot be vectorized.
The basic logic is that, for a given column, this code traverses a block of values:
- skipping empty values
- filtering the values using the related filters from the SQL statement
- saving the values that satisfy the filters into the output buffer
Optionally, the code traverses the column block and touches only the values with specific RIDs sent from the upper layers.
Data processing here is scalar, with many conditional branches that slow down execution.
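The traversal described above can be illustrated with a minimal scalar sketch. It is not the actual code from column.*: the names kEmptyValue, RangeFilter and scanBlockScalar are hypothetical, the empty-value marker is assumed, and a single inclusive range stands in for the SQL filters.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; not the real column.* interfaces.
constexpr int64_t kEmptyValue = INT64_MIN;  // assumed empty-value marker

struct RangeFilter  // assumed filter shape: lo <= v <= hi
{
  int64_t lo;
  int64_t hi;
};

// Scans a block of values. If rids is non-null, only the positions it lists
// are visited. Matching values are appended to out; returns the match count.
std::size_t scanBlockScalar(const std::vector<int64_t>& block, const RangeFilter& f,
                            const std::vector<uint16_t>* rids, std::vector<int64_t>& out)
{
  const std::size_t n = rids ? rids->size() : block.size();
  std::size_t matched = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    const std::size_t pos = rids ? (*rids)[i] : i;
    if (pos >= block.size())
      continue;  // ignore RIDs outside the block
    const int64_t v = block[pos];
    if (v == kEmptyValue)
      continue;  // skip empty values
    if (v < f.lo || v > f.hi)
      continue;  // apply the filter
    out.push_back(v);  // save the satisfying value
    ++matched;
  }
  return matched;
}
```

Each value passes through several data-dependent branches, which is the pattern that makes such a loop hard for a compiler to vectorize.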
The suggested approach is to refactor the code to leverage data prefetching and batch processing with SIMD instructions. The available CPU instruction set should be detected at runtime, either at PP startup or at least once per column block.
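A rough sketch of how batch processing with runtime dispatch might look is given below. It is an illustration under assumptions, not a proposed implementation: the names FilterFn, filterScalar, filterAvx2 and selectFilter are hypothetical, 32-bit values and a single range filter are assumed, AVX2 is only one possible target, and the GCC/Clang builtins __builtin_cpu_supports and __builtin_ctz are used for detection and bit scanning. Prefetching is left out of the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAVE_X86_SIMD 1
#endif

// Hypothetical signature: filters n values from in, writes matches to out
// (out must have room for n values), returns the number of matches.
using FilterFn = std::size_t (*)(const int32_t* in, std::size_t n, int32_t empty,
                                 int32_t lo, int32_t hi, int32_t* out);

// Portable fallback: branch-free compaction of the matching values.
std::size_t filterScalar(const int32_t* in, std::size_t n, int32_t empty,
                         int32_t lo, int32_t hi, int32_t* out)
{
  std::size_t k = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    const int32_t v = in[i];
    out[k] = v;  // speculative store, kept only if k advances
    k += static_cast<std::size_t>((v != empty) & (v >= lo) & (v <= hi));
  }
  return k;
}

#ifdef SKETCH_HAVE_X86_SIMD
// AVX2 kernel: tests 8 values per iteration, then stores the survivors.
__attribute__((target("avx2")))
std::size_t filterAvx2(const int32_t* in, std::size_t n, int32_t empty,
                       int32_t lo, int32_t hi, int32_t* out)
{
  const __m256i vEmpty = _mm256_set1_epi32(empty);
  const __m256i vLo = _mm256_set1_epi32(lo);
  const __m256i vHi = _mm256_set1_epi32(hi);
  std::size_t k = 0;
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8)
  {
    const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in + i));
    // reject = (v == empty) | (lo > v) | (v > hi)
    __m256i reject = _mm256_cmpeq_epi32(v, vEmpty);
    reject = _mm256_or_si256(reject, _mm256_cmpgt_epi32(vLo, v));
    reject = _mm256_or_si256(reject, _mm256_cmpgt_epi32(v, vHi));
    // One bit per lane; a set bit in keep marks a value that passed.
    unsigned keep =
        ~static_cast<unsigned>(_mm256_movemask_ps(_mm256_castsi256_ps(reject))) & 0xFFu;
    while (keep)
    {
      out[k++] = in[i + __builtin_ctz(keep)];
      keep &= keep - 1;  // clear the lowest set bit
    }
  }
  // Remaining tail (fewer than 8 values) goes through the scalar path.
  return k + filterScalar(in + i, n - i, empty, lo, hi, out + k);
}
#endif

// Picks the kernel once, e.g. at process startup.
FilterFn selectFilter()
{
#ifdef SKETCH_HAVE_X86_SIMD
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2"))
    return filterAvx2;
#endif
  return filterScalar;
}
```

Selecting the function pointer once keeps the detection cost out of the per-value loop; whether that happens at startup or once per column block is a design choice left open here.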