[MCOL-4818] Vectorize in-memory data representation Created: 2021-07-19  Updated: 2022-06-02  Resolved: 2022-04-06

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr, PrimProc
Affects Version/s: N/A
Fix Version/s: 6.3.1

Type: New Feature Priority: Major
Reporter: Roman Assignee: Leonid Fedorov
Resolution: Won't Do Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-4809 Vectorize column scanning/filtering Closed
Sprint: 2021-9, 2021-10, 2021-11, 2021-12, 2021-13, 2021-14, 2021-15, 2021-16, 2021-17

 Description   

In-memory records representation consists of two parts:

  • columnar data interface class RowGroup
  • data-agnostic storage class RGData

RGData has a boost::shared_array<uint8_t> rowData member that is a 2D matrix in which each row represents one record in a group of records.
This row-wise layout is ill-suited to vectorized processing, which in most cases needs the values of a single column to sit in contiguous memory. A columnar layout would also reduce the number of copy operations made in both EM and PP, e.g. the scanning/filtering code from primitives/linux-port could fill in a columnar buffer that is later handed over to an RGData instance to store it in the list. (It is worth noting that the row-wise layout might be an advantage for some SQL operators that need certain values to be in cache, e.g. GROUP BY, JOIN.)
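The difference between the two layouts can be sketched as follows. This is a minimal illustration, not ColumnStore code: RowMajorBuffer, sumColumnRowMajor and sumColumnContiguous are hypothetical names, and the column is assumed to hold fixed-width int32 values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical row-wise buffer: each record occupies rowSize bytes and
// every column value sits at a fixed byte offset inside its record.
struct RowMajorBuffer
{
  std::vector<uint8_t> data;
  std::size_t rowSize;
  std::size_t rowCount;
};

// Sums one int32 column stored row-wise. Consecutive values of the column
// are rowSize bytes apart, so every read is strided.
int64_t sumColumnRowMajor(const RowMajorBuffer& buf, std::size_t columnOffset)
{
  assert(columnOffset + sizeof(int32_t) <= buf.rowSize);
  int64_t sum = 0;
  for (std::size_t r = 0; r < buf.rowCount; ++r)
  {
    int32_t v;
    std::memcpy(&v, buf.data.data() + r * buf.rowSize + columnOffset, sizeof(v));
    sum += v;
  }
  return sum;
}

// Sums the same column stored contiguously. The loop walks a dense array,
// which is the access pattern SIMD loads and auto-vectorizers expect.
int64_t sumColumnContiguous(const int32_t* values, std::size_t count)
{
  int64_t sum = 0;
  for (std::size_t i = 0; i < count; ++i)
    sum += values[i];
  return sum;
}
```

Both functions return the same result for the same data; only the memory access pattern differs.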
The suggested change is to replace RGData::rowData with a std::vector<boost::shared_array<uint8_t>> columnData. This forces significant changes in the RowGroup interface class, which provides the get/set methods used to access record data. Operations that were previously trivial will become more complex, e.g. RowGroup::copyRow(), Row::equals.
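To illustrate why row-level operations get more complex, here is a rough sketch of per-column storage. It is an assumption-laden stand-in, not the real RowGroup/RGData API: ColumnarGroup, cell, copyRow and rowsEqual are hypothetical, std::vector<uint8_t> replaces boost::shared_array<uint8_t> to keep the example self-contained, and only fixed-width columns are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for the proposed columnData: one buffer per column.
struct ColumnarGroup
{
  std::vector<std::size_t> colWidths;            // bytes per value, per column
  std::vector<std::vector<uint8_t>> columnData;  // columnData[c]: rowCount * colWidths[c] bytes
  std::size_t rowCount;

  ColumnarGroup(std::vector<std::size_t> widths, std::size_t rows)
   : colWidths(std::move(widths)), rowCount(rows)
  {
    for (std::size_t w : colWidths)
      columnData.emplace_back(w * rows);  // value-initialized, i.e. zero-filled
  }

  uint8_t* cell(std::size_t col, std::size_t row)
  {
    assert(col < colWidths.size() && row < rowCount);
    return columnData[col].data() + row * colWidths[col];
  }

  const uint8_t* cell(std::size_t col, std::size_t row) const
  {
    assert(col < colWidths.size() && row < rowCount);
    return columnData[col].data() + row * colWidths[col];
  }
};

// With a row-wise layout a row copy is a single memcpy of the row bytes.
// With per-column buffers it becomes one small copy per column.
void copyRow(const ColumnarGroup& src, std::size_t srcRow, ColumnarGroup& dst, std::size_t dstRow)
{
  assert(src.colWidths == dst.colWidths);
  for (std::size_t c = 0; c < src.colWidths.size(); ++c)
    std::memcpy(dst.cell(c, dstRow), src.cell(c, srcRow), src.colWidths[c]);
}

// Likewise, row equality turns from one memcmp into a per-column comparison.
bool rowsEqual(const ColumnarGroup& a, std::size_t rowA, const ColumnarGroup& b, std::size_t rowB)
{
  assert(a.colWidths == b.colWidths);
  for (std::size_t c = 0; c < a.colWidths.size(); ++c)
    if (std::memcmp(a.cell(c, rowA), b.cell(c, rowB), a.colWidths[c]) != 0)
      return false;
  return true;
}
```

The per-column loops are the extra cost that the existing row-oriented get/set interface would have to hide.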
The change affects the class Row::Pointer, which is a uint8_t* into RGData::rowData plus optional StringStore and UserStore pointers. It is widely used as a key in some distinct maps in sorting (dbcon/joblist/limitedorderby.cpp), aggregation (dbcon/joblist/groupconcat.cpp, dbcon/joblist/tupleaggregatestep.cpp), window functions (utils/windowfunction/*) and joins (utils/joiner/tuplejoiner.cpp). This might be the biggest design challenge.
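One possible direction, shown only as a sketch: once a row no longer lives in one contiguous byte range, a single uint8_t* cannot identify it, so a map key would need something like a (group, row index) handle with its own hash and equality. Everything below is hypothetical (Int64Columns, RowHandle, RowHandleHash, RowHandleEq, countDistinctRows) and restricted to int64 columns for brevity; it is not how Row::Pointer is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical columnar group holding only int64 columns: columns[c][r].
struct Int64Columns
{
  std::vector<std::vector<int64_t>> columns;
};

// Hypothetical replacement for a raw row pointer: owning group + row index.
struct RowHandle
{
  const Int64Columns* group;
  std::size_t row;
};

// Hashes the row by visiting every column (boost-style hash combine).
struct RowHandleHash
{
  std::size_t operator()(const RowHandle& h) const
  {
    std::size_t seed = 0;
    for (const auto& col : h.group->columns)
      seed ^= std::hash<int64_t>{}(col[h.row]) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    return seed;
  }
};

// Compares two rows value by value, column by column.
struct RowHandleEq
{
  bool operator()(const RowHandle& a, const RowHandle& b) const
  {
    assert(a.group->columns.size() == b.group->columns.size());
    for (std::size_t c = 0; c < a.group->columns.size(); ++c)
      if (a.group->columns[c][a.row] != b.group->columns[c][b.row])
        return false;
    return true;
  }
};

// DISTINCT-like usage: counts unique rows by using row handles as set keys.
std::size_t countDistinctRows(const Int64Columns& g)
{
  std::unordered_set<RowHandle, RowHandleHash, RowHandleEq> seen;
  const std::size_t rows = g.columns.empty() ? 0 : g.columns[0].size();
  for (std::size_t r = 0; r < rows; ++r)
    seen.insert(RowHandle{&g, r});
  return seen.size();
}
```

Note that hashing and comparing now touch one cache line per column, which is the cost the description alludes to for operators such as GROUP BY and JOIN.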
The change must be hidden behind the existing RowGroup interface so that, by the end of this issue, the code that uses RowGroup or RGData needs as few changes as possible. Some are inevitable though.

