[MCOL-4521] Refactor WriteEngine Created: 2021-01-29  Updated: 2023-07-01

Status: Open
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.0.16
Fix Version/s: Icebox

Type: Task Priority: Minor
Reporter: Sergey Zefirov Assignee: Sergey Zefirov
Resolution: Unresolved Votes: 0
Labels: None


 Description   

WriteEngine is riddled with the copy-pasted and somewhat excessive functionality.

Examples include:

  • there are three functions to write values into blocks in writeengine/wrapper/we_colop.(h, cpp): writeRow, writeRows and writeRowsValues.
  • there are two functions to insert values in writeengine/wrapper/writeengine: insertColumnRecs and insertColumnRec. Their functionality can be unified in insertColumnRecs (as inserting several rows can express insertion of one) but some of functionality is somewhat different and the reason for difference is not commented or otherwise documented.
  • big part of functionality of write engine can and should be put into type handlers. There are too many cases where column width is used to compute block index, offset of data inside of block, sizes of temporary buffers and conversion of data to and from network format.

Of course, assumptions used in WE are deeply rooted everywhere in MCS, but to quote old Fortran manual "the value of PI should be made named constant in case its value will ever change". Centralization of concerns will help with implementation of new functionality and with maintenance of old features.



 Comments   
Comment by Sergey Zefirov [ 2021-01-29 ]

Take a look here: https://github.com/mariadb-corporation/mariadb-columnstore-engine/blob/develop/writeengine/wrapper/writeengine.cpp#L4559 (function int WriteEngineWrapper::writeColumnRec).

There are two branches there, one for split write and other for single piece write. The code is quite repetetive, it repeats itself three times: first and second part of split write and the same code goes in single write.

Ideally, we would have two branches: code for first part of split write and single piece code is exactly the same, up to some fixes. We have to have then process second part in split write.

Comment by Sergey Zefirov [ 2021-02-02 ]

we_colop.h contains more or less identical code in three functions:

    /**
     * @brief Write row(s)
     */
    EXPORT virtual int writeRow(Column& curCol,
                                uint64_t totalRow,
                                const RID* rowIdArray,
                                const void* valArray,
                                      void* oldValArray =0,
                                bool bDelete = false);
 
    /**
     * @brief Write row(s) for delete  @Bug 1886,2870
     */
    EXPORT virtual int writeRows(Column& curCol,
                                 uint64_t totalRow,
                                 const RIDList& ridList,
                                 const void* valArray,
                                       void* oldValArray = 0,
                                 bool bDelete = false);
 
    /**
     * @brief Write row(s) for update @Bug 1886,2870
     */
    EXPORT virtual int writeRowsValues(Column& curCol,
                                       uint64_t totalRow,
                                       const RIDList& ridList,
                                       const void* valArray,
                                             void* oldValArray = 0);

These should be unified.

Comment by Sergey Zefirov [ 2021-03-11 ]

ColStruct in we_type.h (writeengine/shared) does not have proper strict constructor which will set internal strctures right.

For example, in writeengine.cpp there are several calls to Convertor::convertColType(ColStruct* colStruct) function which set colType field using CalpontSystemCatalog tyle field and column width.

The problem is that it is normally fine to assume that all fields are correct - that constructed record is all good and fine. But it is not. That means if your code depends on the write engine's internal type, you have to check whether it is set properly. And if not, you have to call conversion code. And then code after your change can call the same conversion code (due to legacy I describe) and duplicate work for nothing.

Generated at Thu Feb 08 02:50:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.