  1. MariaDB ColumnStore
  2. MCOL-4791

Fix ColumnCommand fudged data type format to clearly identify CHAR vs VARCHAR

Details

    • Task
    • Status: Stalled
    • Major
    • Resolution: Unresolved
    • 6.1.1
    • 23.10
    • ExeMgr, PrimProc
    • None

    Description

      As part of MCOL-4691, we're going to replace the 0-terminated representation of short VARCHAR columns in the RowGroup format with:

      • One byte length
      • Followed by the actual string data

      This will remove a lot of strnlen() calls used e.g. in the row aggregation code.
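
      A minimal sketch of the difference between the two representations, assuming a fixed-width field buffer. The helper names (readZeroTerminated, writeLengthPrefixed, readLengthPrefixed) are hypothetical and are not ColumnStore APIs; the actual RowGroup layout may differ in details such as slot width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helpers for illustration only; not ColumnStore code.

// Old layout: the value is padded with '\0' up to the field width, so every
// reader has to scan for the terminator (the equivalent of strnlen()).
inline std::string_view readZeroTerminated(const uint8_t* field, size_t width)
{
  const void* nul = std::memchr(field, 0, width);
  const size_t len = nul ? static_cast<size_t>(static_cast<const uint8_t*>(nul) - field) : width;
  return std::string_view(reinterpret_cast<const char*>(field), len);
}

// New layout: one length byte followed by the actual string data.
// Returns the number of bytes written into the field.
inline size_t writeLengthPrefixed(uint8_t* field, std::string_view value)
{
  assert(value.size() <= 255);  // the length must fit into one byte
  field[0] = static_cast<uint8_t>(value.size());
  if (!value.empty())
    std::memcpy(field + 1, value.data(), value.size());
  return 1 + value.size();
}

// Reading needs no scan: the length is available in constant time.
inline std::string_view readLengthPrefixed(const uint8_t* field)
{
  return std::string_view(reinterpret_cast<const char*>(field + 1), field[0]);
}
```

      With the length byte in place, code that only needs the string length can read field[0] instead of scanning the value.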

      To make the format change easier, we need to clearly distinguish CHAR from VARCHAR on the PrimProc side.

      Currently it's not possible to distinguish them, because ExeMgr sends the data type in ColumnCommand in a "fudged" format as follows:

      ExeMgr Real Type PrimProc Fudged Type  PrimProc isDict
      ---------------- --------------------  ---------------
      VARCHAR(1)       VARCHAR(2)            false
      VARCHAR(2)       VARCHAR(4)            false
      VARCHAR(3)       VARCHAR(4)            false
      VARCHAR(4)       CHAR(8)               false
      VARCHAR(5)       CHAR(8)               false
      VARCHAR(6)       CHAR(8)               false
      VARCHAR(7)       CHAR(8)               false
      VARCHAR(8)       VARCHAR(8)            true
      VARCHAR(9)       VARCHAR(8)            true
      VARCHAR(255)     VARCHAR(8)            true
      VARCHAR(8000)    VARCHAR(8)            true
       
      CHAR(1)          CHAR(1)               false
      CHAR(2)          CHAR(2)               false
      CHAR(3)          CHAR(4)               false
      CHAR(4)          CHAR(4)               false
      CHAR(5)          CHAR(8)               false
      CHAR(6)          CHAR(8)               false
      CHAR(7)          CHAR(8)               false
      CHAR(8)          CHAR(8)               false
      CHAR(9)          VARCHAR(8)            true
      CHAR(255)        VARCHAR(8)            true
      

      The current notation uses VARCHAR(8) to mean "a CHAR or VARCHAR dictionary column", no matter what the original data type is (CHAR or VARCHAR).
      Additionally, VARCHAR(4)..VARCHAR(7) are tweaked when sent: PrimProc sees them as CHAR(8), so the original VARCHAR type is lost.
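
      The current mapping can be sketched as a small function. This is only an illustration that reproduces the rows of the table above; the rule it uses (VARCHAR takes one extra byte, widths are rounded up to a power of two, anything over 8 bytes goes to the dictionary) is inferred from the table, and the names StrType, FudgedType, currentFudge and detectDictOnPrimProc are hypothetical, not taken from the ColumnStore source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only; not ColumnStore code.
enum class StrType { Char, Varchar };

struct FudgedType
{
  StrType type;    // type as seen by PrimProc
  uint32_t width;  // width as seen by PrimProc
  bool isDict;     // whether PrimProc treats the column as a dictionary column
};

inline uint32_t roundUpToPow2(uint32_t n)
{
  uint32_t p = 1;
  while (p < n)
    p *= 2;
  return p;
}

// Reproduces the current table: dictionary columns collapse to VARCHAR(8),
// and VARCHAR(4)..VARCHAR(7) are reported as CHAR(8).
inline FudgedType currentFudge(StrType real, uint32_t width)
{
  assert(width >= 1);
  // Assumption inferred from the table: VARCHAR needs one extra byte.
  const uint32_t bytes = (real == StrType::Varchar) ? width + 1 : width;
  if (bytes > 8)
    return {StrType::Varchar, 8, true};
  const uint32_t slot = roundUpToPow2(bytes);
  if (real == StrType::Varchar && slot == 8)
    return {StrType::Char, 8, false};  // the tweak: avoids clashing with the dictionary marker
  return {real, slot, false};
}

// Current PrimProc-side detection: VARCHAR(8) means "dictionary column".
inline bool detectDictOnPrimProc(const FudgedType& t)
{
  return t.type == StrType::Varchar && t.width == 8;
}
```

      In this sketch, currentFudge(StrType::Varchar, 4) and currentFudge(StrType::Char, 8) return the same CHAR(8) value, which is exactly the ambiguity this task is meant to remove.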

      As part of this task, we'll change the code as follows:

      • PrimProc will see the exact ExeMgr-side data type: true CHAR or true VARCHAR.
      • isDict will be serialized and deserialized (currently it's detected on the PrimProc side by testing the data type against VARCHAR(8)).
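
      A sketch of what sending isDict explicitly could look like. It uses a plain byte vector rather than the real ColumnCommand serialization, and the ColTypeMsg struct, its field layout and the serialize/deserialize helpers are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wire format for illustration only; not the ColumnCommand layout.
enum class StrType : uint8_t { Char = 0, Varchar = 1 };

struct ColTypeMsg
{
  StrType type;
  uint32_t width;
  bool isDict;  // sent explicitly instead of being derived from the type
};

// Layout: 1 byte type, 4 bytes width (little-endian), 1 byte isDict.
inline void serialize(const ColTypeMsg& m, std::vector<uint8_t>& out)
{
  out.push_back(static_cast<uint8_t>(m.type));
  for (int shift = 0; shift < 32; shift += 8)
    out.push_back(static_cast<uint8_t>(m.width >> shift));
  out.push_back(m.isDict ? 1 : 0);
}

// Reads one message starting at pos and advances pos past it.
inline ColTypeMsg deserialize(const std::vector<uint8_t>& in, size_t& pos)
{
  assert(pos + 6 <= in.size());
  ColTypeMsg m{};
  m.type = static_cast<StrType>(in[pos++]);
  for (int shift = 0; shift < 32; shift += 8)
    m.width |= static_cast<uint32_t>(in[pos++]) << shift;
  m.isDict = in[pos++] != 0;
  return m;
}
```

      Because the flag travels with the message, the receiver no longer has to compare the data type against VARCHAR(8) to decide whether the column is a dictionary column.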

      The new fudged data type mapping will look as follows:

      ExeMgr Real Type PrimProc Fudged Type  PrimProc isDict
      ---------------- --------------------  ---------------
      VARCHAR(1)       VARCHAR(2)            false
      VARCHAR(2)       VARCHAR(4)            false
      VARCHAR(3)       VARCHAR(4)            false
      VARCHAR(4)       VARCHAR(8)            false
      VARCHAR(5)       VARCHAR(8)            false
      VARCHAR(6)       VARCHAR(8)            false
      VARCHAR(7)       VARCHAR(8)            false
      VARCHAR(8)       VARCHAR(8)            true
      VARCHAR(9)       VARCHAR(8)            true
      VARCHAR(255)     VARCHAR(8)            true
      VARCHAR(8000)    VARCHAR(8)            true
       
      CHAR(1)          CHAR(1)               false
      CHAR(2)          CHAR(2)               false
      CHAR(3)          CHAR(4)               false
      CHAR(4)          CHAR(4)               false
      CHAR(5)          CHAR(8)               false
      CHAR(6)          CHAR(8)               false
      CHAR(7)          CHAR(8)               false
      CHAR(8)          CHAR(8)               false
      CHAR(9)          CHAR(8)               true
      CHAR(255)        CHAR(8)               true
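
      The new mapping can be sketched in the same way. Again, this only reproduces the rows listed in the table above; the width rule is inferred from those rows, and the names StrType, FudgedType and newFudge are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only; not ColumnStore code.
enum class StrType { Char, Varchar };

struct FudgedType
{
  StrType type;    // always the real ExeMgr-side type
  uint32_t width;  // fudged width as seen by PrimProc
  bool isDict;     // carried explicitly, no longer implied by VARCHAR(8)
};

inline uint32_t roundUpToPow2(uint32_t n)
{
  uint32_t p = 1;
  while (p < n)
    p *= 2;
  return p;
}

// Reproduces the new table: the CHAR/VARCHAR distinction is preserved and
// only the width is fudged.
inline FudgedType newFudge(StrType real, uint32_t width)
{
  assert(width >= 1);
  // Assumption inferred from the table: VARCHAR needs one extra byte.
  const uint32_t bytes = (real == StrType::Varchar) ? width + 1 : width;
  const bool isDict = bytes > 8;
  return {real, isDict ? 8u : roundUpToPow2(bytes), isDict};
}
```

      Note that in this mapping VARCHAR(7) and VARCHAR(8) both become VARCHAR(8) and differ only in isDict, which is why the flag has to be sent rather than derived from the type.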
      

            People

              drrtuy Roman
              bar Alexander Barkov