[MCOL-3572] Extent Map must have separate linear arrays for different column widths Created: 2019-10-24  Updated: 2022-11-18  Resolved: 2022-11-18

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: None
Fix Version/s: Icebox

Type: New Feature Priority: Major
Reporter: Andrew Hutchings (Inactive) Assignee: Todd Stoffel (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
PartOf
is part of MCOL-4343 umbrella for tech debt issues Open

 Description   

The Extent Map is currently a single linear structure, which does not scale well to large data sets.
The suggested design is to break the single linear array of EMEntries into 5 segments, one for each column width: 1, 2, 4, 8, and 16 bytes.
ExtentMap will have two additional structures (presumably hashmaps) used to pick the appropriate linear array to work with. The first will map OID+partition+segment to width, and the second will map LBID to width. Most of the current EM methods must become dispatchers that look up the width and call the appropriate templated method specialization, which in turn will contain the logic from the current methods.
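The dispatcher idea can be sketched roughly as follows. This is only an illustration under stated assumptions: EMEntrySketch, ExtentMapSketch and all of their members are hypothetical names, not the real ExtentMap API. An LBID lookup is simplified to an exact match on the first LBID of an extent, and std::map is used for the composite key only to avoid writing a custom hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified stand-in for an extent map entry.
struct EMEntrySketch
{
    int64_t firstLbid;
    uint32_t oid;
    uint32_t partition;
    uint16_t segment;
};

class ExtentMapSketch
{
public:
    // Stores the entry in the array for its width and updates both lookup maps.
    void addExtent(uint8_t width, const EMEntrySketch& e)
    {
        const int idx = indexOf(width);
        assert(idx >= 0 && "width must be 1, 2, 4, 8 or 16");
        arrays_[idx].push_back(e);
        lbidToWidth_[e.firstLbid] = width;
        keyToWidth_[std::make_tuple(e.oid, e.partition, e.segment)] = width;
    }

    // Dispatcher: finds the width first, then calls the templated method.
    bool findByLbid(int64_t lbid, EMEntrySketch& out) const
    {
        auto it = lbidToWidth_.find(lbid);
        if (it == lbidToWidth_.end())
            return false;
        switch (it->second)
        {
            case 1: return findByLbidT<1>(lbid, out);
            case 2: return findByLbidT<2>(lbid, out);
            case 4: return findByLbidT<4>(lbid, out);
            case 8: return findByLbidT<8>(lbid, out);
            case 16: return findByLbidT<16>(lbid, out);
            default: return false;
        }
    }

    // Returns the width for OID+partition+segment, or 0 if unknown.
    uint8_t widthOf(uint32_t oid, uint32_t partition, uint16_t segment) const
    {
        auto it = keyToWidth_.find(std::make_tuple(oid, partition, segment));
        return it == keyToWidth_.end() ? 0 : it->second;
    }

    std::size_t extentCount(uint8_t width) const
    {
        const int idx = indexOf(width);
        return idx < 0 ? 0 : arrays_[idx].size();
    }

private:
    // Width-specific logic; scans only the array that holds this width.
    template <uint8_t W>
    bool findByLbidT(int64_t lbid, EMEntrySketch& out) const
    {
        for (const auto& e : arrays_[indexOf(W)])
        {
            if (e.firstLbid == lbid)
            {
                out = e;
                return true;
            }
        }
        return false;
    }

    static int indexOf(uint8_t width)
    {
        switch (width)
        {
            case 1: return 0;
            case 2: return 1;
            case 4: return 2;
            case 8: return 3;
            case 16: return 4;
            default: return -1;
        }
    }

    std::vector<EMEntrySketch> arrays_[5];  // one linear array per width
    std::unordered_map<int64_t, uint8_t> lbidToWidth_;
    std::map<std::tuple<uint32_t, uint32_t, uint16_t>, uint8_t> keyToWidth_;
};
```

With this shape, a lookup scans only the extents of one column width instead of the whole map; a real implementation would also need range-based LBID lookups rather than the exact match used here.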

The patch should take care of:

  • updating existing on-disk extent map layout on the first load
  • storing an updated ExtentMap on disk
  • changes to existing methods
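The first-load upgrade could look roughly like the sketch below. It assumes that each legacy entry records its own column width (the colWid field here) and that the layout version is a plain integer; LegacyEntrySketch, Segments, the two version constants and upgradeOnLoad are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical legacy entry; assumes the entry itself carries the column width.
struct LegacyEntrySketch
{
    int64_t firstLbid;
    uint16_t colWid;
};

// One linear array per width: 1, 2, 4, 8, 16 bytes.
using Segments = std::array<std::vector<LegacyEntrySketch>, 5>;

// Placeholder version numbers, not the real on-disk values.
const int kLegacyVersionSketch = 1;
const int kSegmentedVersionSketch = 2;

inline int segmentIndex(uint16_t width)
{
    switch (width)
    {
        case 1: return 0;
        case 2: return 1;
        case 4: return 2;
        case 8: return 3;
        case 16: return 4;
        default: return -1;
    }
}

// Splits a legacy linear array into width-specific segments.
// Returns true if an upgrade was performed, false if the data was already segmented.
inline bool upgradeOnLoad(int& version, const std::vector<LegacyEntrySketch>& linear, Segments& out)
{
    if (version >= kSegmentedVersionSketch)
        return false;

    for (const auto& e : linear)
    {
        const int idx = segmentIndex(e.colWid);
        if (idx < 0)
            throw std::runtime_error("unexpected column width in legacy extent map");
        out[idx].push_back(e);
    }

    version = kSegmentedVersionSketch;  // the next save writes the new layout
    return true;
}
```

Because the version is bumped only after every entry has been placed, a failed upgrade leaves the version untouched and the legacy file can be read again.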

The disk layout of ExtentMap will change. Here is the current layout:

{
   "header":{
      "version":"int",
      "numberOfEMEntries":"int",
      "useless":"int"
   },
   "ExtentMapEntries":[
      {
         "EMEntry1":"string"
      },
      {
         "EMEntry2":"string"
      },
      {
         "EMEntry3":"string"
      }
   ]
}

The suggested layout will be:

{
   "header":{
      "version":"int",
      "offsets":[
         {
            "width_offset_type":"int",
            "width":"int",
            "offset":"int"
         },
         {
            "width_offset_type":"int",
            "width":"int",
            "offset":"int"
         },
         {
            "width_offset_type":"int",
            "width":"int",
            "offset":"int"
         },
         {
            "width_offset_type":"int",
            "width":"int",
            "offset":"int"
         }
      ],
      "data":[
         {
            "WidthSpecificExtentMapEntries":[
               {
                  "EMEntry1":"binary"
               },
               {
                  "EMEntry2":"binary"
               },
               {
                  "EMEntry3":"binary"
               }
            ]
         },
         {
            "WidthSpecificExtentMapEntries":[
               {
                  "EMEntry1":"binary"
               },
               {
                  "EMEntry2":"binary"
               },
               {
                  "EMEntry3":"binary"
               }
            ]
         },
         {
            "WidthSpecificExtentMapEntries":[
               {
                  "EMEntry1":"binary"
               },
               {
                  "EMEntry2":"binary"
               },
               {
                  "EMEntry3":"binary"
               }
            ]
         }
      ]
   }
}
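One possible way to fill the "offsets" table when saving is sketched below. It assumes that "offset" is a byte offset from the start of the file and that the width-specific segments are written back to back right after the header; the issue does not define width_offset_type, so the sketch leaves it out. OffsetRecordSketch and buildOffsets are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory form of one record in the "offsets" table.
struct OffsetRecordSketch
{
    uint32_t width;   // column width in bytes: 1, 2, 4, 8 or 16
    uint64_t offset;  // assumed: byte offset of the segment from the file start
};

// Computes where each width-specific segment starts, given the serialized
// size of every segment and the size of the header that precedes the data.
inline std::array<OffsetRecordSketch, 5> buildOffsets(const std::array<std::size_t, 5>& segmentBytes,
                                                      std::size_t headerBytes)
{
    static const uint32_t widths[5] = {1, 2, 4, 8, 16};
    std::array<OffsetRecordSketch, 5> table{};
    uint64_t cursor = headerBytes;

    for (std::size_t i = 0; i < table.size(); ++i)
    {
        table[i].width = widths[i];
        table[i].offset = cursor;   // an empty segment shares its offset with the next one
        cursor += segmentBytes[i];
    }
    return table;
}
```

A reader can then seek directly to the segment for one width without parsing the segments that precede it.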

The suggested change to Casual Partitioning structure is:

struct EMCasualPartition_struct
{
    int32_t sequenceNum;
    char isValid; //CP_INVALID - No min/max and no DML in progress. CP_UPDATING - Update in progress. CP_VALID- min/max is valid
    uint8_t keyLen; // up to 255 bytes (the maximum a uint8_t can hold)
    uint8_t minMax[]; // memcmp-friendly (big-endian for integers) encoding of min and max prefixes.
                                 // The array is twice the keyLen field long, first goes min, then max.
 
    // here we need methods to compute record size, get the min/max keys, etc.
};
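To illustrate why a memcmp-friendly encoding helps, here is a minimal sketch for unsigned 32-bit keys. It uses a std::vector in place of the flexible array member so that it stays standard C++, and it assumes a packed on-disk record with no padding. CasualPartitionSketch, encodeBE32, makeRange32, mayContain and the CP_VALID_SKETCH value are hypothetical and for illustration only. Signed integers would additionally need the sign bit flipped before encoding so that negative values sort first.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

const char CP_VALID_SKETCH = 2;  // placeholder value, assumed for this sketch

// Writes v into out[0..3] with the most significant byte first (big-endian),
// so that byte-wise comparison matches numeric comparison.
inline void encodeBE32(uint32_t v, uint8_t* out)
{
    out[0] = static_cast<uint8_t>(v >> 24);
    out[1] = static_cast<uint8_t>(v >> 16);
    out[2] = static_cast<uint8_t>(v >> 8);
    out[3] = static_cast<uint8_t>(v);
}

struct CasualPartitionSketch
{
    int32_t sequenceNum;
    char isValid;
    uint8_t keyLen;
    std::vector<uint8_t> minMax;  // 2 * keyLen bytes: min first, then max

    const uint8_t* minKey() const { return minMax.data(); }
    const uint8_t* maxKey() const { return minMax.data() + keyLen; }

    // Size of the record if serialized packed, without padding (an assumption).
    std::size_t recordSize() const
    {
        return sizeof(sequenceNum) + sizeof(isValid) + sizeof(keyLen) + 2u * keyLen;
    }
};

inline CasualPartitionSketch makeRange32(uint32_t minV, uint32_t maxV)
{
    CasualPartitionSketch cp;
    cp.sequenceNum = 0;
    cp.isValid = CP_VALID_SKETCH;
    cp.keyLen = 4;
    cp.minMax.resize(2u * cp.keyLen);
    encodeBE32(minV, cp.minMax.data());
    encodeBE32(maxV, cp.minMax.data() + cp.keyLen);
    return cp;
}

// True if the encoded key lies in [min, max]; key must point to keyLen bytes.
inline bool mayContain(const CasualPartitionSketch& cp, const uint8_t* key)
{
    return std::memcmp(key, cp.minKey(), cp.keyLen) >= 0 &&
           std::memcmp(key, cp.maxKey(), cp.keyLen) <= 0;
}
```

With this encoding, the range check needs no knowledge of the column type: the same memcmp call works for any key length, which is what allows a single structure to serve all widths.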



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2019-10-24 ]

The size of the compression block may be a limitation here.
