  1. MariaDB ColumnStore
  2. MCOL-3572

Extent Map must have separate linear arrays for different column widths

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Icebox
    • Component/s: None
    • Labels:
      None

      Description

      The Extent Map is currently a single linear structure, which does not look appropriate for large data sets.
      The suggested design is to break the single linear array of EMEntries into 5 segments, one for each column width: 1, 2, 4, 8, and 16 bytes.
      ExtentMap will have two additional structures (presumably hash maps) that are used to pick the appropriate linear array to work with. The first will map OID+partition+segment to width, and the second will map LBID to width. Most of the current EM methods must become dispatchers that look up the width and call the appropriate templated method specialization, which in turn will contain the logic from the current methods.
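The dispatcher idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `EntrySketch`, `ExtentMapSketch`, `segmentIndex`, and the method names are hypothetical and do not come from the ColumnStore code base, and a `std::map` keyed by a tuple stands in for the OID+partition+segment hash map to avoid writing a custom hash.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified stand-in for EMEntry.
struct EntrySketch {
    int64_t firstLbid;
    uint32_t hwm;
};

// Map a column width in bytes (1, 2, 4, 8, 16) to a segment index 0..4, or -1.
constexpr int segmentIndex(int width) {
    return width == 1 ? 0 : width == 2 ? 1 : width == 4 ? 2
         : width == 8 ? 3 : width == 16 ? 4 : -1;
}

class ExtentMapSketch {
public:
    using FileKey = std::tuple<uint32_t, uint32_t, uint16_t>;  // OID, partition, segment

    void addExtent(const FileKey& file, int64_t firstLbid, int width) {
        const int idx = segmentIndex(width);
        if (idx < 0) throw std::invalid_argument("unsupported column width");
        fileToWidth_[file] = width;       // first lookup structure
        lbidToWidth_[firstLbid] = width;  // second lookup structure
        segments_[static_cast<std::size_t>(idx)].push_back(EntrySketch{firstLbid, 0});
    }

    int widthOfFile(const FileKey& file) const { return fileToWidth_.at(file); }

    std::size_t segmentSize(int width) const {
        return segments_.at(static_cast<std::size_t>(segmentIndex(width))).size();
    }

    // Dispatcher: look the width up, then call the width-specific template.
    void setHWM(int64_t firstLbid, uint32_t hwm) {
        switch (lbidToWidth_.at(firstLbid)) {
            case 1:  setHWMImpl<1>(firstLbid, hwm);  break;
            case 2:  setHWMImpl<2>(firstLbid, hwm);  break;
            case 4:  setHWMImpl<4>(firstLbid, hwm);  break;
            case 8:  setHWMImpl<8>(firstLbid, hwm);  break;
            case 16: setHWMImpl<16>(firstLbid, hwm); break;
            default: throw std::logic_error("corrupt width map");
        }
    }

    uint32_t getHWM(int64_t firstLbid) const {
        switch (lbidToWidth_.at(firstLbid)) {
            case 1:  return getHWMImpl<1>(firstLbid);
            case 2:  return getHWMImpl<2>(firstLbid);
            case 4:  return getHWMImpl<4>(firstLbid);
            case 8:  return getHWMImpl<8>(firstLbid);
            case 16: return getHWMImpl<16>(firstLbid);
            default: throw std::logic_error("corrupt width map");
        }
    }

private:
    // Width-specific logic: only the segment for width W is scanned.
    template <int W>
    void setHWMImpl(int64_t firstLbid, uint32_t hwm) {
        for (EntrySketch& e : segments_[segmentIndex(W)])
            if (e.firstLbid == firstLbid) { e.hwm = hwm; return; }
        throw std::out_of_range("LBID not found in segment");
    }

    template <int W>
    uint32_t getHWMImpl(int64_t firstLbid) const {
        for (const EntrySketch& e : segments_[segmentIndex(W)])
            if (e.firstLbid == firstLbid) return e.hwm;
        throw std::out_of_range("LBID not found in segment");
    }

    std::array<std::vector<EntrySketch>, 5> segments_;  // one linear array per width
    std::map<FileKey, int> fileToWidth_;                // OID+partition+segment -> width
    std::unordered_map<int64_t, int> lbidToWidth_;      // LBID -> width
};
```

In this sketch a linear scan touches only the entries that share the column's width, which is the main point of the split; the real templated methods would presumably also differ in how they handle width-specific data.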

      The patch should take care of:

      • upgrading the existing on-disk extent map layout on the first load
      • storing the updated ExtentMap on disk
      • changes to the existing methods
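The first-load upgrade in the list above could, as a minimal sketch, amount to partitioning the legacy flat array into the five per-width arrays. The sketch assumes each legacy entry already records its column width; `LegacyEntry`, `Segments`, `widthToIndex`, and `splitByWidth` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical legacy entry; assumes the column width is stored per entry.
struct LegacyEntry {
    int64_t firstLbid;
    uint16_t colWidth;
};

// One linear array per supported width: 1, 2, 4, 8, 16 bytes.
using Segments = std::array<std::vector<LegacyEntry>, 5>;

inline int widthToIndex(unsigned width) {
    switch (width) {
        case 1:  return 0;
        case 2:  return 1;
        case 4:  return 2;
        case 8:  return 3;
        case 16: return 4;
        default: throw std::invalid_argument("unsupported column width");
    }
}

// Split the flat legacy array, preserving the relative order within each width.
inline Segments splitByWidth(const std::vector<LegacyEntry>& flat) {
    Segments out;
    for (const LegacyEntry& e : flat)
        out[widthToIndex(e.colWidth)].push_back(e);
    return out;
}
```

Rejecting an unknown width with an exception, rather than silently dropping the entry, keeps a one-time upgrade from losing extents.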

      The disk layout of ExtentMap will change. Here is the current layout:

      {
         "header":{
            "version":"int",
            "numberOfEMEntries":"int",
            "useless":"int"
         },
         "ExtentMapEntries":[
            {
               "EMEntry1":"string"
            },
            {
               "EMEntry2":"string"
            },
            {
               "EMEntry3":"string"
            }
         ]
      }
      

      The suggested layout will be:

      {
         "header":{
            "version":"int",
            "offsets":[
               {
                  "width_offset_type":"int",
                  "width":"int",
                  "offset":"int"
               },
               {
                  "width_offset_type":"int",
                  "width":"int",
                  "offset":"int"
               },
               {
                  "width_offset_type":"int",
                  "width":"int",
                  "offset":"int"
               },
               {
                  "width_offset_type":"int",
                  "width":"int",
                  "offset":"int"
               }
            ],
            "data":[
               {
                  "WidthSpecificExtentMapEntries":[
                     {
                        "EMEntry1":"binary"
                     },
                     {
                        "EMEntry2":"binary"
                     },
                     {
                        "EMEntry3":"binary"
                     }
                  ]
               },
               {
                  "WidthSpecificExtentMapEntries":[
                     {
                        "EMEntry1":"binary"
                     },
                     {
                        "EMEntry2":"binary"
                     },
                     {
                        "EMEntry3":"binary"
                     }
                  ]
               },
               {
                  "WidthSpecificExtentMapEntries":[
                     {
                        "EMEntry1":"binary"
                     },
                     {
                        "EMEntry2":"binary"
                     },
                     {
                        "EMEntry3":"binary"
                     }
                  ]
               }
            ]
         }
      }
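To make the role of the header's offsets table more concrete, here is a small sketch of how it might be built when writing the map out. It assumes that `offset` is the byte offset of a width's segment from the start of the data section, which the ticket does not state. The meaning of `width_offset_type` is also not defined in the ticket, so the sketch leaves it at 0. `OffsetRecord`, `SegmentInfo`, and `buildOffsets` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

struct OffsetRecord {
    int32_t widthOffsetType;  // semantics not defined in the ticket; left at 0 here
    int32_t width;            // column width in bytes
    int32_t offset;           // assumed: byte offset from the start of the data section
};

// Hypothetical input: the width and the serialized size in bytes of each segment.
struct SegmentInfo {
    int32_t width;
    int32_t byteSize;
};

// Lay the segments out back to back and record where each one starts.
inline std::vector<OffsetRecord> buildOffsets(const std::vector<SegmentInfo>& segments) {
    std::vector<OffsetRecord> table;
    int32_t running = 0;
    for (const SegmentInfo& s : segments) {
        table.push_back(OffsetRecord{0, s.width, running});
        running += s.byteSize;
    }
    return table;
}
```

A reader can then seek straight to the segment for a given width without scanning entries of other widths.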
      

      The suggested change to the Casual Partitioning structure is:

      #include <cstdint>

      struct EMCasualPartition_struct
      {
          int32_t sequenceNum;
          // CP_INVALID:  no min/max and no DML in progress.
          // CP_UPDATING: update in progress.
          // CP_VALID:    min/max is valid.
          char isValid;
          uint8_t keyLen;  // length of each key in bytes, up to 255
          // memcmp-friendly (big-endian for integers) encoding of the min and max prefixes.
          // The array is 2 * keyLen bytes long: min goes first, then max.
          // Note: a flexible array member is a compiler extension in C++.
          uint8_t minMax[];

          // Methods to compute the record size, get the min/max keys, etc. are still needed here.
      };
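The helper methods that the struct comment asks for might look roughly like the free functions below. This is a sketch under assumptions, not the intended implementation: it assumes the record header is packed as `int32_t` + `char` + `uint8_t` with no padding, and it makes signed integers memcmp-comparable by flipping the sign bit before the big-endian encoding, a common technique that the ticket does not itself specify. All function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Size in bytes of one record: fixed header plus the min and max keys.
// Assumes a packed header (int32_t + char + uint8_t) with no padding.
inline std::size_t cpRecordSize(uint8_t keyLen) {
    return sizeof(int32_t) + sizeof(char) + sizeof(uint8_t) + 2u * keyLen;
}

// Big-endian encoding: most significant byte first, so that memcmp order
// matches numeric order for unsigned values.
inline void encodeBigEndian(uint64_t v, uint8_t* out) {
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<uint8_t>(v & 0xFF);
        v >>= 8;
    }
}

// Signed values: flip the sign bit first so negatives sort before positives.
inline void encodeSigned(int64_t v, uint8_t* out) {
    encodeBigEndian(static_cast<uint64_t>(v) ^ (1ULL << 63), out);
}

// Build a minMax payload for 8-byte keys: min first, then max.
inline std::vector<uint8_t> makeMinMax(int64_t min, int64_t max) {
    std::vector<uint8_t> buf(16);
    encodeSigned(min, buf.data());
    encodeSigned(max, buf.data() + 8);
    return buf;
}

// Accessors into the minMax array.
inline const uint8_t* minKey(const uint8_t* minMax, uint8_t /*keyLen*/) { return minMax; }
inline const uint8_t* maxKey(const uint8_t* minMax, uint8_t keyLen) { return minMax + keyLen; }

// True when the encoded key lies within [min, max] under bytewise comparison.
inline bool mayContain(const uint8_t* minMax, uint8_t keyLen, const uint8_t* key) {
    return std::memcmp(key, minKey(minMax, keyLen), keyLen) >= 0 &&
           std::memcmp(key, maxKey(minMax, keyLen), keyLen) <= 0;
}
```

Because every key is compared with `memcmp`, the same range check works for any key length, which is presumably why the ticket asks for a memcmp-friendly encoding.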
      

              People

              Assignee:
              toddstoffel Todd Stoffel
              Reporter:
              LinuxJedi Andrew Hutchings (Inactive)