Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21829

Use packed sort keys in Unique objects

    XMLWordPrintable

Details

    • New Feature
    • Status: In Review (View Workflow)
    • Critical
    • Resolution: Unresolved
    • 11.9
    • Optimizer
    • None

    Description

      The task deals with packing the values stored in the Unique tree for each record.

      Background

      In the implementation of Unique for variable sized fields like VARCHAR,
      space allocated for each record is the maximum length for the field though
      in most practical cases the data that is being stored in such columns
      will be way less than the maximum length in most cases.

      For example: lets say there is a query like:

      SELECT COUNT(DISTINT col)  FROM t1;
      

      and column col is defined as VARCHAR(200) in latin character set, so for each
      record in table t1 the Unique tree would reserve 200 bytes to store the value
      for column col. This is not the best way to utilize the space because this would lead
      to filling of the Unique tree quickly forcing the Unique tree to be flushed
      to the disk and then start writing to the new Unique tree.
      To get the final result then a sort-merge pass needs to be done on the chunks
      of Unique trees written to the disk

      So this limitation prompted to store packed values inside the Unique tree.
      This task is an extension to MDEV-21580.

      Unique class is currently used in

      • agg_func(DISTINCT col)
        Here most aggregate functions like SUM, AVG accept only fixed size arguments
        so it is not beneficial to use packing for these.
        Currently packing is done for COUNT and GROUP_CONCAT (or JSON_ARRAYAGG)
        aggregate function as these are meaningful
      • index-merge stores row-ids
        index merge stores row-ids which are of fixed size, so packing makes no sense here
      • Engine Independent Table statistics
        Packing is done here for variable length data types

      Implementation details

      1) Packed key format for Unique tree

      The keys are stored in a new variable-size data format called "packed".

      Lets say the record in the Unique tree looks like:

        key_length packed_value_1 packed_value2
      

      key_length is the length of the whole key. key_length is stored in 4 bytes,
      that is for each record we reserve 4 bytes to store the length of the key.

      Each packed value is encoded as follows

      <length> <value>
      

      The format of the packed_value depends on the data type.
      For "non-packable" data types like (INT, DOUBLE) there is no packing.

      NULLS are ignored for COUNT(DISTINCT col) and GROUP_CONCAT(DISTINCT col1 ....) . For EITS we try
      to find the number of distinct NON NULL values, so NULLs are not required for that.

      NULLS are only considered for JSON_ARRAYAGG currently and the packed value
      in this case is encoded as

      <null_byte> <length> <value>
      

      Note:
      For NULL values we don't store anything in the length and value bytes, only the null byte is set to zero.

      Note:
      Let say we have a query like
      SELECT COUNT(DISTINCT a) FROM t1

      This is the case where the Unique tree would store only one column. So when the key is of single component
      then the format looks like <key_length> <length> <value>. The key_length here would hold the value of <length> + bytes allocated to store length (that is let say for VARCHAR , size less than 255 has 1 byte of length else it is 2).
      We could try to save the 4 bytes for key length in this case but we have ignored it for now as this would lead
      to lots of changes in the code (mostly in merge_buffers, this code is shared by filesort and Unique class).

      Note:
      The packing format is almost the same as in Filesort, with the exception that null bytes are stored only with JSON_ARRAYAGG.

      How is packing done ?
      A packed record is made by using the pack() function for all the fields in the record.
      and when the the duplicates are removed then unpack() is called to store the value back in the record pointers
      that is in Field::ptr.

      2) Which format to use

      Contrary to sort keys in Filesort, keys in Unique tree don't have mem-comparable
      form and the comparison of keys is always done on original values of columns.
      This means that the comparisons time would not be increased by using packed
      values. Due to this reasoning packed format is used always where possible.

      We may consider adding a heuristic to disallow packing for small VARCHAR's and CHARs.

      Attachments

        Issue Links

          Activity

            People

              cvicentiu Vicențiu Ciorbaru
              psergei Sergei Petrunia
              Votes:
              1 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.