Details

    • Technical task
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • None
    • 10.5.2
    • Optimizer
    • None

    Description

      This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

      EDIT: this is the spec of what got implemented:

      Contents:

      1. Background
      1.1 Implementation details
      1.1.1 Why fields and items are treated differently
      2. Solution : Packed Sort Keys
      2.1 Packed key format
      2.2 Which format to use
      3. Special cases
      3.1 Handling very long strings
      3.2 Handling for long binary strings
      3.3 Handling very long strings with Packed sort keys
      4. Sort key columns in addon_fields

      1. Background

      Before this MDEV, filesort() sorted the data using mem-comparable keys.

      That is, if we wanted to sort by

        ORDER BY col1, col2, ... colN
      

      then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

      The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

      1.1 Implementation details

      filesort.cc: make_sortkey() is the function that produces a sort key from a record.

      The function treats Field and Item objects differently.

      class Field has

        void make_sort_key(uchar *buff, uint length);
        virtual void sort_string(uchar *buff,uint length)=0;
      

      sort_string produces mem-comparable image of the field value for each datatype. make_sort_key is a non-virtual function which handles encoding of SQL null values.

      For Items, Type_handler has a virtual function:

        virtual void make_sort_key(uchar *to, Item *item,
                                   const SORT_FIELD_ATTR *sort_field,
                                   Sort_param *param) const= 0;
      

      which various datatypes overload.

      1.1.1 Why fields and items are treated differently

      My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

      create table t7(a int);
      select * from t7 order by a; -- Q1
      select * from t7 order by a+1; -- Q2
      

      Q1 uses a Field. It's a Field_int, so the sort key is 4 bytes long.
      Q2 uses an Item. Its type handler is Type_handler_int_result, and the sort key is 8 bytes long.

      2. Solution : Packed Sort Keys

      Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

      However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

      2.1 Packed key format

      The keys are stored in a new variable-size data format called "packed".

      The format is as follows:

        sort_key_length packed_value_1 packed_value2 ...
      

      sort_key_length is the length of the whole key.
      Each packed value is encoded as follows:

        <null_byte=0>  // This is a an SQL NULL
        [<null_byte=1>] <packed_value>  // this a non-NULL value
      

      null_byte is present if the field/item is NULLable.
      SQL NULL is encoded as just one NULL-indicator byte. The value itself is omitted.

      The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

      The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

      <length> <string>
      

      2.2 Which format to use

      The advantage of Packed Key Format is potential space savings for variable-length fields.
      The disadvantages are:

      • it may actually take more space, because of sort_key_length and length fields.
      • The comparison function is more expensive.

      Currently the logic is: use Packed Key Format if we would save 20 or more bytes when constructing a sort key from values that have empty string for each packable component.

      3. Special cases

      3.1 Handling very long strings

      Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
      It is defined as:

      The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.

      3.2 Handling for long binary strings

      Long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

      3.3 Handling very long strings with Packed sort keys

      Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
      One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

      This combination of circumstances:

      • The string value is very long, so truncation is necessary
      • The collation is "complex", so truncation is dangerous

      is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

      4. Sort key columns in addon_fields

      Currently, each sort key column is actually stored twice
      1. as part of the sort key
      2. in the addon_fields
      This made total sense when sort key stored the mem-comparable image (from which one cannot restore the original value in general case). But since we now store the original value, we could also remove it from the addon_fields and further save space.
      This is a good idea but it is outside of the scope of this MDEV.

      Attachments

        Issue Links

          Activity

            varun Varun Gupta (Inactive) created issue -
            varun Varun Gupta (Inactive) made changes -
            Field Original Value New Value
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            Parent MDEV-6915 [ 46405 ]
            Issue Type Task [ 3 ] Technical task [ 7 ]
            varun Varun Gupta (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Igor Babaev [ igor ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            Assignee Igor Babaev [ igor ] Sergei Petrunia [ psergey ]
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            Ok, this is the spec of what got implemented.

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {codecpp}

            {{sort_string}} produces mem-comparable image of the field value for each
            datatype. make_sort_key is a non-virtual function which handles encoding of
            SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key Each packed value is encoded
            as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype.

            For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed
            format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL
            value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            Ok, this is the spec of what got implemented.

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {codecpp}

            {{sort_string}} produces mem-comparable image of the field value for each
            datatype. make_sort_key is a non-virtual function which handles encoding of
            SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key Each packed value is encoded
            as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype.

            For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed
            format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL
            value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {codecpp}

            {{sort_string}} produces mem-comparable image of the field value for each
            datatype. make_sort_key is a non-virtual function which handles encoding of
            SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key Each packed value is encoded
            as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype.

            For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed
            format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL
            value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {codecpp}

            {{sort_string}} produces mem-comparable image of the field value for each
            datatype. make_sort_key is a non-virtual function which handles encoding of
            SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key Each packed value is encoded
            as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype.

            For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed
            format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL
            value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sory Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key Each packed value is encoded
            as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype.

            For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed
            format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL
            value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sory Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key Each packed value is encoded
            as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype.

            For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed
            format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL
            value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sory Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key Each packed value is encodedas follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype.

            For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL
            value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sory Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key Each packed value is encodedas follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype.

            For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL
            value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sory Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sory Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before
            the truncation.

            h4. With packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort
            keys in this case.


            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Special cases packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.


            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Before this MDEV
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            Also, long binary strings receive special treatment. A sort key for the long binary string is truncated at max_sort_length bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Special cases packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.


            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed keys

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See issue comments for reasoning)

            h3. Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            1.2 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.


            *EDIT*: this is the spec of what got implemented:

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            1.2 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. Special cases

            h4. Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}


            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}

            The value itself is omitted.


            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}

            The value itself is omitted.


            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}

            The value itself is omitted.

            h3. 2.2 Which format to use
            The advantage of Packed Key Format is potential space savings for variable-length fields.
            The disadvantages are:
            * it may actually take more space, because of {{sort_key_length}} and {{length}} fields.
            * The comparison function is more expensive.

            Currently the logic is: *use Packed Key Format if we would save 20 or more bytes when storing empty string for each varchar value stored.*

            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}

            The value itself is omitted.

            h3. 2.2 Which format to use
            The advantage of Packed Key Format is potential space savings for variable-length fields.
            The disadvantages are:
            * it may actually take more space, because of {{sort_key_length}} and {{length}} fields.
            * The comparison function is more expensive.

            Currently the logic is: *use Packed Key Format if we would save 20 or more bytes when storing empty string for each varchar value stored.*

            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            h3. Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            2.2 Which format to use
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys
            4. Sort key columns in addon_fields

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}

            The value itself is omitted.

            h3. 2.2 Which format to use
            The advantage of Packed Key Format is potential space savings for variable-length fields.
            The disadvantages are:
            * it may actually take more space, because of {{sort_key_length}} and {{length}} fields.
            * The comparison function is more expensive.

            Currently the logic is: *use Packed Key Format if we would save 20 or more bytes when storing empty string for each varchar value that is part of the key.*

            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3. 4. Sort key columns in addon_fields
            Currently, each sort key column is actually stored twice
            1. as part of the sort key
            2. in the addon_fields
            This made total sense when sort key stored the mem-comparable image (from which one cannot restore the original value in general case). But since we now store the original value, we could also remove it from the addon_fields and further save space.
            This is a good idea but it is outside of the scope of this MDEV.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            h3. Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            2.2 Which format to use
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys
            4. Sort key columns in addon_fields

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- This will use Field. It's a Field_int, so sort key will be 4 bytes long.
            select * from t7 order by a+1; -- This will use an Item. The type is Type_handler_int_result, the sort key is 8 bytes long.
            {code}

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}

            The value itself is omitted.

            h3. 2.2 Which format to use
            The advantage of Packed Key Format is potential space savings for variable-length fields.
            The disadvantages are:
            * it may actually take more space, because of {{sort_key_length}} and {{length}} fields.
            * The comparison function is more expensive.

            Currently the logic is: *use Packed Key Format if we would save 20 or more bytes when storing empty string for each varchar value that is part of the key.*

            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3. 4. Sort key columns in addon_fields
            Currently, each sort key column is actually stored twice
            1. as part of the sort key
            2. in the addon_fields
            This made total sense when sort key stored the mem-comparable image (from which one cannot restore the original value in general case). But since we now store the original value, we could also remove it from the addon_fields and further save space.
            This is a good idea but it is outside of the scope of this MDEV.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            h3. Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            2.2 Which format to use
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys
            4. Sort key columns in addon_fields

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- Q1
            select * from t7 order by a+1; -- Q2
            {code}
            Q1 uses a Field. It's a Field_int, so the sort key is 4 bytes long.
            Q2 uses an Item. Its type handler is Type_handler_int_result, and the sort key is 8 bytes long.

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}

            The value itself is omitted.

            h3. 2.2 Which format to use
            The advantage of Packed Key Format is potential space savings for variable-length fields.
            The disadvantages are:
            * it may actually take more space, because of {{sort_key_length}} and {{length}} fields.
            * The comparison function is more expensive.

            Currently the logic is: *use Packed Key Format if we would save 20 or more bytes when storing empty string for each varchar value that is part of the key.*

            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3. 4. Sort key columns in addon_fields
            Currently, each sort key column is actually stored twice
            1. as part of the sort key
            2. in the addon_fields
            This made total sense when sort key stored the mem-comparable image (from which one cannot restore the original value in general case). But since we now store the original value, we could also remove it from the addon_fields and further save space.
            This is a good idea but it is outside of the scope of this MDEV.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            h3. Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            2.2 Which format to use
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys
            4. Sort key columns in addon_fields

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- Q1
            select * from t7 order by a+1; -- Q2
            {code}
            Q1 uses a Field. It's a Field_int, so the sort key is 4 bytes long.
            Q2 uses an Item. Its type handler is Type_handler_int_result, and the sort key is 8 bytes long.

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              [<null_byte=1>] <packed_value>
            {code}

            null_byte is present if the field/item is NULLable.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            SQL NULL is encoded as one NULL-indicator byte which indicates this is a NULL value:

            {code}
              <null_byte=0> // This is a an SQL NULL value
            {code}

            The value itself is omitted.

            h3. 2.2 Which format to use
            The advantage of Packed Key Format is potential space savings for variable-length fields.
            The disadvantages are:
            * it may actually take more space, because of {{sort_key_length}} and {{length}} fields.
            * The comparison function is more expensive.

            Currently the logic is: *use Packed Key Format if we would save 20 or more bytes when storing empty string for each varchar value that is part of the key.*

            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3. 4. Sort key columns in addon_fields
            Currently, each sort key column is actually stored twice
            1. as part of the sort key
            2. in the addon_fields
            This made total sense when sort key stored the mem-comparable image (from which one cannot restore the original value in general case). But since we now store the original value, we could also remove it from the addon_fields and further save space.
            This is a good idea but it is outside of the scope of this MDEV.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            h3. Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            2.2 Which format to use
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys
            4. Sort key columns in addon_fields

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- Q1
            select * from t7 order by a+1; -- Q2
            {code}
            Q1 uses a Field. It's a Field_int, so the sort key is 4 bytes long.
            Q2 uses an Item. Its type handler is Type_handler_int_result, and the sort key is 8 bytes long.

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              <null_byte=0> // This is a an SQL NULL
              [<null_byte=1>] <packed_value> // this a non-NULL value
            {code}

            null_byte is present if the field/item is NULLable.
            SQL NULL is encoded as just one NULL-indicator byte. The value itself is omitted.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            h3. 2.2 Which format to use
            The advantage of Packed Key Format is potential space savings for variable-length fields.
            The disadvantages are:
            * it may actually take more space, because of {{sort_key_length}} and {{length}} fields.
            * The comparison function is more expensive.

            Currently the logic is: *use Packed Key Format if we would save 20 or more bytes when storing empty string for each varchar value that is part of the key.*

            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3. 4. Sort key columns in addon_fields
            Currently, each sort key column is actually stored twice
            1. as part of the sort key
            2. in the addon_fields
            This made total sense when sort key stored the mem-comparable image (from which one cannot restore the original value in general case). But since we now store the original value, we could also remove it from the addon_fields and further save space.
            This is a good idea but it is outside of the scope of this MDEV.
            psergei Sergei Petrunia made changes -
            Description This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            h3. Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            2.2 Which format to use
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys
            4. Sort key columns in addon_fields

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- Q1
            select * from t7 order by a+1; -- Q2
            {code}
            Q1 uses a Field. It's a Field_int, so the sort key is 4 bytes long.
            Q2 uses an Item. Its type handler is Type_handler_int_result, and the sort key is 8 bytes long.

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              <null_byte=0> // This is a an SQL NULL
              [<null_byte=1>] <packed_value> // this a non-NULL value
            {code}

            null_byte is present if the field/item is NULLable.
            SQL NULL is encoded as just one NULL-indicator byte. The value itself is omitted.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            h3. 2.2 Which format to use
            The advantage of Packed Key Format is potential space savings for variable-length fields.
            The disadvantages are:
            * it may actually take more space, because of {{sort_key_length}} and {{length}} fields.
            * The comparison function is more expensive.

            Currently the logic is: *use Packed Key Format if we would save 20 or more bytes when storing empty string for each varchar value that is part of the key.*

            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3. 4. Sort key columns in addon_fields
            Currently, each sort key column is actually stored twice
            1. as part of the sort key
            2. in the addon_fields
            This made total sense when sort key stored the mem-comparable image (from which one cannot restore the original value in general case). But since we now store the original value, we could also remove it from the addon_fields and further save space.
            This is a good idea but it is outside of the scope of this MDEV.
            This task in a part of MDEV-6915, where we would like the pack the values of the sort key inside the sort buffer for each record.

            *EDIT*: this is the spec of what got implemented:

            h3. Contents:
            1. Background
            1.1 Implementation details
            1.1.1 Why fields and items are treated differently
            2. Solution : Packed Sort Keys
            2.1 Packed key format
            2.2 Which format to use
            3. Special cases
            3.1 Handling very long strings
            3.2 Handling for long binary strings
            3.3 Handling very long strings with Packed sort keys
            4. Sort key columns in addon_fields

            h3. 1. Background
            Before this MDEV, filesort() sorted the data using mem-comparable keys.

            That is, if we wanted to sort by
            {code}
              ORDER BY col1, col2, ... colN
            {code}
            then the filesort code would for each row generate one "Sort Key" and then sort the rows by their Sort Keys.

            The Sort Keys are mem-comparable (that is, are compared by memcmp()) and they are fixed size: the sort key has the same length regardless of what value it represents. This causes inefficient memory usage.

            h4. 1.1 Implementation details
            filesort.cc: make_sortkey() is the function that produces a sort key from a record.

            The function treats Field and Item objects differently.

            class Field has
            {code:cpp}
              void make_sort_key(uchar *buff, uint length);
              virtual void sort_string(uchar *buff,uint length)=0;
            {code}

            {{sort_string}} produces mem-comparable image of the field value for each datatype. {{make_sort_key}} is a non-virtual function which handles encoding of SQL null values.

            For Items, Type_handler has a virtual function:
            {code:cpp}
              virtual void make_sort_key(uchar *to, Item *item,
                                         const SORT_FIELD_ATTR *sort_field,
                                         Sort_param *param) const= 0;
            {code}
            which various datatypes overload.

            h4. 1.1.1 Why fields and items are treated differently
            My (Sergey P) guess is as follows: if we use fields we get more compact sort keys. For example:

            {code:sql}
            create table t7(a int);
            select * from t7 order by a; -- Q1
            select * from t7 order by a+1; -- Q2
            {code}
            Q1 uses a Field. It's a Field_int, so the sort key is 4 bytes long.
            Q2 uses an Item. Its type handler is Type_handler_int_result, and the sort key is 8 bytes long.

            h3. 2. Solution : Packed Sort Keys

            Note that one can have mem-comparable keys are that are not fixed-size. MyRocks uses such encoding for example.

            However for this MDEV it was decided to store the original (non-mem-comparable) values instead, and use a datatype-aware key comparison function. (See the issue comments for the reasoning)

            h3. 2.1 Packed key format

            The keys are stored in a new variable-size data format called "packed".

            The format is as follows:

            {code}
              sort_key_length packed_value_1 packed_value2 ...
            {code}

            sort_key_length is the length of the whole key.
            Each packed value is encoded as follows:

            {code}
              <null_byte=0> // This is a an SQL NULL
              [<null_byte=1>] <packed_value> // this a non-NULL value
            {code}

            null_byte is present if the field/item is NULLable.
            SQL NULL is encoded as just one NULL-indicator byte. The value itself is omitted.

            The format of the packed_value depends on the datatype. For "non-packable" datatypes it is just their mem-comparable form, as before.

            The "packable" datatypes are currently variable-length strings and the packed format for them is (for binary blobs, see a note below):

            {code}
            <length> <string>
            {code}

            h3. 2.2 Which format to use
            The advantage of Packed Key Format is potential space savings for variable-length fields.
            The disadvantages are:
            * it may actually take more space, because of {{sort_key_length}} and {{length}} fields.
            * The comparison function is more expensive.

            Currently the logic is: *use Packed Key Format if we would save 20 or more bytes when constructing a sort key from values that have empty string for each packable component.*

            h3. 3. Special cases

            h4. 3.1 Handling very long strings
            Before this MDEV, the size of sort key was limited by @@max_sort_length variable.
            It is defined as:
            {quote}
            The number of bytes to use when sorting data values. The server uses only the first max_sort_length bytes of each value and ignores the rest.
            {quote}

            h4. 3.2 Handling for long binary strings
            Long binary strings receive special treatment. A sort key for the long binary string is truncated at {{max_sort_length}} bytes like described above, but then a "suffix" is appended which contains the total length of the value before the truncation.

            h4. 3.3 Handling very long strings with Packed sort keys

            Truncating multi-byte string at N bytes is not safe because one can cut in the middle of a character.
            One is tempted to solve this by discarding the partial character but that's also not a good idea as in some collations multiple characters may produce one weight (this is called "contraction").

            This combination of circumstances:
            - The string value is very long, so truncation is necessary
            - The collation is "complex", so truncation is dangerous

            is deemed to be relatively rare so it was decided to just use the non-packed sort keys in this case.

            h3. 4. Sort key columns in addon_fields
            Currently, each sort key column is actually stored twice
            1. as part of the sort key
            2. in the addon_fields
            This made total sense when sort key stored the mem-comparable image (from which one cannot restore the original value in general case). But since we now store the original value, we could also remove it from the addon_fields and further save space.
            This is a good idea but it is outside of the scope of this MDEV.
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            varun Varun Gupta (Inactive) made changes -
            Assignee Sergei Petrunia [ psergey ] Varun Gupta [ varun ]
            varun Varun Gupta (Inactive) made changes -
            Fix Version/s 10.5.2 [ 24030 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Review [ 10002 ] Closed [ 6 ]
            varun Varun Gupta (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 103428 ] MariaDB v4 [ 157252 ]
            oleg.smirnov Oleg Smirnov made changes -
            monty Michael Widenius made changes -

            People

              varun Varun Gupta (Inactive)
              varun Varun Gupta (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.