[MDEV-21829] Use packed sort keys in Unique objects Created: 2020-02-27  Updated: 2024-01-24

Status: In Review
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 11.5

Type: New Feature Priority: Major
Reporter: Sergei Petrunia Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-6915 Allow packed keys and packed values o... Closed
relates to MDEV-6181 EITS could eat all tmpdir space and hang Closed
relates to MDEV-6529 optimize VARCHAR temp storage during ... Confirmed
relates to MDEV-21580 Allow packed sort keys in sort buffer Closed
relates to MDEV-23502 Performance testing for packed keys i... Open
relates to MDEV-33305 Testing for MDEV-21829: Packed keys i... In Progress

 Description   

The task deals with packing the values stored in the Unique tree for each record.

Background

In the implementation of Unique for variable sized fields like VARCHAR,
space allocated for each record is the maximum length for the field though
in most practical cases the data that is being stored in such columns
will be way less than the maximum length in most cases.

For example: lets say there is a query like:

SELECT COUNT(DISTINT col)  FROM t1;

and column col is defined as VARCHAR(200) in latin character set, so for each
record in table t1 the Unique tree would reserve 200 bytes to store the value
for column col. This is not the best way to utilize the space because this would lead
to filling of the Unique tree quickly forcing the Unique tree to be flushed
to the disk and then start writing to the new Unique tree.
To get the final result then a sort-merge pass needs to be done on the chunks
of Unique trees written to the disk

So this limitation prompted to store packed values inside the Unique tree.
This task is an extension to MDEV-21580.

Unique class is currently used in

  • agg_func(DISTINCT col)
    Here most aggregate functions like SUM, AVG accept only fixed size arguments
    so it is not beneficial to use packing for these.
    Currently packing is done for COUNT and GROUP_CONCAT (or JSON_ARRAYAGG)
    aggregate function as these are meaningful
  • index-merge stores row-ids
    index merge stores row-ids which are of fixed size, so packing makes no sense here
  • Engine Independent Table statistics
    Packing is done here for variable length data types

Implementation details

1) Packed key format for Unique tree

The keys are stored in a new variable-size data format called "packed".

Lets say the record in the Unique tree looks like:

  key_length packed_value_1 packed_value2

key_length is the length of the whole key. key_length is stored in 4 bytes,
that is for each record we reserve 4 bytes to store the length of the key.

Each packed value is encoded as follows

<length> <value>

The format of the packed_value depends on the data type.
For "non-packable" data types like (INT, DOUBLE) there is no packing.

NULLS are ignored for COUNT(DISTINCT col) and GROUP_CONCAT(DISTINCT col1 ....) . For EITS we try
to find the number of distinct NON NULL values, so NULLs are not required for that.

NULLS are only considered for JSON_ARRAYAGG currently and the packed value
in this case is encoded as

<null_byte> <length> <value>

Note:
For NULL values we don't store anything in the length and value bytes, only the null byte is set to zero.

Note:
Let say we have a query like
SELECT COUNT(DISTINCT a) FROM t1

This is the case where the Unique tree would store only one column. So when the key is of single component
then the format looks like <key_length> <length> <value>. The key_length here would hold the value of <length> + bytes allocated to store length (that is let say for VARCHAR , size less than 255 has 1 byte of length else it is 2).
We could try to save the 4 bytes for key length in this case but we have ignored it for now as this would lead
to lots of changes in the code (mostly in merge_buffers, this code is shared by filesort and Unique class).

Note:
The packing format is almost the same as in Filesort, with the exception that null bytes are stored only with JSON_ARRAYAGG.

How is packing done ?
A packed record is made by using the pack() function for all the fields in the record.
and when the the duplicates are removed then unpack() is called to store the value back in the record pointers
that is in Field::ptr.

2) Which format to use

Contrary to sort keys in Filesort, keys in Unique tree don't have mem-comparable
form and the comparison of keys is always done on original values of columns.
This means that the comparisons time would not be increased by using packed
values. Due to this reasoning packed format is used always where possible.

We may consider adding a heuristic to disallow packing for small VARCHAR's and CHARs.



 Comments   
Comment by Varun Gupta (Inactive) [ 2020-03-17 ]

Debugging the case with group_concat, I see that the keys are compared with collation specific function and not memcmp

* thread #2, stop reason = step in
  * frame #0: 0x000000010058f230 mysqld`charset_info_st::strnncollsp(this=0x0000000104846760, a="0", alen=1, b="1", blen=1) const at m_ctype.h:771:13
    frame #1: 0x00000001000b1686 mysqld`Field_varstring::cmp_max(this=0x0000619000056f28, a_ptr="\x010", b_ptr="\x011", max_len=4294967295) const at field.cc:7707:26
    frame #2: 0x00000001000e989c mysqld`Field_varstring::cmp(this=0x0000619000056f28, a="\x010", b="\x011") const at field.h:4092:12
    frame #3: 0x0000000100523e39 mysqld`::group_concat_key_cmp_with_distinct(arg=0x000062b0000629a0, key1=0x000063100003c8c0, key2=0x0000619000057019) at item_sum.cc:3547:21
    frame #4: 0x000000010340edfc mysqld`tree_insert(tree=0x000062b000066178, key=0x0000619000057019, key_size=0, custom_arg=0x000062b0000629a0) at tree.c:249:9
    frame #5: 0x0000000100ba7f0b mysqld`Unique::unique_add(this=0x000062b000065fa0, ptr=0x0000619000057019) at uniques.h:66:5
    frame #6: 0x000000010052c2f4 mysqld`Item_func_group_concat::add(this=0x000062b0000629a0, exclude_nulls=true) at item_sum.cc:4029:20

But the allocation for each node of the unique tree is still fixed size

Comment by Varun Gupta (Inactive) [ 2020-03-19 ]

The branch is 10.5-mdev21829

Comment by Varun Gupta (Inactive) [ 2020-06-12 ]

The code needs to be adjusted according to the with the pushes regarding MDEV-11563 and MDEV-22089

Comment by Varun Gupta (Inactive) [ 2020-06-23 ]

After going through the code again, I see that packing is currently not handled for JSON_ARRAYAGG function.

Comment by Varun Gupta (Inactive) [ 2020-06-30 ]

The latest patch is in the branch 10.5-mdev21829

Comment by Varun Gupta (Inactive) [ 2020-12-07 ]

The updated branch is bb-10.6-mdev21829

Comment by Michael Widenius [ 2023-11-17 ]

The old code is now in second review and reworked to take into account some comments from Igor and me.
I hope to have new code that can be given to QA during the new 2 weeks.

Comment by Vicențiu Ciorbaru [ 2024-01-12 ]

https://github.com/MariaDB/server/pull/2993

Comment by Sergei Petrunia [ 2024-01-24 ]

Task for testing this MDEV-33305 (NOT including performance)

Generated at Thu Feb 08 09:10:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.