Details
-
New Feature
-
Status: In Review (View Workflow)
-
Critical
-
Resolution: Unresolved
-
None
Description
The task deals with packing the values stored in the Unique tree for each record.
Background
In the implementation of Unique for variable sized fields like VARCHAR,
space allocated for each record is the maximum length for the field though
in most practical cases the data that is being stored in such columns
will be way less than the maximum length in most cases.
For example: lets say there is a query like:
SELECT COUNT(DISTINT col) FROM t1; |
and column col is defined as VARCHAR(200) in latin character set, so for each
record in table t1 the Unique tree would reserve 200 bytes to store the value
for column col. This is not the best way to utilize the space because this would lead
to filling of the Unique tree quickly forcing the Unique tree to be flushed
to the disk and then start writing to the new Unique tree.
To get the final result then a sort-merge pass needs to be done on the chunks
of Unique trees written to the disk
So this limitation prompted to store packed values inside the Unique tree.
This task is an extension to MDEV-21580.
Unique class is currently used in
- agg_func(DISTINCT col)
Here most aggregate functions like SUM, AVG accept only fixed size arguments
so it is not beneficial to use packing for these.
Currently packing is done for COUNT and GROUP_CONCAT (or JSON_ARRAYAGG)
aggregate function as these are meaningful
- index-merge stores row-ids
index merge stores row-ids which are of fixed size, so packing makes no sense here
- Engine Independent Table statistics
Packing is done here for variable length data types
Implementation details
1) Packed key format for Unique tree
The keys are stored in a new variable-size data format called "packed".
Lets say the record in the Unique tree looks like:
key_length packed_value_1 packed_value2
|
key_length is the length of the whole key. key_length is stored in 4 bytes,
that is for each record we reserve 4 bytes to store the length of the key.
Each packed value is encoded as follows
<length> <value>
|
The format of the packed_value depends on the data type.
For "non-packable" data types like (INT, DOUBLE) there is no packing.
NULLS are ignored for COUNT(DISTINCT col) and GROUP_CONCAT(DISTINCT col1 ....) . For EITS we try
to find the number of distinct NON NULL values, so NULLs are not required for that.
NULLS are only considered for JSON_ARRAYAGG currently and the packed value
in this case is encoded as
<null_byte> <length> <value>
|
Note:
For NULL values we don't store anything in the length and value bytes, only the null byte is set to zero.
Note:
Let say we have a query like
SELECT COUNT(DISTINCT a) FROM t1
This is the case where the Unique tree would store only one column. So when the key is of single component
then the format looks like <key_length> <length> <value>. The key_length here would hold the value of <length> + bytes allocated to store length (that is let say for VARCHAR , size less than 255 has 1 byte of length else it is 2).
We could try to save the 4 bytes for key length in this case but we have ignored it for now as this would lead
to lots of changes in the code (mostly in merge_buffers, this code is shared by filesort and Unique class).
Note:
The packing format is almost the same as in Filesort, with the exception that null bytes are stored only with JSON_ARRAYAGG.
How is packing done ?
A packed record is made by using the pack() function for all the fields in the record.
and when the the duplicates are removed then unpack() is called to store the value back in the record pointers
that is in Field::ptr.
2) Which format to use
Contrary to sort keys in Filesort, keys in Unique tree don't have mem-comparable
form and the comparison of keys is always done on original values of columns.
This means that the comparisons time would not be increased by using packed
values. Due to this reasoning packed format is used always where possible.
We may consider adding a heuristic to disallow packing for small VARCHAR's and CHARs.
Attachments
Issue Links
- blocks
-
MDEV-6529 optimize VARCHAR temp storage during EITS ANALYZE
- Stalled
- causes
-
MDEV-33457 Wrong EITS histogram for large table sizes with charset 'utf8mb4'
- Open
- relates to
-
MDEV-6915 Allow packed keys and packed values of non-sorted fields in the sort buffer
- Closed
-
MDEV-6181 EITS could eat all tmpdir space and hang
- Closed
-
MDEV-21580 Allow packed sort keys in sort buffer
- Closed
-
MDEV-23502 Performance testing for packed keys in Unique objects
- Open
-
MDEV-33305 Testing for MDEV-21829: Packed keys in Unique
- Stalled