[MDEV-21829] Use packed sort keys in Unique objects - Jira

Details

Type: New Feature
Status: Stalled (View Workflow)
Priority: Critical
Resolution: Unresolved
Fix Version/s: 12.1
Component/s: Optimizer
Labels:
- foundation

Sprint:
Server 12.1 dev sprint

Description

The task deals with packing the values stored in the Unique tree for each record.

Background

In the implementation of Unique for variable sized fields like VARCHAR,
space allocated for each record is the maximum length for the field though
in most practical cases the data that is being stored in such columns
will be way less than the maximum length in most cases.

For example: lets say there is a query like:

SELECT COUNT(DISTINT col)  FROM t1;

and column col is defined as VARCHAR(200) in latin character set, so for each
record in table t1 the Unique tree would reserve 200 bytes to store the value
for column col. This is not the best way to utilize the space because this would lead
to filling of the Unique tree quickly forcing the Unique tree to be flushed
to the disk and then start writing to the new Unique tree.
To get the final result then a sort-merge pass needs to be done on the chunks
of Unique trees written to the disk

So this limitation prompted to store packed values inside the Unique tree.
This task is an extension to ~~MDEV-21580~~.

Unique class is currently used in

agg_func(DISTINCT col)
Here most aggregate functions like SUM, AVG accept only fixed size arguments
so it is not beneficial to use packing for these.
Currently packing is done for COUNT and GROUP_CONCAT (or JSON_ARRAYAGG)
aggregate function as these are meaningful

index-merge stores row-ids
index merge stores row-ids which are of fixed size, so packing makes no sense here

Engine Independent Table statistics
Packing is done here for variable length data types

Implementation details

1) Packed key format for Unique tree

The keys are stored in a new variable-size data format called "packed".

Lets say the record in the Unique tree looks like:

  key_length packed_value_1 packed_value2

key_length is the length of the whole key. key_length is stored in 4 bytes,
that is for each record we reserve 4 bytes to store the length of the key.

Each packed value is encoded as follows

<length> <value>

The format of the packed_value depends on the data type.
For "non-packable" data types like (INT, DOUBLE) there is no packing.

NULLS are ignored for COUNT(DISTINCT col) and GROUP_CONCAT(DISTINCT col1 ....) . For EITS we try
to find the number of distinct NON NULL values, so NULLs are not required for that.

NULLS are only considered for JSON_ARRAYAGG currently and the packed value
in this case is encoded as

<null_byte> <length> <value>

Note:
For NULL values we don't store anything in the length and value bytes, only the null byte is set to zero.

Note:
Let say we have a query like
SELECT COUNT(DISTINCT a) FROM t1

This is the case where the Unique tree would store only one column. So when the key is of single component
then the format looks like <key_length> <length> <value>. The key_length here would hold the value of <length> + bytes allocated to store length (that is let say for VARCHAR , size less than 255 has 1 byte of length else it is 2).
We could try to save the 4 bytes for key length in this case but we have ignored it for now as this would lead
to lots of changes in the code (mostly in merge_buffers, this code is shared by filesort and Unique class).

Note:
The packing format is almost the same as in Filesort, with the exception that null bytes are stored only with JSON_ARRAYAGG.

How is packing done ?
A packed record is made by using the pack() function for all the fields in the record.
and when the the duplicates are removed then unpack() is called to store the value back in the record pointers
that is in Field::ptr.

2) Which format to use

Contrary to sort keys in Filesort, keys in Unique tree don't have mem-comparable
form and the comparison of keys is always done on original values of columns.
This means that the comparisons time would not be increased by using packed
values. Due to this reasoning packed format is used always where possible.

We may consider adding a heuristic to disallow packing for small VARCHAR's and CHARs.

Attachments

Issue Links

blocks

MDEV-6529 optimize VARCHAR temp storage during EITS ANALYZE

Stalled

causes

MDEV-33457 Wrong EITS histogram for large table sizes with charset 'utf8mb4'

Open

relates to

MDEV-6915 Allow packed keys and packed values of non-sorted fields in the sort buffer

Closed

MDEV-6181 EITS could eat all tmpdir space and hang

Closed

MDEV-21580 Allow packed sort keys in sort buffer

Closed

MDEV-23502 Performance testing for packed keys in Unique objects

Open

MDEV-33305 Testing for MDEV-21829: Packed keys in Unique

Stalled

(2 relates to)

Activity

Ascending order - Click to sort in descending order

View 7 older comments

Vicențiu Ciorbaru added a comment - 2024-01-12 06:37

https://github.com/MariaDB/server/pull/2993

Vicențiu Ciorbaru added a comment - 2024-01-12 06:37 https://github.com/MariaDB/server/pull/2993

Sergei Petrunia added a comment - 2024-01-24 13:22 - edited

Task for testing this MDEV-33305 (NOT including performance)

Sergei Petrunia added a comment - 2024-01-24 13:22 - edited Task for testing this MDEV-33305 (NOT including performance)

Michael Widenius added a comment - 2024-02-27 15:13

I have reviewed about 35% of the code, plan to review the rest during this week.
After that, I will give it back to Vicentiu to fix things found in review (should take 1-2 hours) and then we have to tackle
https://jira.mariadb.org/browse/MDEV-33457 and then more testing.

Michael Widenius added a comment - 2024-02-27 15:13 I have reviewed about 35% of the code, plan to review the rest during this week. After that, I will give it back to Vicentiu to fix things found in review (should take 1-2 hours) and then we have to tackle https://jira.mariadb.org/browse/MDEV-33457 and then more testing.

Michael Widenius added a comment - 2024-10-02 09:33

Vicentiu started working on this again 2 weeks ago to solve the issues found in the review. I expect to get an updated version withing 3 weeks after Vicentiu is back from the Google summer of code event in USA.
What is left is second review (I do not expect any future changes needed) and then QA before getting pushed into 11.7 or 11.8

Michael Widenius added a comment - 2024-10-02 09:33 Vicentiu started working on this again 2 weeks ago to solve the issues found in the review. I expect to get an updated version withing 3 weeks after Vicentiu is back from the Google summer of code event in USA. What is left is second review (I do not expect any future changes needed) and then QA before getting pushed into 11.7 or 11.8

Michael Widenius added a comment - 2025-01-08 12:58

This code is in review and rework by Vicentiu.
Vicentiu expect to have the rework solved next week, after which it comes back to me for a last review, which should be quick.

Michael Widenius added a comment - 2025-01-08 12:58 This code is in review and rework by Vicentiu. Vicentiu expect to have the rework solved next week, after which it comes back to me for a last review, which should be quick.

MariaDB Server

Use packed sort keys in Unique objects