[MDEV-26664] Store UUIDs in a more efficient manner - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 10.7.1
Component/s: Data types
Labels:
- uuid

Description

UUIDs are stored in a binary form, but not byte-swapped like in MySQL's uuid_to_bin https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_uuid-to-bin

The order provided by UUID_TO_BIN() is not really friendly to index prefix compression:
it only reorders the timestamp segments, while the clock-seq and the node-ID segments are still at the end. This ruins the whole idea of prefix compression.

Let's sort UUID in a really index-friendly order, by moving the constant part to the left and the variable part to the right.

So for comparison purposes we'll reorder segments, but keep the byte order inside the segments (i.e. in big endian order):

MariaDB binary UUIDv1 representation, as returned by the UUID() function:

  llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn

Binary sortable representation:

   nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll

Sign           Section               Bits   Bytes  Pos   PosBinSortable

-------------  -------               ----   -----  ---   --------------

llllllll       time low              32     4        0   12

mmmm           time mid              16     2        4   10

Vhhh           version and time hi   16     2        6   8

vsss           variant and clock seq 16     2        8   6

nnnnnnnnnnnn   node ID               48     6       10   0

Attachments

Issue Links

causes

MDEV-29959 UUID Sorting

Closed

is caused by

MDEV-4958 Adding datatype UUID

Closed

relates to

MDEV-20477 Merge binlog extended metadata support from the upstream

Closed

MDEV-27588 MyISAM/Aria descending index compression is not as efficient as ascending

Closed

links to

How are GUIDs compared in SQL Server 2005?

SQL Server GUID sort algorithm. Why?

The uniqueidentifier data type in SQL server

(2 links to)

Activity

Ascending order - Click to sort in descending order

Alexander Barkov added a comment - 2021-10-12 11:10 - edited

SQL Server has a data type uniqueidentifier with a good sort order.

Tested with SQL Server 2017:

CREATE TABLE t1 (id uniqueidentifier default NewSequentialId());

INSERT INTO t1 VALUES ('00000000-0000-0000-0000-000000000000');

INSERT INTO t1 VALUES ('F0000000-0000-0000-0000-000000000000');

INSERT INTO t1 VALUES ('0000000F-0000-0000-0000-000000000000');

INSERT INTO t1 VALUES ('00000000-F000-0000-0000-000000000000');

INSERT INTO t1 VALUES ('00000000-000F-0000-0000-000000000000');

INSERT INTO t1 VALUES ('00000000-0000-F000-0000-000000000000');

INSERT INTO t1 VALUES ('00000000-0000-000F-0000-000000000000');

INSERT INTO t1 VALUES ('00000000-0000-0000-F000-000000000000');

INSERT INTO t1 VALUES ('00000000-0000-0000-000F-000000000000');

INSERT INTO t1 VALUES ('00000000-0000-0000-0000-F00000000000');

INSERT INTO t1 VALUES ('00000000-0000-0000-0000-00000000000F');

INSERT INTO t1 VALUES ('01C0423E-F36B-1410-8001-F00000000001');

INSERT INTO t1 VALUES ('01C0423E-F36B-1410-F001-F00000000001');

INSERT INTO t1 VALUES ('01C0423E-F36B-F410-8001-F00000000001');

INSERT INTO t1 VALUES ('01C0423E-FF6B-F410-8001-F00000000001');

INSERT INTO t1 VALUES ('01C0423E-F36B-1410-8001-400000000001');

INSERT INTO t1 VALUES ('01C0423E-F36B-1410-8001-400000000002');

INSERT INTO t1 VALUES ('01C0424E-F36B-1410-8001-500000000001');

INSERT INTO t1 VALUES ('02C0424E-F36B-1410-8001-500000000001');

INSERT INTO t1 VALUES (default);

INSERT INTO t1 VALUES (default);

SELECT * FROM t1 ORDER BY id;

00000000-0000-0000-0000-000000000000

F0000000-0000-0000-0000-000000000000

0000000F-0000-0000-0000-000000000000

00000000-F000-0000-0000-000000000000

00000000-000F-0000-0000-000000000000

00000000-0000-F000-0000-000000000000

00000000-0000-000F-0000-000000000000

00000000-0000-0000-000F-000000000000

00000000-0000-0000-F000-000000000000

00000000-0000-0000-0000-00000000000F

01C0423E-F36B-1410-8001-400000000001

01C0423E-F36B-1410-8001-400000000002

01C0424E-F36B-1410-8001-500000000001

02C0424E-F36B-1410-8001-500000000001

0BC0423E-F36B-1410-800B-800000000000

0CC0423E-F36B-1410-800C-800000000000

00000000-0000-0000-0000-F00000000000

01C0423E-F36B-1410-8001-F00000000001

01C0423E-F36B-F410-8001-F00000000001

01C0423E-FF6B-F410-8001-F00000000001

01C0423E-F36B-1410-F001-F00000000001

However, the NewSequentialId() function in SQL Server puts the timestamp segments of the UUID in little endian order, while MariaDB function UUID() puts the timestamp segments in big endian order.

Alexander Barkov added a comment - 2021-10-12 11:10 - edited SQL Server has a data type uniqueidentifier with a good sort order. Tested with SQL Server 2017: CREATE TABLE t1 (id uniqueidentifier default NewSequentialId()); INSERT INTO t1 VALUES ( '00000000-0000-0000-0000-000000000000' ); INSERT INTO t1 VALUES ( 'F0000000-0000-0000-0000-000000000000' ); INSERT INTO t1 VALUES ( '0000000F-0000-0000-0000-000000000000' ); INSERT INTO t1 VALUES ( '00000000-F000-0000-0000-000000000000' ); INSERT INTO t1 VALUES ( '00000000-000F-0000-0000-000000000000' ); INSERT INTO t1 VALUES ( '00000000-0000-F000-0000-000000000000' ); INSERT INTO t1 VALUES ( '00000000-0000-000F-0000-000000000000' ); INSERT INTO t1 VALUES ( '00000000-0000-0000-F000-000000000000' ); INSERT INTO t1 VALUES ( '00000000-0000-0000-000F-000000000000' ); INSERT INTO t1 VALUES ( '00000000-0000-0000-0000-F00000000000' ); INSERT INTO t1 VALUES ( '00000000-0000-0000-0000-00000000000F' ); INSERT INTO t1 VALUES ( '01C0423E-F36B-1410-8001-F00000000001' ); INSERT INTO t1 VALUES ( '01C0423E-F36B-1410-F001-F00000000001' ); INSERT INTO t1 VALUES ( '01C0423E-F36B-F410-8001-F00000000001' ); INSERT INTO t1 VALUES ( '01C0423E-FF6B-F410-8001-F00000000001' ); INSERT INTO t1 VALUES ( '01C0423E-F36B-1410-8001-400000000001' ); INSERT INTO t1 VALUES ( '01C0423E-F36B-1410-8001-400000000002' ); INSERT INTO t1 VALUES ( '01C0424E-F36B-1410-8001-500000000001' ); INSERT INTO t1 VALUES ( '02C0424E-F36B-1410-8001-500000000001' ); INSERT INTO t1 VALUES ( default ); INSERT INTO t1 VALUES ( default ); SELECT * FROM t1 ORDER BY id; 00000000-0000-0000-0000-000000000000 F0000000-0000-0000-0000-000000000000 0000000F-0000-0000-0000-000000000000 00000000-F000-0000-0000-000000000000 00000000-000F-0000-0000-000000000000 00000000-0000-F000-0000-000000000000 00000000-0000-000F-0000-000000000000 00000000-0000-0000-000F-000000000000 00000000-0000-0000-F000-000000000000 00000000-0000-0000-0000-00000000000F 01C0423E-F36B-1410-8001-400000000001 01C0423E-F36B-1410-8001-400000000002 01C0424E-F36B-1410-8001-500000000001 02C0424E-F36B-1410-8001-500000000001 0BC0423E-F36B-1410-800B-800000000000 0CC0423E-F36B-1410-800C-800000000000 00000000-0000-0000-0000-F00000000000 01C0423E-F36B-1410-8001-F00000000001 01C0423E-F36B-F410-8001-F00000000001 01C0423E-FF6B-F410-8001-F00000000001 01C0423E-F36B-1410-F001-F00000000001 However, the NewSequentialId() function in SQL Server puts the timestamp segments of the UUID in little endian order, while MariaDB function UUID() puts the timestamp segments in big endian order.

Alexander Barkov added a comment - 2021-10-15 18:01 - edited

It seems if we change the byte order in Field_uuid::store() to make it better sortable, we'll need to put the exact data type name to the row binary log - implement some extension on top of ~~MDEV-20477~~, so replication can distinguish UUID from BINARY(16). This is needed to fix the byte order properly:

change byte order from binary to sortable when replicating from BINARY(16) to UUID
change byte order from sortable to binary when replicating from UUID to BINARY(16)
preserve byte order as is when replicating from UUID to UUID

Another option is to put UUID to binlog in "visible" binary representation instead of "sortable" binary representation. This is simpler to implement.

Alexander Barkov added a comment - 2021-10-15 18:01 - edited It seems if we change the byte order in Field_uuid::store() to make it better sortable, we'll need to put the exact data type name to the row binary log - implement some extension on top of MDEV-20477 , so replication can distinguish UUID from BINARY(16). This is needed to fix the byte order properly: change byte order from binary to sortable when replicating from BINARY(16) to UUID change byte order from sortable to binary when replicating from UUID to BINARY(16) preserve byte order as is when replicating from UUID to UUID Another option is to put UUID to binlog in "visible" binary representation instead of "sortable" binary representation. This is simpler to implement.

Sergei Golubchik added a comment - 2021-10-19 13:00

commit 5cfb31854a4 is ok to push (to the preview-* branch, of course)

Sergei Golubchik added a comment - 2021-10-19 13:00 commit 5cfb31854a4 is ok to push (to the preview-* branch, of course)

Daniel Black added a comment - 2021-10-20 00:31

As noted by rjames, the clock sequence number looks out of place. The current implementation of UUID uses a random number https://github.com/MariaDB/server/blob/10.7/mysys/my_uuid.c#L71 (allowed as fallback option in https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.5).

So to ensure its randomness is localized and doesn't prevent big splits/sorts of index pages shouldn't the sss be on the far right like this:

nnnnnnnnnnnn-vVhh-hmmm-mllllllllsss

This would mean inserts that happen to have the same except (unlikely as time low is in a 100ns interval) for the sss occur minor local inserts.

Otherwise in the nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll from, sequential uuids from UUIDv1 (or v2) generators, MariaDB or not, could on a very different index page.

Daniel Black added a comment - 2021-10-20 00:31 As noted by rjames , the clock sequence number looks out of place. The current implementation of UUID uses a random number https://github.com/MariaDB/server/blob/10.7/mysys/my_uuid.c#L71 (allowed as fallback option in https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.5 ). So to ensure its randomness is localized and doesn't prevent big splits/sorts of index pages shouldn't the sss be on the far right like this: nnnnnnnnnnnn-vVhh-hmmm-mllllllllsss This would mean inserts that happen to have the same except (unlikely as time low is in a 100ns interval) for the sss occur minor local inserts. Otherwise in the nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll from, sequential uuids from UUIDv1 (or v2) generators, MariaDB or not, could on a very different index page.

Alexander Barkov added a comment - 2021-10-20 02:54 - edited

The "sss" part is initialized only once, during the server start up and is copied to all UUID values
generated by the UUID() function, until the server shutdown. So it rather belongs to the constant part of the value.

Moving it to the right is desirable on one hand, to make sure the rule "the earlier UUID was generated, the higher it goes in the ORDER BY output" is true even after the server restart.

Moving it to the right is not desirable on the other hand: It will make the prefix compression worse.

serg, what do you think?

Alexander Barkov added a comment - 2021-10-20 02:54 - edited The "sss" part is initialized only once, during the server start up and is copied to all UUID values generated by the UUID() function, until the server shutdown. So it rather belongs to the constant part of the value. Moving it to the right is desirable on one hand, to make sure the rule "the earlier UUID was generated, the higher it goes in the ORDER BY output" is true even after the server restart. Moving it to the right is not desirable on the other hand: It will make the prefix compression worse. serg , what do you think?

Rick James added a comment - 2021-10-20 03:20

The format I pushed for (as far back as 2012) was

Vhhh-mmmm-llllllll-vsss-nnnnnnnnnnnn

That shuffled the main time parts so that the index would have good "locality of reference" when being interested in records created about the same time. I did not worry about vsss or nnnnnnnnnnnn.

Putting nnnnnnnnnnnn first is not "bad". It clusters the rows from the same source in preference to the time parts.

V and v are constants since only MySQL/MySQL is building these ids. Their location is not relevant because no one is likely to ever see that the storage has them in the "wrong" places. As a related note, no one cares that INTs are stored big-endian versus little-endian.

There may be a binary compatibility argument for shuffling the bits exactly as MySQL 8.0 does. It might be exactly what I pushed for 9 years go, which is about when I started lobbying Oracle folks.

Rick James added a comment - 2021-10-20 03:20 The format I pushed for (as far back as 2012) was Vhhh-mmmm-llllllll-vsss-nnnnnnnnnnnn That shuffled the main time parts so that the index would have good "locality of reference" when being interested in records created about the same time. I did not worry about vsss or nnnnnnnnnnnn. Putting nnnnnnnnnnnn first is not "bad". It clusters the rows from the same source in preference to the time parts. V and v are constants since only MySQL/MySQL is building these ids. Their location is not relevant because no one is likely to ever see that the storage has them in the "wrong" places. As a related note, no one cares that INTs are stored big-endian versus little-endian. There may be a binary compatibility argument for shuffling the bits exactly as MySQL 8.0 does. It might be exactly what I pushed for 9 years go, which is about when I started lobbying Oracle folks.

Rick James added a comment - 2021-10-20 03:26

Alexander Barkov's point about prefix compression is interesting. Does that imply that all of Vvsssnnnnnnnnnnnn should go first?

Rick James added a comment - 2021-10-20 03:26 Alexander Barkov's point about prefix compression is interesting. Does that imply that all of Vvsssnnnnnnnnnnnn should go first?

Alexander Barkov added a comment - 2021-10-20 03:32 - edited

Yes, from the index size point of view it's more efficient if we put Vvsssnnnnnnnnnnnn first.

The exact segment order in the current prototype
(in https://github.com/MariaDB/server/tree/bb-10.7-bar-uuid)
is:

nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll

"nnnnnnnnnnnn-vsss-V" is a constant part (at least between the server restarts).

"hhh-mmmm-llllllll" is the timestamp, with big endian byte order.

Alexander Barkov added a comment - 2021-10-20 03:32 - edited Yes, from the index size point of view it's more efficient if we put Vvsssnnnnnnnnnnnn first. The exact segment order in the current prototype (in https://github.com/MariaDB/server/tree/bb-10.7-bar-uuid ) is: nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll "nnnnnnnnnnnn-vsss-V" is a constant part (at least between the server restarts). "hhh-mmmm-llllllll" is the timestamp, with big endian byte order.

Rick James added a comment - 2021-10-20 03:41

While on the topic, can someone tell me whether InnoDB always uses index-prefixing? And are there cases where it does not? Is the prefixing done as some number of bytes? Would different forms of column-compression lend themselves better to index-prefixing?

Rick James added a comment - 2021-10-20 03:41 While on the topic, can someone tell me whether InnoDB always uses index-prefixing? And are there cases where it does not? Is the prefixing done as some number of bytes? Would different forms of column-compression lend themselves better to index-prefixing?

Daniel Black added a comment - 2021-10-20 05:20

> The "sss" part is initialized only once, during the server start up and is copied to all UUID values
> generated by the UUID() function, until the server shutdown. So it rather belongs to the constant part of the value.

Ok. But a UUIDv1 generation that occurs in an application language (https://github.com/python/cpython/blob/main/Lib/uuid.py#L694 could just be random).
Or the more_entropy (opt on uniqid) random bit of https://github.com/php/php-src/blob/master/ext/standard/uniqid.c#L82

Breaking apart the 4 bit version from Vhhh or the 3 bit variant from vsss seems too excessive. How does this look?

nnnnnnnnnnnn-Vhhh-mmmm-llllllll-vsss

The node + Vhhh comes to 8 bytes, probably more as mmmm (2 bytes) is pretty constant too for rapid insertion. So we've sacrified 2 bytes of potentially constant (if MariaDB) generated.

> Moving it to the right is desirable on one hand, to make sure the rule "the earlier UUID was generated, the higher it goes in the ORDER BY output" is true even after the server restart.

I think I've still got that above.

> Moving it to the right is not desirable on the other hand: It will make the prefix compression worse.

Sacrifice the prefix compression of 2 bytes out of 10 -> 12 bytes because non-us generators could be random sss. I think that's fair. Thoughts?

Daniel Black added a comment - 2021-10-20 05:20 > The "sss" part is initialized only once, during the server start up and is copied to all UUID values > generated by the UUID() function, until the server shutdown. So it rather belongs to the constant part of the value. Ok. But a UUIDv1 generation that occurs in an application language ( https://github.com/python/cpython/blob/main/Lib/uuid.py#L694 could just be random). Or the more_entropy (opt on uniqid) random bit of https://github.com/php/php-src/blob/master/ext/standard/uniqid.c#L82 Breaking apart the 4 bit version from Vhhh or the 3 bit variant from vsss seems too excessive. How does this look? nnnnnnnnnnnn-Vhhh-mmmm-llllllll-vsss The node + Vhhh comes to 8 bytes, probably more as mmmm (2 bytes) is pretty constant too for rapid insertion. So we've sacrified 2 bytes of potentially constant (if MariaDB) generated. > Moving it to the right is desirable on one hand, to make sure the rule "the earlier UUID was generated, the higher it goes in the ORDER BY output" is true even after the server restart. I think I've still got that above. > Moving it to the right is not desirable on the other hand: It will make the prefix compression worse. Sacrifice the prefix compression of 2 bytes out of 10 -> 12 bytes because non-us generators could be random sss. I think that's fair. Thoughts?

Alexander Barkov added a comment - 2021-10-20 06:38

rjasdfiii, InnoDB specifically does not perform prefix compression. But some other engines do.

Alexander Barkov added a comment - 2021-10-20 06:38 rjasdfiii , InnoDB specifically does not perform prefix compression. But some other engines do.

Alexander Barkov added a comment - 2021-10-20 08:40 - edited

danblack, I agree, for externally generated UUIDs, it's better to place "vsss" at the end.

Just tested this script:

SET default_storage_engine=MyISAM;

CREATE OR REPLACE TABLE t1 (a UUID, INDEX(a));

SHOW CREATE TABLE t1;

BEGIN;

DELIMITER $$

FOR i IN 0..0xFFFF

DO

  INSERT INTO t1 VALUES (UUID());

END FOR

$$

DELIMITER ;

COMMIT;

show table status like 't1'\G

nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll gives the ratio of 0.56:

     Data_length: 1114112

    Index_length: 624640

nnnnnnnnnnnn-Vhhh-mmmm-llllllll-vsss gives the ratio of 0.68:

     Data_length: 1114112

    Index_length: 764928

It's probably still good enough.

Note, the "user visible" order llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn, which is now in the preview branch, gives the ratio of 1.44. This one is really bad.

     Data_length: 1114112

    Index_length: 1611776

Alexander Barkov added a comment - 2021-10-20 08:40 - edited danblack , I agree, for externally generated UUIDs, it's better to place "vsss" at the end. Just tested this script: SET default_storage_engine=MyISAM; CREATE OR REPLACE TABLE t1 (a UUID, INDEX (a)); SHOW CREATE TABLE t1; BEGIN ; DELIMITER $$ FOR i IN 0..0xFFFF DO INSERT INTO t1 VALUES (UUID()); END FOR $$ DELIMITER ; COMMIT ; show table status like 't1' \G nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll gives the ratio of 0.56: Data_length: 1114112 Index_length: 624640 nnnnnnnnnnnn-Vhhh-mmmm-llllllll-vsss gives the ratio of 0.68: Data_length: 1114112 Index_length: 764928 It's probably still good enough. Note, the "user visible" order llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn , which is now in the preview branch, gives the ratio of 1.44. This one is really bad. Data_length: 1114112 Index_length: 1611776

Sergei Golubchik added a comment - 2021-10-21 16:04

The UUIDv1 standard specifies that

sss is initialized to a random value
every time the system clock (hhh-mmmm-llllllll) goes backward (a new generated value is less than the previous one), sss is incremented.

So clearly sss should go before the system clock value, as in sss-hhh-mmmm-llllllll

Sergei Golubchik added a comment - 2021-10-21 16:04 The UUIDv1 standard specifies that sss is initialized to a random value every time the system clock ( hhh-mmmm-llllllll ) goes backward (a new generated value is less than the previous one), sss is incremented. So clearly sss should go before the system clock value, as in sss-hhh-mmmm-llllllll

Sergei Golubchik added a comment - 2021-10-23 18:59 - edited

ok to push (into a preview branch)

Sergei Golubchik added a comment - 2021-10-23 18:59 - edited ok to push (into a preview branch)

Alexander Barkov added a comment - 2021-10-25 10:10

Pushed into preview-10.7-~~MDEV-4958~~-uuid.

Alexander Barkov added a comment - 2021-10-25 10:10 Pushed into preview-10.7- MDEV-4958 -uuid.

Illia Somov added a comment - 2023-04-27 12:50

This should have been only applied to UUIDv1...

Illia Somov added a comment - 2023-04-27 12:50 This should have been only applied to UUIDv1...

People

Assignee:: Alexander Barkov

Reporter:: Ian Gilfillan

Votes:: 2 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2021-09-22 11:35

Updated:: 2023-04-27 12:50

Resolved:: 2021-10-25 10:10

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server