[MDEV-26664] Store UUIDs in a more efficient manner Created: 2021-09-22  Updated: 2023-04-27  Resolved: 2021-10-25

Status: Closed
Project: MariaDB Server
Component/s: Data types
Affects Version/s: None
Fix Version/s: 10.7.1

Type: Bug Priority: Critical
Reporter: Ian Gilfillan Assignee: Alexander Barkov
Resolution: Fixed Votes: 2
Labels: uuid

Issue Links:
Problem/Incident
causes MDEV-29959 UUID Sorting Closed
is caused by MDEV-4958 Adding datatype UUID Closed
Relates
relates to MDEV-20477 Merge binlog extended metadata suppor... Closed
relates to MDEV-27588 MyISAM/Aria descending index compress... Closed

 Description   

UUIDs are stored in a binary form, but not byte-swapped like in MySQL's uuid_to_bin https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_uuid-to-bin

The order provided by UUID_TO_BIN() is not really friendly to index prefix compression:
it only reorders the timestamp segments, while the clock-seq and the node-ID segments are still at the end. This ruins the whole idea of prefix compression.

Let's sort UUID in a really index-friendly order, by moving the constant part to the left and the variable part to the right.

So for comparison purposes we'll reorder segments, but keep the byte order inside the segments (i.e. in big endian order):

MariaDB binary UUIDv1 representation, as returned by the UUID() function:
 
  llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn
 
Binary sortable representation:
 
   nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll
 
Sign           Section               Bits   Bytes  Pos   PosBinSortable
-------------  -------               ----   -----  ---   --------------
llllllll       time low              32     4        0   12
mmmm           time mid              16     2        4   10
Vhhh           version and time hi   16     2        6   8
vsss           variant and clock seq 16     2        8   6
nnnnnnnnnnnn   node ID               48     6       10   0



 Comments   
Comment by Alexander Barkov [ 2021-10-12 ]

SQL Server has a data type uniqueidentifier with a good sort order.

Tested with SQL Server 2017:

CREATE TABLE t1 (id uniqueidentifier default NewSequentialId());
INSERT INTO t1 VALUES ('00000000-0000-0000-0000-000000000000');
INSERT INTO t1 VALUES ('F0000000-0000-0000-0000-000000000000');
INSERT INTO t1 VALUES ('0000000F-0000-0000-0000-000000000000');
INSERT INTO t1 VALUES ('00000000-F000-0000-0000-000000000000');
INSERT INTO t1 VALUES ('00000000-000F-0000-0000-000000000000');
INSERT INTO t1 VALUES ('00000000-0000-F000-0000-000000000000');
INSERT INTO t1 VALUES ('00000000-0000-000F-0000-000000000000');
INSERT INTO t1 VALUES ('00000000-0000-0000-F000-000000000000');
INSERT INTO t1 VALUES ('00000000-0000-0000-000F-000000000000');
INSERT INTO t1 VALUES ('00000000-0000-0000-0000-F00000000000');
INSERT INTO t1 VALUES ('00000000-0000-0000-0000-00000000000F');
INSERT INTO t1 VALUES ('01C0423E-F36B-1410-8001-F00000000001');
INSERT INTO t1 VALUES ('01C0423E-F36B-1410-F001-F00000000001');
INSERT INTO t1 VALUES ('01C0423E-F36B-F410-8001-F00000000001');
INSERT INTO t1 VALUES ('01C0423E-FF6B-F410-8001-F00000000001');
INSERT INTO t1 VALUES ('01C0423E-F36B-1410-8001-400000000001');
INSERT INTO t1 VALUES ('01C0423E-F36B-1410-8001-400000000002');
INSERT INTO t1 VALUES ('01C0424E-F36B-1410-8001-500000000001');
INSERT INTO t1 VALUES ('02C0424E-F36B-1410-8001-500000000001');
INSERT INTO t1 VALUES (default);
INSERT INTO t1 VALUES (default);
SELECT * FROM t1 ORDER BY id;

00000000-0000-0000-0000-000000000000
F0000000-0000-0000-0000-000000000000
0000000F-0000-0000-0000-000000000000
00000000-F000-0000-0000-000000000000
00000000-000F-0000-0000-000000000000
00000000-0000-F000-0000-000000000000
00000000-0000-000F-0000-000000000000
00000000-0000-0000-000F-000000000000
00000000-0000-0000-F000-000000000000
00000000-0000-0000-0000-00000000000F
01C0423E-F36B-1410-8001-400000000001
01C0423E-F36B-1410-8001-400000000002
01C0424E-F36B-1410-8001-500000000001
02C0424E-F36B-1410-8001-500000000001
0BC0423E-F36B-1410-800B-800000000000
0CC0423E-F36B-1410-800C-800000000000
00000000-0000-0000-0000-F00000000000
01C0423E-F36B-1410-8001-F00000000001
01C0423E-F36B-F410-8001-F00000000001
01C0423E-FF6B-F410-8001-F00000000001
01C0423E-F36B-1410-F001-F00000000001

However, the NewSequentialId() function in SQL Server puts the timestamp segments of the UUID in little endian order, while MariaDB function UUID() puts the timestamp segments in big endian order.

Comment by Alexander Barkov [ 2021-10-15 ]

It seems if we change the byte order in Field_uuid::store() to make it better sortable, we'll need to put the exact data type name to the row binary log - implement some extension on top of MDEV-20477, so replication can distinguish UUID from BINARY(16). This is needed to fix the byte order properly:

  • change byte order from binary to sortable when replicating from BINARY(16) to UUID
  • change byte order from sortable to binary when replicating from UUID to BINARY(16)
  • preserve byte order as is when replicating from UUID to UUID

Another option is to put UUID to binlog in "visible" binary representation instead of "sortable" binary representation. This is simpler to implement.

Comment by Sergei Golubchik [ 2021-10-19 ]

commit 5cfb31854a4 is ok to push (to the preview-* branch, of course)

Comment by Daniel Black [ 2021-10-20 ]

As noted by rjames, the clock sequence number looks out of place. The current implementation of UUID uses a random number https://github.com/MariaDB/server/blob/10.7/mysys/my_uuid.c#L71 (allowed as fallback option in https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.5).

So to ensure its randomness is localized and doesn't prevent big splits/sorts of index pages shouldn't the sss be on the far right like this:

nnnnnnnnnnnn-vVhh-hmmm-mllllllllsss

This would mean inserts that happen to have the same except (unlikely as time low is in a 100ns interval) for the sss occur minor local inserts.

Otherwise in the nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll from, sequential uuids from UUIDv1 (or v2) generators, MariaDB or not, could on a very different index page.

Comment by Alexander Barkov [ 2021-10-20 ]

The "sss" part is initialized only once, during the server start up and is copied to all UUID values
generated by the UUID() function, until the server shutdown. So it rather belongs to the constant part of the value.

Moving it to the right is desirable on one hand, to make sure the rule "the earlier UUID was generated, the higher it goes in the ORDER BY output" is true even after the server restart.

Moving it to the right is not desirable on the other hand: It will make the prefix compression worse.

serg, what do you think?

Comment by Rick James [ 2021-10-20 ]

The format I pushed for (as far back as 2012) was

Vhhh-mmmm-llllllll-vsss-nnnnnnnnnnnn

That shuffled the main time parts so that the index would have good "locality of reference" when being interested in records created about the same time. I did not worry about vsss or nnnnnnnnnnnn.

Putting nnnnnnnnnnnn first is not "bad". It clusters the rows from the same source in preference to the time parts.

V and v are constants since only MySQL/MySQL is building these ids. Their location is not relevant because no one is likely to ever see that the storage has them in the "wrong" places. As a related note, no one cares that INTs are stored big-endian versus little-endian.

There may be a binary compatibility argument for shuffling the bits exactly as MySQL 8.0 does. It might be exactly what I pushed for 9 years go, which is about when I started lobbying Oracle folks.

Comment by Rick James [ 2021-10-20 ]

Alexander Barkov's point about prefix compression is interesting. Does that imply that all of Vvsssnnnnnnnnnnnn should go first?

Comment by Alexander Barkov [ 2021-10-20 ]

Yes, from the index size point of view it's more efficient if we put Vvsssnnnnnnnnnnnn first.

The exact segment order in the current prototype
(in https://github.com/MariaDB/server/tree/bb-10.7-bar-uuid)
is:

nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll

"nnnnnnnnnnnn-vsss-V" is a constant part (at least between the server restarts).

"hhh-mmmm-llllllll" is the timestamp, with big endian byte order.

Comment by Rick James [ 2021-10-20 ]

While on the topic, can someone tell me whether InnoDB always uses index-prefixing? And are there cases where it does not? Is the prefixing done as some number of bytes? Would different forms of column-compression lend themselves better to index-prefixing?

Comment by Daniel Black [ 2021-10-20 ]

> The "sss" part is initialized only once, during the server start up and is copied to all UUID values
> generated by the UUID() function, until the server shutdown. So it rather belongs to the constant part of the value.

Ok. But a UUIDv1 generation that occurs in an application language (https://github.com/python/cpython/blob/main/Lib/uuid.py#L694 could just be random).
Or the more_entropy (opt on uniqid) random bit of https://github.com/php/php-src/blob/master/ext/standard/uniqid.c#L82

Breaking apart the 4 bit version from Vhhh or the 3 bit variant from vsss seems too excessive. How does this look?

nnnnnnnnnnnn-Vhhh-mmmm-llllllll-vsss

The node + Vhhh comes to 8 bytes, probably more as mmmm (2 bytes) is pretty constant too for rapid insertion. So we've sacrified 2 bytes of potentially constant (if MariaDB) generated.

> Moving it to the right is desirable on one hand, to make sure the rule "the earlier UUID was generated, the higher it goes in the ORDER BY output" is true even after the server restart.

I think I've still got that above.

> Moving it to the right is not desirable on the other hand: It will make the prefix compression worse.

Sacrifice the prefix compression of 2 bytes out of 10 -> 12 bytes because non-us generators could be random sss. I think that's fair. Thoughts?

Comment by Alexander Barkov [ 2021-10-20 ]

rjasdfiii, InnoDB specifically does not perform prefix compression. But some other engines do.

Comment by Alexander Barkov [ 2021-10-20 ]

danblack, I agree, for externally generated UUIDs, it's better to place "vsss" at the end.

Just tested this script:

SET default_storage_engine=MyISAM;
CREATE OR REPLACE TABLE t1 (a UUID, INDEX(a));
SHOW CREATE TABLE t1;
BEGIN;
DELIMITER $$
FOR i IN 0..0xFFFF
DO
  INSERT INTO t1 VALUES (UUID());
END FOR
$$
DELIMITER ;
COMMIT;
show table status like 't1'\G

nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll gives the ratio of 0.56:

     Data_length: 1114112
    Index_length: 624640

nnnnnnnnnnnn-Vhhh-mmmm-llllllll-vsss gives the ratio of 0.68:

     Data_length: 1114112
    Index_length: 764928

It's probably still good enough.

Note, the "user visible" order llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn, which is now in the preview branch, gives the ratio of 1.44. This one is really bad.

     Data_length: 1114112
    Index_length: 1611776

Comment by Sergei Golubchik [ 2021-10-21 ]

The UUIDv1 standard specifies that

  • sss is initialized to a random value
  • every time the system clock (hhh-mmmm-llllllll) goes backward (a new generated value is less than the previous one), sss is incremented.

So clearly sss should go before the system clock value, as in sss-hhh-mmmm-llllllll

Comment by Sergei Golubchik [ 2021-10-23 ]

ok to push (into a preview branch)

Comment by Alexander Barkov [ 2021-10-25 ]

Pushed into preview-10.7-MDEV-4958-uuid.

Comment by Illia Somov [ 2023-04-27 ]

This should have been only applied to UUIDv1...

Generated at Thu Feb 08 09:47:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.