[MDEV-29959] UUID Sorting - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 11.0.0, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 10.9.8, 10.10.6, 10.11.5, 11.0.3
Component/s: Server
Labels:
- uuid

Description

Ricky's original entry:

Hi, is it possible to revert rearranging bits when inserting new uuid?
I have no idea how did anyone come to conclusion to rearrange uuid
bits before inserting and suppose that every inserted uuid is a
version 1 uuid, and not even bother checking version that's being
inserted. UuidV6,7,8 and ULID are now useless with uuid type in
mariadb and they will cause fragmentation, and it's impossible to sort
rows.

Monty;
This Jira entry is about the new UUID type in MariadB 10.7

It seams that the main confusion is that when doing an ORDER BY
on the new UUID type, the UUID's is not coming in lexical order
but in storage order.

I have addded a little bit of background to allow us to better
understand and discuss the issues with UUID and also give other
readers more understand of the different UUID options that MariaDB
supports.

MariaDB has a few different uuid's (all unique in the server and
across servers)

uuid()

Universal Unique Identifiers (UUIDs), as in DCE 1.1: Remote Procedure Call,
Unique among all MariaDB installations
String, 5 parts separated with '-', 36 bytes
Starts 'scrambled time' (time-low, time-mid, time-high) followed by a set
system unique constants
'Random' from the user point of view

sys_guid()

Like the above but without '-'
String, 32 bytes

uuid_short()

'MySQL's original space efficient uuid'. Unique among a MariaDB cluster.
Server id - timestamp - incrementor
Each call generates a slightly higher number
longlong, 8 bytes ; Efficient storage!
'Sequential' from the user point of view.

UUID data type in 10.7.0

String representation defined in RFC4122.
Stored in 'index-friendly manner'. This means that the last
'constant part' of uuid is stored first and then the time in high-to-low
byte order. Time bytes are store in big-endian format (same as uuid())
This only works when the input comes from the MariaDB UUID() function (UUID v1).
Newer uuid's will be >= than older id's
16 bytes
Storage efficient for storage_engines that can do prefix compression.
Can be confusion when doing ORDER BY UUID_column as the order
is in storage order, not 'string value order'.

Other things

For sorting there is no difference if timestamp is stored first or
timestamp is stored last after a 'constant part'. The main issue
for this Jira entry seams to be the the timestamp value is
internally reordered to get a better storage and bulk insert rate.
There is no problem in sorting any of the above UUIDs. As Barkov
shows with an example, one can always ensure lexical order also
for the UUID type.

For index storage efficiency, storing timestamp last in an UUID is
better as it allows the storage engine to do prefix-compression and
can reduce the 16 byte key to 8 bytes or less.

The main purpose of an UUID is that they should be unique for the
server and also across different systems. The storage order of
bytes does not matter for this to hold true. As long as an
application does not depend on the sort order of UUID's (and
applications should not), the MariaDB UUID type is compatible with
any other UUID or any other database server.

For storage engine performance there are two things that one would
like to optimize related to UUID's:

1) When doing bulk insert, having UUID in order makes inserts of the
primary key faster and makes the primary key index smaller (initially).
2) When doing single insert from multiple threads, having UUID in 'random
order' is better as there will be less page collisions between multiple
threads and one can get higher insert throughput.

In other words, it depends on the usage of UUID keys how they should
be stored. There is no obvious 'best way'.

When using UUID type or uuid_short() one gets benefit 1)
By using UUID() strings, one gets benefit 2)

After reading the comment on this Jira entry, the conclusion with current
code is:

If your application is not depending on the sort order of UUID and you want to optimizer for bulk insert and less storage of the UUID key, then use the UUID type.
If your application requires UUID's to be sorted as strings, if you have your own version of UUID or you want to optimize for concurrency between multiple threads, then use BINARY(32) and the sys_guid() function.
We should consider adding another UUID type that will store things in the given order. Another option would be to add a type that would
store HEX strings in binary and show them as hex strings. This could be universally useful and could be a building block for a new UUID
type that stores things in exact byte order.
The HEX type would also work for UUID's, but then one would miss the '-' in the output string.
As Sergei points out in the comments, we should be able to detect other standard UUID's types > version 5 and not swap bits in them. He has already been working on making this happen (time table not yet defined). For 'own' UUID types, a HEX type could be an efficient.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

UUID-old-new.diff
2023-05-17 14:49
52 kB
Sergei Golubchik

Issue Links

blocks

MDEV-31626 implement inet4->inet6 cast

Closed

causes

MDEV-32112 Random incorrect uuid errors since 10.11.5

Closed

MDEV-33065 Insert in table with constraint by UUID works on 11.0.2 and fails on 11.0.3

Open

MDEV-33442 REPAIR TABLE corrupts UUIDs

Closed

is caused by

MDEV-26664 Store UUIDs in a more efficient manner

Closed

relates to

MDEV-31926 UUID v7 are compared incorrectly

Closed

(1 relates to)

Activity

People

Assignee:: Sergei Golubchik

Reporter:: Rick Strafy

Votes:: 4 Vote for this issue

Watchers:: 10 Start watching this issue

Dates

Created:: 2022-11-06 19:37

Updated:: 2024-02-26 15:00

Resolved:: 2023-07-06 11:28

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.