Ricky's original entry:
Hi, is it possible to revert rearranging bits when inserting new uuid?
I have no idea how did anyone come to conclusion to rearrange uuid
bits before inserting and suppose that every inserted uuid is a
version 1 uuid, and not even bother checking version that's being
inserted. UuidV6,7,8 and ULID are now useless with uuid type in
mariadb and they will cause fragmentation, and it's impossible to sort
This Jira entry is about the new UUID type in MariadB 10.7
It seams that the main confusion is that when doing an ORDER BY
on the new UUID type, the UUID's is not coming in lexical order
but in storage order.
I have addded a little bit of background to allow us to better
understand and discuss the issues with UUID and also give other
readers more understand of the different UUID options that MariaDB
MariaDB has a few different uuid's (all unique in the server and
- Universal Unique Identifiers (UUIDs), as in DCE 1.1: Remote Procedure Call,
- Unique among all MariaDB installations
- String, 5 parts separated with '-', 36 bytes
- Starts 'scrambled time' (time-low, time-mid, time-high) followed by a set
system unique constants
- 'Random' from the user point of view
- Like the above but without '-'
- String, 32 bytes
- 'MySQL's original space efficient uuid'. Unique among a MariaDB cluster.
- Server id - timestamp - incrementor
- Each call generates a slightly higher number
- longlong, 8 bytes ; Efficient storage!
- 'Sequential' from the user point of view.
UUID data type in 10.7.0
- String representation defined in RFC4122.
- Stored in 'index-friendly manner'. This means that the last
'constant part' of uuid is stored first and then the time in high-to-low
byte order. Time bytes are store in big-endian format (same as uuid())
This only works when the input comes from the MariaDB UUID() function (UUID v1).
- Newer uuid's will be >= than older id's
- 16 bytes
- Storage efficient for storage_engines that can do prefix compression.
- Can be confusion when doing ORDER BY UUID_column as the order
is in storage order, not 'string value order'.
- For sorting there is no difference if timestamp is stored first or
timestamp is stored last after a 'constant part'. The main issue
for this Jira entry seams to be the the timestamp value is
internally reordered to get a better storage and bulk insert rate.
- There is no problem in sorting any of the above UUIDs. As Barkov
shows with an example, one can always ensure lexical order also
for the UUID type.
- For index storage efficiency, storing timestamp last in an UUID is
better as it allows the storage engine to do prefix-compression and
can reduce the 16 byte key to 8 bytes or less.
- The main purpose of an UUID is that they should be unique for the
server and also across different systems. The storage order of
bytes does not matter for this to hold true. As long as an
application does not depend on the sort order of UUID's (and
applications should not), the MariaDB UUID type is compatible with
any other UUID or any other database server.
For storage engine performance there are two things that one would
like to optimize related to UUID's:
1) When doing bulk insert, having UUID in order makes inserts of the
primary key faster and makes the primary key index smaller (initially).
2) When doing single insert from multiple threads, having UUID in 'random
order' is better as there will be less page collisions between multiple
threads and one can get higher insert throughput.
In other words, it depends on the usage of UUID keys how they should
be stored. There is no obvious 'best way'.
When using UUID type or uuid_short() one gets benefit 1)
By using UUID() strings, one gets benefit 2)
After reading the comment on this Jira entry, the conclusion with current
- If your application is not depending on the sort order of UUID and you want to optimizer for bulk insert and less storage of the UUID key, then use the UUID type.
- If your application requires UUID's to be sorted as strings, if you have your own version of UUID or you want to optimize for concurrency between multiple threads, then use BINARY(32) and the sys_guid() function.
- We should consider adding another UUID type that will store things in the given order. Another option would be to add a type that would
store HEX strings in binary and show them as hex strings. This could be universally useful and could be a building block for a new UUID
type that stores things in exact byte order.
The HEX type would also work for UUID's, but then one would miss the '-' in the output string.
- As Sergei points out in the comments, we should be able to detect other standard UUID's types > version 5 and not swap bits in them. He has already been working on making this happen (time table not yet defined). For 'own' UUID types, a HEX type could be an efficient.