[MDEV-29959] UUID Sorting - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 11.0.0, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL)
Fix Version/s: 10.9.8, 10.10.6, 10.11.5, 11.0.3
Component/s: Server
Labels:
- uuid

Description

Ricky's original entry:

Hi, is it possible to revert the rearranging of bits when inserting a new UUID?
I have no idea how anyone came to the conclusion to rearrange UUID
bits before inserting, assume that every inserted UUID is a
version 1 UUID, and not even bother checking the version that's being
inserted. UUID v6, v7, v8 and ULID are now useless with the uuid type in
MariaDB: they will cause fragmentation, and it's impossible to sort
rows.

Monty:
This Jira entry is about the new UUID type in MariaDB 10.7.

It seems that the main confusion is that when doing an ORDER BY
on the new UUID type, the UUIDs are not returned in lexical order
but in storage order.

I have added a little bit of background to allow us to better
understand and discuss the issues with UUIDs, and also to give other
readers a better understanding of the different UUID options that MariaDB
supports.

MariaDB has a few different kinds of UUIDs (all unique in the server and
across servers):

uuid()

Universally Unique Identifiers (UUIDs), as in DCE 1.1: Remote Procedure Call
Unique among all MariaDB installations
String, 5 parts separated with '-', 36 bytes
Starts with a 'scrambled' time (time-low, time-mid, time-high), followed by a set of
system-unique constants
'Random' from the user point of view
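To make the 'scrambled time' point concrete, here is a minimal sketch using Python's standard uuid module rather than MariaDB's UUID() implementation. A version 1 UUID splits a 60-bit timestamp into time-low (lowest 32 bits), time-mid (next 16 bits) and time-high (top 12 bits, next to the version nibble), and the string form prints time-low first. The helper v1_from_timestamp and the fixed node and clock-sequence values are hypothetical, chosen only for illustration.

```python
import uuid

# Hypothetical fixed values; a real generator uses the host's MAC
# address (node) and a clock sequence.
NODE = 0x0800200C9A66
CLOCK_SEQ = 0x1234


def v1_from_timestamp(ts: int) -> uuid.UUID:
    """Build an RFC 4122 version 1 UUID from a 60-bit timestamp."""
    time_low = ts & 0xFFFFFFFF       # lowest 32 bits, printed first
    time_mid = (ts >> 32) & 0xFFFF   # next 16 bits
    time_hi = (ts >> 48) & 0x0FFF    # top 12 bits
    return uuid.UUID(fields=(
        time_low,
        time_mid,
        time_hi | (1 << 12),                  # version nibble = 1
        0x80 | ((CLOCK_SEQ >> 8) & 0x3F),     # variant bits 10xxxxxx
        CLOCK_SEQ & 0xFF,
        NODE,
    ))


earlier = v1_from_timestamp(0x1_FFFF_FFFF)  # time-low = ffffffff
later = v1_from_timestamp(0x2_0000_0000)    # time-low wraps to 00000000
print(earlier)                  # ffffffff-0001-1000-9234-0800200c9a66
print(later)                    # 00000000-0002-1000-9234-0800200c9a66
print(earlier.time < later.time)  # True: later in time
print(str(later) < str(earlier))  # True: but earlier in string order
```

Whenever time-low wraps around, a newer uuid() value sorts before an older one as a string, which is why these values look 'random' to the user.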

sys_guid()

Like the above but without '-'
String, 32 bytes

uuid_short()

'MySQL's original space efficient uuid'. Unique among a MariaDB cluster.
Server id - timestamp - incrementor
Each call generates a slightly higher number
longlong, 8 bytes; efficient storage!
'Sequential' from the user point of view.
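The 'server id - timestamp - incrementor' layout can be modelled in a few lines. This sketch follows the formula MySQL documents for UUID_SHORT(), (server_id & 255) << 56, plus startup time in seconds << 24, plus an incremented counter; it assumes MariaDB uses the same layout. The class UuidShortModel is hypothetical and not a MariaDB API.

```python
import itertools


class UuidShortModel:
    """Hypothetical model of UUID_SHORT(): server id, startup time, counter."""

    def __init__(self, server_id: int, startup_time: int) -> None:
        # server_id keeps only its low 8 bits (the top byte of the result);
        # startup_time is in seconds and sits just above the counter bits.
        self.base = ((server_id & 255) << 56) + (startup_time << 24)
        self.counter = itertools.count()

    def generate(self) -> int:
        # Every call adds 1, so values from one server only ever grow.
        return self.base + next(self.counter)


gen = UuidShortModel(server_id=1, startup_time=1_700_000_000)
first, second = gen.generate(), gen.generate()
print(second - first)  # 1
print(first >> 56)     # 1 (the server id)
```

Because the counter occupies the low bits, consecutive values differ only at the end, which is what makes them 'sequential' and compact in an index.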

UUID data type in 10.7.0

String representation defined in RFC 4122.
Stored in an 'index-friendly' manner. This means that the last
'constant part' of the UUID is stored first, followed by the time in high-to-low
byte order. Time bytes are stored in big-endian format (same as uuid()).
This only works when the input comes from the MariaDB UUID() function (UUID v1).
Newer UUIDs will compare >= older ones.
16 bytes
Storage efficient for storage engines that can do prefix compression.
Can be confusing when doing ORDER BY UUID_column, as the order
is storage order, not 'string value order'.
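The reordering can be sketched as a function from the 16 UUID bytes to a sort key. This is an approximation written from the description above, not MariaDB's source: it assumes the key is node (6 bytes) + clock_seq (2) + time-high (2) + time-mid (2) + time-low (4), and the name storage_key is hypothetical. The last lines use two version 7 ids from the report further down to show why the trick only works for v1.

```python
import uuid


def storage_key(u: uuid.UUID) -> bytes:
    """Assumed layout: constant part first, then time high-to-low."""
    b = u.bytes  # 16 bytes in RFC 4122 (string) order
    time_low, time_mid, time_hi = b[0:4], b[4:6], b[6:8]
    clock_seq, node = b[8:10], b[10:16]
    return node + clock_seq + time_hi + time_mid + time_low


# Two v1 UUIDs from the same host; time-low wraps between them.
earlier = uuid.UUID("ffffffff-0001-1000-9234-0800200c9a66")
later = uuid.UUID("00000000-0002-1000-9234-0800200c9a66")
print(storage_key(earlier) < storage_key(later))  # True: time order
print(str(earlier) < str(later))                  # False: string order

# Two v7 UUIDs: time is already first in the string, but the reordered
# key starts with random bits, so the newer one can sort first.
v7_old = uuid.UUID("0189f81c-6a38-710e-8730-8babd48c28f4")
v7_new = uuid.UUID("0189f81c-6e06-71a3-a47a-664a606b08d2")
print(storage_key(v7_new) < storage_key(v7_old))  # True: out of order
```

Sorting with key=str instead of key=storage_key gives the lexical order in every case.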

Other things

For sorting, there is no difference whether the timestamp is stored first or
last, after a 'constant part'. The main issue
for this Jira entry seems to be that the timestamp value is
internally reordered to get better storage and a better bulk insert rate.
There is no problem in sorting any of the above UUIDs. As Barkov
shows with an example, one can always ensure lexical order also
for the UUID type.

For index storage efficiency, storing the timestamp last in a UUID is
better, as it allows the storage engine to do prefix compression and
can reduce the 16-byte key to 8 bytes or less.
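A rough way to see the '8 bytes or less' figure: with the constant part first, all v1 UUIDs from one server share the same 8-byte node + clock_seq prefix, so a key that only stores what differs from its predecessor needs at most 8 bytes. The sketch below assumes the same node, clock_seq, time-high, time-mid, time-low layout described above, and models prefix compression naively (each key stores only the bytes after the prefix it shares with the previous key); real engines encode this differently. The helper names are hypothetical.

```python
import uuid


def storage_key(u: uuid.UUID) -> bytes:
    # Assumed layout: node, clock_seq, time-high, time-mid, time-low.
    b = u.bytes
    return b[10:16] + b[8:10] + b[6:8] + b[4:6] + b[0:4]


def shared_prefix(a: bytes, b: bytes) -> int:
    """Length of the common leading byte run of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def suffix_bytes(keys: list[bytes]) -> list[int]:
    """Bytes stored per key when each sorted key keeps only the part
    that differs from its predecessor (a naive prefix-compression model)."""
    keys = sorted(keys)
    sizes = [len(keys[0])]
    for prev, cur in zip(keys, keys[1:]):
        sizes.append(len(cur) - shared_prefix(prev, cur))
    return sizes


# One 'server': fixed node and clock sequence (hypothetical values).
ids = [uuid.uuid1(node=0x0800200C9A66, clock_seq=0x1234) for _ in range(1000)]
sizes = suffix_bytes([storage_key(u) for u in ids])
print(sizes[0], max(sizes[1:]))  # first key is 16 bytes, the rest at most 8
```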

The main purpose of a UUID is that it should be unique for the
server and also across different systems. The storage order of
bytes does not matter for this to hold true. As long as an
application does not depend on the sort order of UUIDs (and
applications should not), the MariaDB UUID type is compatible with
any other UUID or any other database server.
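The reason byte order cannot affect uniqueness is that the reordering only permutes byte positions, so it can always be undone and two distinct UUIDs never map to the same stored value. A minimal sketch, assuming the same segment layout as described above (node, clock_seq, time-high, time-mid, time-low); to_storage and from_storage are hypothetical names.

```python
import uuid

# (start, end) slices of the RFC 4122 byte string, in assumed storage order:
# node, clock_seq, time-high, time-mid, time-low.
SEGMENTS = [(10, 16), (8, 10), (6, 8), (4, 6), (0, 4)]


def to_storage(u: uuid.UUID) -> bytes:
    """Permute the 16 bytes into the assumed storage order."""
    b = u.bytes
    return b"".join(b[s:e] for s, e in SEGMENTS)


def from_storage(stored: bytes) -> uuid.UUID:
    """Undo the permutation, giving back the original UUID."""
    out = bytearray(16)
    pos = 0
    for s, e in SEGMENTS:
        out[s:e] = stored[pos:pos + (e - s)]
        pos += e - s
    return uuid.UUID(bytes=bytes(out))


u = uuid.uuid1()
print(from_storage(to_storage(u)) == u)  # True: nothing is lost
```

Since from_storage inverts to_storage for any input, the stored form of distinct UUIDs is always distinct, whatever the UUID version.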

For storage engine performance, there are two things that one would
like to optimize related to UUIDs:

1) When doing bulk inserts, having UUIDs in order makes inserts into the
primary key faster and makes the primary key index smaller (initially).
2) When doing single inserts from multiple threads, having UUIDs in 'random
order' is better, as there will be fewer page collisions between
threads and one can get higher insert throughput.

In other words, it depends on the usage of UUID keys how they should
be stored. There is no obvious 'best way'.
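The trade-off between 1) and 2) can be illustrated with a deliberately crude model: treat the key space as 256 equal 'leaf pages' (real B-tree pages cover uneven ranges) and count how many of 64 concurrent inserts land on the busiest page. The page count, thread count and key values are all hypothetical.

```python
import random
from collections import Counter

# Crude, hypothetical model: the 128-bit key space split into equal pages.
PAGES = 256
PAGE_SPAN = 2**128 // PAGES
THREADS = 64


def hottest_page_load(keys) -> int:
    """Number of inserts that land on the busiest page."""
    return max(Counter(k // PAGE_SPAN for k in keys).values())


rng = random.Random(42)

# Ordered keys (like uuid_short() or v1 UUIDs in the UUID type): each new
# key is just above the previous one, so they all hit the same page.
ordered = [1_000 + i for i in range(THREADS)]
# Random keys (like UUID() strings): spread over the whole key space.
scattered = [rng.getrandbits(128) for _ in range(THREADS)]

print(hottest_page_load(ordered))    # 64: every insert hits one page
print(hottest_page_load(scattered))  # far below 64: inserts rarely collide
```

For a single-threaded bulk load, the first pattern is the good one (appends to one page, few splits); for many concurrent single-row inserts, the second spreads the contention.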

When using the UUID type or uuid_short(), one gets benefit 1).
By using UUID() strings, one gets benefit 2).

After reading the comments on this Jira entry, the conclusion with the current
code is:

If your application does not depend on the sort order of UUIDs and you want to optimize for bulk inserts and less storage for the UUID key, then use the UUID type.
If your application requires UUIDs to be sorted as strings, if you have your own version of UUID, or if you want to optimize for concurrency between multiple threads, then use BINARY(32) and the sys_guid() function.
We should consider adding another UUID type that will store things in the given order. Another option would be to add a type that would
store HEX strings in binary and show them as hex strings. This could be universally useful and could be a building block for a new UUID
type that stores things in exact byte order.
The HEX type would also work for UUIDs, but then one would miss the '-' in the output string.
As Sergei points out in the comments, we should be able to detect other standard UUID types (version > 5) and not swap bits in them. He has already been working on making this happen (timetable not yet defined). For 'own' UUID types, a HEX type could be an efficient option.

Attachments

UUID-old-new.diff
52 kB
2023-05-17 14:49

Issue Links

blocks
MDEV-31626 implement inet4->inet6 cast (Closed)

causes
MDEV-32112 Random incorrect uuid errors since 10.11.5 (Closed)
MDEV-33065 Insert in table with constraint by UUID works on 11.0.2 and fails on 11.0.3 (Open)
MDEV-33442 REPAIR TABLE corrupts UUIDs (Closed)

is caused by
MDEV-26664 Store UUIDs in a more efficient manner (Closed)

relates to
MDEV-31926 UUID v7 are compared incorrectly (Closed)


Activity

Alexander Barkov added a comment - 2023-07-06 10:24

This patch is OK to push:

https://github.com/MariaDB/server/commit/ef84f8137b7c150635c23e1493f8620a4c8527ef

Evgenii added a comment - 2023-07-09 14:33 - edited

@serg
Is there an approximate date for the 10.11.5 release, which will include this fix?

Sergei Golubchik added a comment - 2023-07-09 15:05

Yes. See the home page of https://jira.mariadb.org

Evgenii added a comment - 2023-08-15 07:49

@serg Good afternoon, I tried the latest release, 10.11.5, which was released yesterday (August 14th). This bug is still present: queries like "where id < uuid_v7" are processed incorrectly.

How to reproduce:
1. Create a "TEST" table (id: UUID, model_value: STRING) and fill it with the following data:

id;model_value
0189f81d-498e-72f8-b0fa-246d7be1adb1;30
0189f81d-335c-71c7-a620-3f093ccc39c3;29
0189f81d-2f49-72fa-8ed9-4bd2fe509e59;28
0189f81d-2ae6-7204-b694-84602a2560c7;27
0189f81d-2620-717b-b873-1829f25a4d53;26
0189f81c-f940-7161-9c48-9677e11bc40c;25
0189f81c-d4cf-71c1-93f1-ddd005eadb8b;24
0189f81c-d07f-7143-8c8b-b19457138aef;23
0189f81c-cbec-707d-9b45-b007920b5ef6;22
0189f81c-c795-71d0-bc3f-8b7b74abe044;21
0189f81c-c33d-7100-a39c-80ea40a090fc;20
0189f81c-bf35-70a0-a288-8d80473bc015;19
0189f81c-baf8-718a-870c-cd41e677c3f4;18
0189f81c-b5f1-721d-b135-22c38153aab6;17
0189f81c-b011-71cd-bf2a-d6be3cea6dd7;16
0189f81c-9934-721f-90dc-05a65a79f4b6;15
0189f81c-94d6-733d-9953-2addf3a83be4;14
0189f81c-9016-71a6-a340-ec300186a4d9;13
0189f81c-8bca-73c9-8ffb-225a39a5dc49;12
0189f81c-868b-71d1-830e-9b4ae08aa782;11
0189f81c-8219-731a-b648-0e19e64d0ec0;10
0189f81c-7e01-710a-91ce-49e62163fa29;9
0189f81c-7a05-724e-a7e1-9baf65a91ecb;8
0189f81c-75b8-7223-bd6f-b60601be3aa1;7
0189f81c-71fc-7303-b850-4538b119c5fc;6
0189f81c-6e06-71a3-a47a-664a606b08d2;5
0189f81c-6a38-710e-8730-8babd48c28f4;4
0189f81c-650e-7171-b962-045cc4033000;3
0189f81c-4eff-70c7-a361-caffaf2c0666;2
0189f81c-49e9-7279-8087-9f8e25730957;1

If I run the query "SELECT id, model_value FROM test ORDER BY ID DESC", everything works correctly and I get all these rows.

But if I run the query "SELECT id, model_value FROM test where ID < '0189f81c-c795-71d0-bc3f-8b7b74abe044' ORDER BY ID DESC", I get this result:

id;model_value
0189f81c-c33d-7100-a39c-80ea40a090fc;20
0189f81c-b5f1-721d-b135-22c38153aab6;17
0189f81c-9934-721f-90dc-05a65a79f4b6;15
0189f81c-94d6-733d-9953-2addf3a83be4;14
0189f81c-8bca-73c9-8ffb-225a39a5dc49;12
0189f81c-8219-731a-b648-0e19e64d0ec0;10
0189f81c-7e01-710a-91ce-49e62163fa29;9
0189f81c-71fc-7303-b850-4538b119c5fc;6
0189f81c-6e06-71a3-a47a-664a606b08d2;5
0189f81c-650e-7171-b962-045cc4033000;3

The "model_value" column shows that some rows are missing - 19, 18, 16, 13, 11, 8, 7, 4, 2, 1

@serg Is there a quick fix for this bug? I have been waiting for this feature since the end of 2022
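The reported result can be checked mechanically. The sketch below is an editorial check, not part of the report or of MariaDB: it parses the 30 rows, computes what the query should return (every id lexically below the constant), and then keeps only the rows that are also below the constant under a hypothetical node-first comparison, i.e. the v1 'index-friendly' order (node, clock_seq, time-high, time-mid, time-low) applied to these version 7 values. The match only shows that the missing rows are consistent with that kind of mix-up; the actual root cause was tracked in MDEV-31926.

```python
import uuid

# Rows from the report above: id;model_value
ROWS_TEXT = """\
0189f81d-498e-72f8-b0fa-246d7be1adb1;30
0189f81d-335c-71c7-a620-3f093ccc39c3;29
0189f81d-2f49-72fa-8ed9-4bd2fe509e59;28
0189f81d-2ae6-7204-b694-84602a2560c7;27
0189f81d-2620-717b-b873-1829f25a4d53;26
0189f81c-f940-7161-9c48-9677e11bc40c;25
0189f81c-d4cf-71c1-93f1-ddd005eadb8b;24
0189f81c-d07f-7143-8c8b-b19457138aef;23
0189f81c-cbec-707d-9b45-b007920b5ef6;22
0189f81c-c795-71d0-bc3f-8b7b74abe044;21
0189f81c-c33d-7100-a39c-80ea40a090fc;20
0189f81c-bf35-70a0-a288-8d80473bc015;19
0189f81c-baf8-718a-870c-cd41e677c3f4;18
0189f81c-b5f1-721d-b135-22c38153aab6;17
0189f81c-b011-71cd-bf2a-d6be3cea6dd7;16
0189f81c-9934-721f-90dc-05a65a79f4b6;15
0189f81c-94d6-733d-9953-2addf3a83be4;14
0189f81c-9016-71a6-a340-ec300186a4d9;13
0189f81c-8bca-73c9-8ffb-225a39a5dc49;12
0189f81c-868b-71d1-830e-9b4ae08aa782;11
0189f81c-8219-731a-b648-0e19e64d0ec0;10
0189f81c-7e01-710a-91ce-49e62163fa29;9
0189f81c-7a05-724e-a7e1-9baf65a91ecb;8
0189f81c-75b8-7223-bd6f-b60601be3aa1;7
0189f81c-71fc-7303-b850-4538b119c5fc;6
0189f81c-6e06-71a3-a47a-664a606b08d2;5
0189f81c-6a38-710e-8730-8babd48c28f4;4
0189f81c-650e-7171-b962-045cc4033000;3
0189f81c-4eff-70c7-a361-caffaf2c0666;2
0189f81c-49e9-7279-8087-9f8e25730957;1"""

CONSTANT = uuid.UUID("0189f81c-c795-71d0-bc3f-8b7b74abe044")
REPORTED = [20, 17, 15, 14, 12, 10, 9, 6, 5, 3]  # rows Evgenii got back


def node_first(u: uuid.UUID) -> bytes:
    # Hypothetical v1-style key: node, clock_seq, time-high, time-mid, time-low.
    b = u.bytes
    return b[10:16] + b[8:10] + b[6:8] + b[4:6] + b[0:4]


rows = []
for line in ROWS_TEXT.splitlines():
    id_text, value = line.split(";")
    rows.append((uuid.UUID(id_text), int(value)))

# ORDER BY ID DESC, using the string (lexical) order.
desc = sorted(rows, key=lambda r: str(r[0]), reverse=True)
expected = [v for u, v in desc if str(u) < str(CONSTANT)]
both = [v for u, v in desc
        if str(u) < str(CONSTANT) and node_first(u) < node_first(CONSTANT)]
print(expected)  # 20 down to 1
print(both)      # [20, 17, 15, 14, 12, 10, 9, 6, 5, 3]
```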

Sergei Golubchik added a comment - 2023-08-15 09:52

xpun, this is a new bug. Next time (if there is one), please report it separately, otherwise it'll likely be missed. This time, as I've noticed it, I've reported it as MDEV-31926.
