[MDEV-31926] UUID v7 are compared incorrectly Created: 2023-08-15  Updated: 2023-12-19  Resolved: 2023-10-30

Status: Closed
Project: MariaDB Server
Component/s: Data types
Affects Version/s: 10.9.8, 10.10.6, 10.11.5, 11.0.3
Fix Version/s: 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2

Type: Bug Priority: Critical
Reporter: Sergei Golubchik Assignee: Sergei Golubchik
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-29959 UUID Sorting Closed

 Description   

create table t1 (id uuid, model_value char(10));
insert t1 values ('0189f81d-498e-72f8-b0fa-246d7be1adb1', '30'),
                 ('0189f81d-335c-71c7-a620-3f093ccc39c3', '29'),
                 ('0189f81d-2f49-72fa-8ed9-4bd2fe509e59', '28'),
                 ('0189f81d-2ae6-7204-b694-84602a2560c7', '27'),
                 ('0189f81d-2620-717b-b873-1829f25a4d53', '26'),
                 ('0189f81c-f940-7161-9c48-9677e11bc40c', '25'),
                 ('0189f81c-d4cf-71c1-93f1-ddd005eadb8b', '24'),
                 ('0189f81c-d07f-7143-8c8b-b19457138aef', '23'),
                 ('0189f81c-cbec-707d-9b45-b007920b5ef6', '22'),
                 ('0189f81c-c795-71d0-bc3f-8b7b74abe044', '21'),
                 ('0189f81c-c33d-7100-a39c-80ea40a090fc', '20'),
                 ('0189f81c-bf35-70a0-a288-8d80473bc015', '19'),
                 ('0189f81c-baf8-718a-870c-cd41e677c3f4', '18'),
                 ('0189f81c-b5f1-721d-b135-22c38153aab6', '17'),
                 ('0189f81c-b011-71cd-bf2a-d6be3cea6dd7', '16'),
                 ('0189f81c-9934-721f-90dc-05a65a79f4b6', '15'),
                 ('0189f81c-94d6-733d-9953-2addf3a83be4', '14'),
                 ('0189f81c-9016-71a6-a340-ec300186a4d9', '13'),
                 ('0189f81c-8bca-73c9-8ffb-225a39a5dc49', '12'),
                 ('0189f81c-868b-71d1-830e-9b4ae08aa782', '11'),
                 ('0189f81c-8219-731a-b648-0e19e64d0ec0', '10'),
                 ('0189f81c-7e01-710a-91ce-49e62163fa29', '9'),
                 ('0189f81c-7a05-724e-a7e1-9baf65a91ecb', '8'),
                 ('0189f81c-75b8-7223-bd6f-b60601be3aa1', '7'),
                 ('0189f81c-71fc-7303-b850-4538b119c5fc', '6'),
                 ('0189f81c-6e06-71a3-a47a-664a606b08d2', '5'),
                 ('0189f81c-6a38-710e-8730-8babd48c28f4', '4'),
                 ('0189f81c-650e-7171-b962-045cc4033000', '3'),
                 ('0189f81c-4eff-70c7-a361-caffaf2c0666', '2'),
                 ('0189f81c-49e9-7279-8087-9f8e25730957', '1');
select id, model_value from t1 where id < '0189f81c-c795-71d0-bc3f-8b7b74abe044' order by id desc;
select id, model_value from t1 where concat(id) < '0189f81c-c795-71d0-bc3f-8b7b74abe044' order by id desc;
drop table t1;

because it compares segments backwards independently from the version:

template <bool force_swap>
class UUID: public FixedBinTypeStorage<MY_UUID_SIZE, MY_UUID_STRING_LENGTH>
{
  ...
  // Compare two in-memory values
  static int cmp(const LEX_CSTRING &a, const LEX_CSTRING &b)
  {
    DBUG_ASSERT(a.length == binary_length());
    DBUG_ASSERT(b.length == binary_length());
    int res;
    if ((res= segment(4).cmp_memory(a.str, b.str)) ||
        (res= segment(3).cmp_memory(a.str, b.str)) ||
        (res= segment(2).cmp_memory(a.str, b.str)) ||
        (res= segment(1).cmp_memory(a.str, b.str)) ||
        (res= segment(0).cmp_memory(a.str, b.str)))
      return  res;
    return 0;
  }
  ...
}



 Comments   
Comment by Evgenii [ 2023-08-15 ]

@serg
Good day again)

Do I understand correctly that you release new versions every few months and this fix will not get into version 10.11.5, but will appear or example, in version 10.11.6?

Or have there been cases when, after a few days / weeks, an already released version was fixed?)

Comment by Sergei Golubchik [ 2023-08-15 ]

There have been cases. If a released version is basically impossible to use, like, some bug in packaging that our testing missed. But luckily it doesn't happen very often. Normally, indeed, this bug won't be fixed until the next release about 3 months from now.

Comment by Evgenii [ 2023-10-10 ]

@serg Good day, saw the status of release 10.11.6 scheduled for 2023-10-26, will this bug fix be included in this release?)

Comment by Sergei Golubchik [ 2023-10-16 ]

I cannot say for sure yet. There are few bugs I have to fix first before getting to this one. I hope it will be, though

Comment by Evgenii [ 2023-10-25 ]

@serg Maybe you can delay the release a little so that this bugfix is included in it? Waiting another 3 months for the next release would be a disaster

Comment by Sergei Golubchik [ 2023-10-27 ]

xpun, I've trying to understand how to define a comparison of two UUID values, how to determine what is "correct".

In your test case you compare

select id, model_value from t1 where id < '0189f81c-c795-71d0-bc3f-8b7b74abe044' order by id desc;
select id, model_value from t1 where concat(id) < '0189f81c-c795-71d0-bc3f-8b7b74abe044' order by id desc;

and get different results. But on itself it doesn't mean the comparison is incorrect. Compare with

create table t1 (a int);
insert t1 values (1),(8),(11),(18);
select * from t1 where a > '5' order by a desc;
a
18
11
8
select * from t1 where concat(a) > '5' order by a desc;
a
8

In other words, comparing values as strings and using the native type comparison can produce different results, it's not a bug.

How can I see that the comparison is incorrect?

Comment by Evgenii [ 2023-10-27 ]

@serg
Let's give an example of how I use uuid in real production.
I'm using uuid as a replacement for unsigned integer auto-increment id. I use the "uuid" type because it only takes 16 bytes instead of char(36) and I can generate "uuid v7" from backend and know id of the insert row before i make insert query to mariadb.

For example, I'm creating a table with "messages (id, user_id). Both columns has "uuid" type.
Uuid v7 - monotonically increasing, so I insert step-by-step message and got 100 messages in this table.

First I want to retrive 20 latest messages. Then the next 20, and so on.
My first query:

select id, user_id from messages where user_id = "example_uuid" order by id desc limit 20

.

Next query:

select id, user_id from messages where user_id = "example_uuid" and id < "uuid_from_first_query" order by id desc limit 20

In version 10.11.4, when using uuid v1, it worked like that. I expect uuid v7 to behave the same as uuid v1.

Comment by Evgenii [ 2023-10-27 ]

Uuid v7 was invented to monotonically increase as a replacement for regular autoincrement. And it's expected to sort essentially like a number. Otherwise if I need a simple random string I can generate uuid v4.

Uuid v1 - also monotonically increasing, but it is more prone to collisions

Comment by Sergei Golubchik [ 2023-10-28 ]

It seems you're right, the latest UUID draft (from 20 October 2023) says

UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g., database indexes) sort as opaque raw bytes, without need for parsing or introspection.

Old draft said the same. That is, indeed, concat(uuid) and uuid should sort the same way, at least for v6 and v7.

By the way, note that UUIDv1 — as generated inside MariaDB — isn't prone to collisions, MariaDB guarantees that all UUID() values are unique, even if requested within the same 100 ns time slot.

Comment by Alexander Barkov [ 2023-10-30 ]

serg,
https://github.com/MariaDB/server/commit/caa033a6c786c6e2503d0489a4437ba4bb8fb85a is ok to push.
Can you please add comments to these methods:

+  static bool mem_need_swap(const char *s)
+  { return s[6] > 0 && s[6] < 0x60 && s[8] & 0x80; }
+
+  static bool rec_need_swap(const char *s)
+  { return s[6] & -s[8] & 0x80; }

.

.. which part of the above two expressions checks what.

Generated at Thu Feb 08 10:27:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.