Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
The native UUID data type performs byte-swapping, which was implemented in https://jira.mariadb.org/browse/MDEV-26664 optimize storing of UUIDv1 in indexes.
But this byte-swapping is not desirable for other versions of UUID. This manifested itself when (as I understood it) it broke UUIDv6 and v7, where both versions are already optimized for database indexes. As documented here: https://mariadb.com/docs/server/reference/data-types/string-data-types/uuid-data-type#less-than-10.11.5-10.10.6 :
UUID values version >= 6 are not stored without byte-swapping
I guess this is the accompanying issue: https://jira.mariadb.org/browse/MDEV-31926 .
I consider this a bug for other UUID versions as well, since the optimization is aimed specifically at UUIDv1. Potentially also UUIDv2, which seems to be derived from v1. For other versions, it provides no benefit and can even be harmful; at the very least, it is confusing. Also, nobody knows how future versions will be defined. Since v6 and v7 are already exempt from this, the potential argument of uniform handling does not hold. I believe the "condition" should be inverted - apply byte-swapping only for UUIDv1 (and maybe v2).
The original issue mentions inspiration in https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_uuid-to-bin , which also states this:
> Time-part swapping assumes the use of UUID version 1 values, such as are generated by the UUID() function. For UUID values produced by other means that do not follow version 1 format, time-part swapping provides no benefit.
The main difference is that in MySQL the behavior is opt-in, default is no byte-swapping:
> The one-argument form takes a string UUID value. The binary result is in the same order as the string argument.
In MariaDB this was applied to everything automatically and there is no opt-out. Based on the previously mentioned issue, it appears that this behavior was even applied for v6 and v7, where it caused active harm. This suggests that the impact on different UUID versions may not have been fully considered when the feature was first introduced.
I have also looked at how other databases handle it:
- MySQL - no swapping (but also no "native" UUID column type, apart from the mentioned opt-in parameter)
- Postgres - has native type, AFAIK no byte swapping - https://www.postgresql.org/docs/current/datatype-uuid.html
Why is it a problem, if it "should not matter" for other UUID versions? Because of the problems UUIDs cause when storing them in B-trees, there are a lot of optimizations in the wild, which aim to mitigate this exact problem. However, if such an optimization is already in place and byte-swapping is then applied, it "disables" it.
We were using UUIDv4 with MariaDB for years, which were stored as BINARY, but were made "a bit sequential". For our implementation we were using this as inspiration: https://web.archive.org/web/20260207131427/https://www.enterprisedb.com/blog/sequential-uuid-generators . This article also mentions https://en.wikipedia.org/wiki/Universally_unique_identifier#Combined_Time-GUID which describes similar optimizations taking place already in 2002. This leads me to believe that we are not alone in optimizing UUID generation. Quite on the contrary - I believe larger systems, which notice this problem would have had probably already some optimizations in place, which byte-swapping can disrupt. In our case when we migrated some of the biggest tables (hundreds of millions of rows, hundreds of writes every second) from binary to UUID type it severely degraded performance and nearly rendered the database unusable. On the other hand this was somewhat "fortunate", had it been only a minor slow down, we could have not noticed it immediately and would have much more trouble finding it. But this can exactly be the case for other companies.
But why not use UUIDv7? Since they contain a milli/microsecond level timestamp, they leak potentially sensitive data, making them unsuitable as non-internal identifiers. Additionally, our solution retains more entropy, but that is not the main factor.
Of course such optimizations as mentioned above brings risks of incompatibility, such as discussed here, but I think in this case it is easily avoidable and aligned with handling in other systems.
Summary: Please consider applying byte-swapping only for UUIDv1, where it is only actually needed.
Attachments
Issue Links
- is caused by
-
MDEV-26664 Store UUIDs in a more efficient manner
-
- Closed
-