[MDEV-4958] Adding datatype UUID Created: 2013-08-27  Updated: 2023-11-23  Resolved: 2021-11-10

Status: Closed
Project: MariaDB Server
Component/s: N/A
Fix Version/s: 10.7.1

Type: Task Priority: Critical
Reporter: Leo Unglaub Assignee: Alice Sherepa
Resolution: Fixed Votes: 36
Labels: Preview_10.7, uuid

Issue Links:
Blocks
blocks CONJ-899 Support UUID Object Closed
blocks MDEV-15854 Implement uuid_to_bin, bin_to_uuid an... Open
is blocked by MDEV-4912 Data type plugin API version 1 Closed
is blocked by MDEV-20890 Illegal mix of collations with UUID() Closed
Duplicate
duplicates MDEV-5593 Feature request native support for UU... Closed
Problem/Incident
causes MDEV-26615 uuid() function on default column val... Closed
causes MDEV-26616 uuid data type - convert from text wi... Closed
causes MDEV-26664 Store UUIDs in a more efficient manner Closed
causes MDEV-26732 Assertion `0' failed in Item::val_native Closed
causes MDEV-26742 Assertion `field->type_handler() == t... Closed
causes MDEV-26785 Hyphens inside the value of uuid data... Closed
Relates
relates to MDEV-27207 Assertion `!m_null_value' failed in i... Closed
relates to MDEV-8605 MariaDB not use DEFAULT value even wh... Closed
relates to MDEV-11339 Support UUID v4 generation Open
relates to MDEV-23748 support not-MAC-address based UUID ve... Open
relates to MDEV-27015 Assertion `!is_null()' failed in Fixe... Closed
relates to MDEV-28491 Uuid. "UPDATE/DELETE" not working "WH... Closed
relates to MDEV-31137 UUID type is never used for user vari... Closed
Epic Link: New data types

 Description   

The current way of handling uuid's in MariaDB is not very user friendly. If you want to do it right you have to replace all 4 "-" chars with "" and store it in a binary(16) type and so on. That is a lot of work and sadly people start to use just varchar() for it because it's easier. But that is a huge performence problem.

To fix that i would propose to add a "uuid" datatype to MariaDB the same way PostgreSQL did it. http://www.postgresql.org/docs/9.2/static/datatype-uuid.html

That would make working with uuid's a lot easier and faster.
Thanks and greetings
Leo



 Comments   
Comment by Oliver Hoff (Inactive) [ 2013-08-27 ]

A datatype will maybe less useful for current applications, because of BC schemas, but I would recommend it for new schema.
For current schemas some (robust) conversion between textual and binary representation of UUIDs would be very helpful, although this functions can be shimed on legacy systems.

Comment by Sergei Golubchik [ 2013-08-30 ]

You can convert from text uuid to binary with something like unhex(replace(uuid, "-", "")).

But yes, a dedicated type would be much more convenient, I agree.

Comment by Jacob Rhoden (Inactive) [ 2013-10-24 ]

Recommend that this is implemented as a native data type, not plugin or other method, given the fact that people will be using this as a primary key.

Comment by Sergei Golubchik [ 2014-07-21 ]

Also, the benefits of the dedicated type would be (quoting MDEV-5593):

  • more comfortable because they do not have to do hex / unhex and remove dash
  • faster index than varchar variant
  • automatic generation of UUID if inserted record with NULL PK
Comment by Michael Amado [ 2015-10-08 ]

Just in case performance could be improved in the initial implementation I thought I'd link this article:
https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/ (author: Karthik Appigatla)
It suggests reordering the bits when storing UUIDs (version 1) has a big performance benefit when the UUID is used as a PK for InnoDB. I assume XtraDB would be the same. Ideally this bit shuffling would be handled so clients can naively insert/select the conventionally arranged version 1 UUIDs, but get the performance benefits of the reordering.

You can count me as someone who would benefit greatly from a performant UUID data type. Thanks!

Comment by Federico Razzoli [ 2015-10-21 ]

I hope that automatic generation will work even if the UUID column is not a PK - just like timestamps.

Comment by VAROQUI Stephane [ 2015-11-11 ]

Storing UUID with time locality :
http://mysqlserverteam.com/author/guilhem/

Comment by Daniel Black [ 2018-12-13 ]

Functions implemented/ported in MDEV-15854 which include the lower level functions for a UUID datatype.

Comment by Rick James [ 2020-01-31 ]

Recommend: When storing the bits into BINARY(16), rearrange them. Rationale...

Type-1 UUIDs (which MySQL has always used) include a timestamp. But the bits of the time are scrambled. This means that indexing on a UUID leads to random accesses. When an index (or table) is big, 'consecutive' rows tend to be scattered.

By simply shuffling the bits, you can make a Type-1 UUID act like an AUTO_INCREMENT or TIMESTAMP – you can get "locality of reference" for accessing chronologically 'close' records.

More discussion here: http://mysql.rjweb.org/doc.php/uuid , and the 8.0.0 Changelog: UUID_TO_BIN() and BIN_TO_UUID() convert between UUID values in string and binary formats (represented as hexadecimal characters and VARBINARY(16), respectively). This permits conversion of string UUID values to binary values that take less storage space. UUID values converted to binary can be represented in a way that permits improved indexing efficiency.

Comment by Alexander Barkov [ 2020-02-11 ]

SQL Server introduces a separate data type for this:

https://docs.microsoft.com/en-us/sql/t-sql/functions/newsequentialid-transact-sql?view=sql-server-ver15

Comment by Steven Ayre [ 2020-08-12 ]

Rick, would you suggest that it always reorders? That incurs a small performance penalty which is unnecessary for v4 UUIDs.

Perhaps a UUID(0) and UUID(1) argument to replicate the swap_flag on UUID_TO_BIN() could be used to toggle that behaviour?

Alexander, NEWSEQUENTIALID is a function returning a uuid-like 128bit value to store in a uniqueidentifier column. It isn't compatible with v1/v2 UUIDs. Are you suggesting two separate data types?

A flag affecting the storage would seem preferable to me, similar to the fsp argument on the DATETIME type.

Comment by Rick James [ 2020-08-13 ]

I lean toward always reordering. (But I don't have a strong opinion.) Some debating points (on both sides):

  • Nothing else will look at the layout on disk, so always swapping works.
  • MySQL 8 has a flag – I see this as clutter with no benefit.
  • Would mysqldump (etc) always convert to the standard 36-char HEX layout?
  • Performance penalty is insignificant. (I repeatedly remind users that all the overhead of fetching a row is much higher than simple function calls.)
  • Compatibility with MySQL 8 (with its optional flag) may be more important than other arguments. (MariaDB has been drifting away from its "drop-in compatibility" claim.)
  • Swapping does not hurt type-4 (etc) uuids.
Comment by Steven Ayre [ 2020-08-18 ]

> MySQL 8 has a flag – I see this as clutter with no benefit.
I've implemented the MySQL 8 functions as a UDF (https://github.com/SteveAyre/uuid2bin) to use until this type is added, and the reordering has a 5% performance hit. So there is a benefit. What impact it will have on the data type will depend on the implementation and yes fetching the row is more significant, but it won't be free. If you're working on millions of rows it may be important.

Comment by Rick James [ 2020-08-19 ]

In some situations, there is a significant performance advantage for the reshuffle.

Given:

  • A huge table (bigger than the buffer_pool)
  • PRIMARY KEY(uuid) – Type 1 with the bits rearranged as described
  • The working set is "recent" data
    Then the activity on the table will use much less I/O, hence be much faster than if the bits were not rearranged.

(A similar argument can be made for a secondary index starting with UUID.)

Comment by Steven Ayre [ 2020-08-20 ]

Should consider an upgrade path from existing databases using byte reordering (MySQL 8 or MariaDB with either stored functions or UDF)...

  • Importing a dump containing BINARY(16) into a schema using the UUID type
  • ALTER TABLE MODIFY on a column from the BINARY(16) to UUID type
    In these cases the UUID type will need to know whether the timestamp bytes are reordered or not.

Personally I like the idea of UUID(0) and UUID(1) a lot as it gives a way to specify this, or as an alternative two different UUID types.

Perhaps the default behaviour should be to reorder if the DBA hasn't specified one.

Comment by Daniel Black [ 2021-02-15 ]

An example of the ideal case for this: https://dba.stackexchange.com/questions/285410/innodb-primary-key-advice

Comment by Rick James [ 2021-02-15 ]

Another thought... Instead of shuffling the bits when the UUID is stored, make use of the "datatype UUID" to shuffle the bits as it is being used in BTree accesses.

That is, the storage has the straightforward mapping between bits and hex with dashes; no rearranging.

Just as FLOAT and DECIMAL and SIGNED/UNSIGNED must interpret the bits differently when comparing, UUID would shuffle the bits when comparing.

Comment by Daniel Black [ 2021-09-20 ]

To all those that are watching, a preview release is available for testing and feedback:

https://mariadb.org/10-7-preview-feature-uuid-data-type/

Comment by Alice Sherepa [ 2021-10-29 ]

I think it could be merged into 10.7

Comment by Slawomir Dymitrow [ 2022-01-22 ]

Hi guys, so excited to see the UUID data type as a feature! One question - does the actually implemented feature use the bits-shuffle solution to optimize performance? Or was this only an idea that didn't get implemented?

Comment by Ian Gilfillan [ 2022-01-22 ]

eXsio see MDEV-26664

Comment by Alexander Barkov [ 2022-01-22 ]

eXsio,

Also please see these comments:
https://github.com/MariaDB/server/blob/10.7/plugin/type_uuid/sql_type_uuid.h#L29

Generated at Thu Feb 08 07:00:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.