XMLWordPrintable

Details

    Description

      Ricky's original entry:

      Hi, is it possible to revert rearranging bits when inserting new uuid?
      I have no idea how did anyone come to conclusion to rearrange uuid
      bits before inserting and suppose that every inserted uuid is a
      version 1 uuid, and not even bother checking version that's being
      inserted. UuidV6,7,8 and ULID are now useless with uuid type in
      mariadb and they will cause fragmentation, and it's impossible to sort
      rows.

      Monty;
      This Jira entry is about the new UUID type in MariadB 10.7

      It seams that the main confusion is that when doing an ORDER BY
      on the new UUID type, the UUID's is not coming in lexical order
      but in storage order.

      I have addded a little bit of background to allow us to better
      understand and discuss the issues with UUID and also give other
      readers more understand of the different UUID options that MariaDB
      supports.

      MariaDB has a few different uuid's (all unique in the server and
      across servers)

      uuid()

      • Universal Unique Identifiers (UUIDs), as in DCE 1.1: Remote Procedure Call,
      • Unique among all MariaDB installations
      • String, 5 parts separated with '-', 36 bytes
      • Starts 'scrambled time' (time-low, time-mid, time-high) followed by a set
        system unique constants
      • 'Random' from the user point of view

      sys_guid()

      • Like the above but without '-'
      • String, 32 bytes

      uuid_short()

      • 'MySQL's original space efficient uuid'. Unique among a MariaDB cluster.
      • Server id - timestamp - incrementor
      • Each call generates a slightly higher number
      • longlong, 8 bytes ; Efficient storage!
      • 'Sequential' from the user point of view.

      UUID data type in 10.7.0

      • String representation defined in RFC4122.
      • Stored in 'index-friendly manner'. This means that the last
        'constant part' of uuid is stored first and then the time in high-to-low
        byte order. Time bytes are store in big-endian format (same as uuid())
        This only works when the input comes from the MariaDB UUID() function (UUID v1).
      • Newer uuid's will be >= than older id's
      • 16 bytes
      • Storage efficient for storage_engines that can do prefix compression.
      • Can be confusion when doing ORDER BY UUID_column as the order
        is in storage order, not 'string value order'.

      Other things

      • For sorting there is no difference if timestamp is stored first or
        timestamp is stored last after a 'constant part'. The main issue
        for this Jira entry seams to be the the timestamp value is
        internally reordered to get a better storage and bulk insert rate.
      • There is no problem in sorting any of the above UUIDs. As Barkov
        shows with an example, one can always ensure lexical order also
        for the UUID type.
      • For index storage efficiency, storing timestamp last in an UUID is
        better as it allows the storage engine to do prefix-compression and
        can reduce the 16 byte key to 8 bytes or less.
      • The main purpose of an UUID is that they should be unique for the
        server and also across different systems. The storage order of
        bytes does not matter for this to hold true. As long as an
        application does not depend on the sort order of UUID's (and
        applications should not), the MariaDB UUID type is compatible with
        any other UUID or any other database server.

      For storage engine performance there are two things that one would
      like to optimize related to UUID's:

      1) When doing bulk insert, having UUID in order makes inserts of the
      primary key faster and makes the primary key index smaller (initially).
      2) When doing single insert from multiple threads, having UUID in 'random
      order' is better as there will be less page collisions between multiple
      threads and one can get higher insert throughput.

      In other words, it depends on the usage of UUID keys how they should
      be stored. There is no obvious 'best way'.

      When using UUID type or uuid_short() one gets benefit 1)
      By using UUID() strings, one gets benefit 2)

      After reading the comment on this Jira entry, the conclusion with current
      code is:

      • If your application is not depending on the sort order of UUID and you want to optimizer for bulk insert and less storage of the UUID key, then use the UUID type.
      • If your application requires UUID's to be sorted as strings, if you have your own version of UUID or you want to optimize for concurrency between multiple threads, then use BINARY(32) and the sys_guid() function.
      • We should consider adding another UUID type that will store things in the given order. Another option would be to add a type that would
        store HEX strings in binary and show them as hex strings. This could be universally useful and could be a building block for a new UUID
        type that stores things in exact byte order.
        The HEX type would also work for UUID's, but then one would miss the '-' in the output string.
      • As Sergei points out in the comments, we should be able to detect other standard UUID's types > version 5 and not swap bits in them. He has already been working on making this happen (time table not yet defined). For 'own' UUID types, a HEX type could be an efficient.

      Attachments

        Issue Links

          Activity

            People

              serg Sergei Golubchik
              rixafy Rick Strafy
              Votes:
              4 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.