  1. MariaDB Server
  2. MDEV-32016

change the hash used for hash unique

Details

    Description

      After MDEV-27653 the hash used for hash uniques was changed to be 32-bit. This value is portable (unlike the old one), but a 64-bit hash would have produced fewer collisions. Let's change the hash to be:

      • not using ulong and other non-portable data types
      • 64-bit always, on all architectures
      • generally providing good and uniform distribution
      • fast
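
      The first two requirements above can be illustrated with a minimal sketch. It uses FNV-1a purely as a stand-in because it is short and has published test vectors; it is not the function proposed in this issue, and since it processes one byte at a time it is unlikely to satisfy the "fast" requirement. The name fnv1a_64 is hypothetical.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      // FNV-1a, 64-bit. A stand-in for illustration only, not the hash chosen
      // for MDEV-32016. It uses only fixed-width types, so the result is the
      // same on 32-bit and 64-bit platforms and on any endianness.
      inline std::uint64_t fnv1a_64(const void *data, std::size_t len)
      {
        assert(data != nullptr || len == 0);
        const unsigned char *p= static_cast<const unsigned char *>(data);
        std::uint64_t h= UINT64_C(0xcbf29ce484222325);  // FNV offset basis
        for (std::size_t i= 0; i < len; i++)
        {
          h^= p[i];                                     // mix in one byte
          h*= UINT64_C(0x100000001b3);                  // FNV prime, wraps mod 2^64
        }
        return h;
      }
      ```

      The point of the sketch is the type discipline: no ulong, no size-dependent arithmetic, and a result that is always exactly 64 bits wide.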

      Old tables must keep working, of course. This is the third, and presumably not the last, time we change the hash function, so let's store the identity of the hash function used in the extra2 area. That will simplify future changes.
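
      One way to picture the idea of recording the hash function per table is sketched below. Everything here is an assumption made for illustration: the enum values, the one-byte encoding, and the names HashAlgo and decode_hash_algo are hypothetical and do not describe the actual extra2 layout.

      ```cpp
      #include <cstddef>
      #include <cstdint>

      // Hypothetical identifiers for the three generations of the hash
      // mentioned in the description; the real encoding is not specified here.
      enum class HashAlgo : std::uint8_t
      {
        OLD_NATIVE= 0,   // original, non-portable hash
        PORTABLE_32= 1,  // 32-bit hash introduced by MDEV-27653
        PORTABLE_64= 2   // proposed 64-bit hash
      };

      // Decode the algorithm id from an assumed one-byte stored value.
      // If the value is absent, *algo is left untouched, so the caller keeps
      // whatever it determined for tables created before the field existed.
      // Returns false for an unknown id, so that a table written by a newer
      // server is not silently hashed with the wrong function.
      inline bool decode_hash_algo(const unsigned char *value, std::size_t len,
                                   HashAlgo *algo)
      {
        if (value == nullptr || len == 0)
          return true;                       // field absent: keep caller's default
        switch (value[0])
        {
        case 0: *algo= HashAlgo::OLD_NATIVE;  return true;
        case 1: *algo= HashAlgo::PORTABLE_32; return true;
        case 2: *algo= HashAlgo::PORTABLE_64; return true;
        default: return false;               // unknown id
        }
      }
      ```

      With an explicit id stored next to the table definition, adding a fourth function later would only need a new enum value and a new case, and existing tables would keep their recorded choice.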

      Maybe we can also change all non-persistent hash tables in the server to use the new function.

      This bug does not cause wrong answers; it only affects performance (as there are more collisions).
      MariaDB will remember whether a hash key was created as 32-bit or 64-bit and will continue to use the original length even if we change the default back to 64-bit at some point.
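
      To get a feel for how much the hash width matters for collisions, here is a rough estimate. It assumes an ideal, uniformly distributed hash, which no real function achieves exactly; the helper name expected_colliding_pairs is hypothetical.

      ```cpp
      #include <cmath>

      // Expected number of colliding pairs among n distinct keys hashed
      // uniformly into 2^bits values: C(n, 2) / 2^bits.
      inline double expected_colliding_pairs(double n, int bits)
      {
        return n * (n - 1.0) / 2.0 / std::ldexp(1.0, bits);
      }
      ```

      Under that assumption, one million distinct keys give roughly 116 colliding pairs with a 32-bit hash and about 2.7e-8 with a 64-bit hash, so collisions become practically negligible at 64 bits for tables of that size.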

          Activity

            Loosely translated from a Slack discussion:

            Hashing a stream from a contiguous buffer is best done with xxHash: it uses SIMD and claims to be cross-platform. It is clear, though, that xxHash won't support all our platforms.

            Surprisingly low hash collision rates were shown by our hash function:
            https://github.com/mariadb-corporation/mariadb-columnstore-engine/blob/develop/utils/common/hasher.h#L142

            A function from RobinHood isn't bad, even though it's not supported:
            https://github.com/mariadb-corporation/mariadb-columnstore-engine/blob/develop/utils/common/robin_hood.h#L700

            There is tooling in the xxHash repo for comparing the characteristics of hash functions; I highly recommend using it in research: https://github.com/Cyan4973/xxHash and https://github.com/Cyan4973/xxHash/tree/dev/tests/collisions

            By the way, https://github.com/Cyan4973/xxHash/tree/dev/tests/collisions works amazingly fast with a stream that is not stored in contiguous memory. As you can guess, I tested on x86_64; we did not support ARM back then. Other platforms may show a different picture.

            serg Sergei Golubchik added the comment above.

            People

              serg Sergei Golubchik