Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25779

long uniques aren't 32/64-bit portable

Details

    Description

      long uniques use internally Item_func_hash, which works as

      longlong  Item_func_hash::val_int()
      {
        DBUG_EXECUTE_IF("same_long_unique_hash", return 9;);
        unsigned_flag= true;
        ulong nr1= 1,nr2= 4;
        ...
        cs->hash_sort(l, sizeof(l), &nr1, &nr2);
        ...
        return   (longlong)nr1;
      }
      

      But ulong has a different size on 32-bit and 64-bit. All other code that uses hash_sort(), seems to be using only the lower 32-bit of the returned hash value. This item uses all 64, which makes hash values dependent on whether the platform is 32-bit or 64-bit.

      Considerations:

      • a simple fix would be to use only the lowest 32 bit from nr1 in the Item_func_hash::val_int(), but perhaps this hash function should internally use uint32 and be changed everywhere?
      • perhaps HA_HASH_FIELD_LENGTH should be 4 not 8?
      • compatibility. how to fix it and not break existing tables?

      Attachments

        Issue Links

          Activity

            Observation:

            The low level macros which is used all around the hash code:

            #define MY_HASH_ADD(A, B, value) \
              do { A^= (((A & 63)+B)*((value)))+ (A << 8); B+=3; } while(0)
            

            is implemented in the way that after the hash calculation, this value:

             (uint32_t) nr1
            

            is equal on all platforms.

            In other words, these two values are equal:

            • nr1 (as is) on 32bit platforms
            • nr1 & 0xFFFFFFFF on 64bit platforms

            So the code which uses only the low 4 bytes of nr1 should be safe.

            For example, this code in ha_partition.cc is safe:

            uint32 ha_partition::calculate_key_hash_value(Field **field_array)
            {
              ...
              return (uint32) nr1;
            }
            

            But Item_func_hash::val_int() is obviously not safe.

            bar Alexander Barkov added a comment - Observation: The low level macros which is used all around the hash code: #define MY_HASH_ADD(A, B, value) \ do { A^= (((A & 63)+B)*((value)))+ (A << 8); B+=3; } while (0) is implemented in the way that after the hash calculation, this value: (uint32_t) nr1 is equal on all platforms. In other words, these two values are equal: nr1 (as is) on 32bit platforms nr1 & 0xFFFFFFFF on 64bit platforms So the code which uses only the low 4 bytes of nr1 should be safe. For example, this code in ha_partition.cc is safe: uint32 ha_partition::calculate_key_hash_value(Field **field_array) { ... return (uint32) nr1; } But Item_func_hash::val_int() is obviously not safe.
            Elkin Andrei Elkin added a comment -

            serg: Maybe Bar should own this one now?

            Elkin Andrei Elkin added a comment - serg : Maybe Bar should own this one now?

            Wasn’t this addressed by MDEV-27653?

            marko Marko Mäkelä added a comment - Wasn’t this addressed by MDEV-27653 ?

            People

              bar Alexander Barkov
              serg Sergei Golubchik
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.