[MDEV-25779] long uniques aren't 32/64-bit portable Created: 2021-05-26  Updated: 2023-09-04  Resolved: 2023-09-04

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Create Table
Affects Version/s: 10.4, 10.5
Fix Version/s: 10.11.2, 10.4.28, 10.5.19, 10.6.12, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Major
Reporter: Sergei Golubchik Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-27653 long uniques don't work with unicode ... Closed
Relates
relates to MDEV-371 Unique indexes for blobs Closed
relates to MDEV-27653 long uniques don't work with unicode ... Closed

 Description   

long uniques use internally Item_func_hash, which works as

longlong  Item_func_hash::val_int()
{
  DBUG_EXECUTE_IF("same_long_unique_hash", return 9;);
  unsigned_flag= true;
  ulong nr1= 1,nr2= 4;
  ...
  cs->hash_sort(l, sizeof(l), &nr1, &nr2);
  ...
  return   (longlong)nr1;
}

But ulong has a different size on 32-bit and 64-bit. All other code that uses hash_sort(), seems to be using only the lower 32-bit of the returned hash value. This item uses all 64, which makes hash values dependent on whether the platform is 32-bit or 64-bit.

Considerations:

  • a simple fix would be to use only the lowest 32 bit from nr1 in the Item_func_hash::val_int(), but perhaps this hash function should internally use uint32 and be changed everywhere?
  • perhaps HA_HASH_FIELD_LENGTH should be 4 not 8?
  • compatibility. how to fix it and not break existing tables?


 Comments   
Comment by Alexander Barkov [ 2021-05-26 ]

Observation:

The low level macros which is used all around the hash code:

#define MY_HASH_ADD(A, B, value) \
  do { A^= (((A & 63)+B)*((value)))+ (A << 8); B+=3; } while(0)

is implemented in the way that after the hash calculation, this value:

 (uint32_t) nr1

is equal on all platforms.

In other words, these two values are equal:

  • nr1 (as is) on 32bit platforms
  • nr1 & 0xFFFFFFFF on 64bit platforms

So the code which uses only the low 4 bytes of nr1 should be safe.

For example, this code in ha_partition.cc is safe:

uint32 ha_partition::calculate_key_hash_value(Field **field_array)
{
  ...
  return (uint32) nr1;
}

But Item_func_hash::val_int() is obviously not safe.

Comment by Andrei Elkin [ 2022-02-22 ]

serg: Maybe Bar should own this one now?

Comment by Marko Mäkelä [ 2023-09-03 ]

Wasn’t this addressed by MDEV-27653?

Generated at Thu Feb 08 09:40:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.