MariaDB Server / MDEV-38549

MDEV-9826 derived a bug for numeric columns from the old hash design

Details

    • Unexpected results
    • Q1/2026 Server Maintenance

    Description

      The old hash algorithm had a mistake in its design for numeric columns.

      Numeric column classes inherited hash_not_null() from the top-level Field class:

      void Field::hash_not_null(Hasher *hasher)
      {
        DBUG_ASSERT(marked_for_read());
        DBUG_ASSERT(!is_null());
        hasher->add(sort_charset(), ptr, pack_length());
      }
      

      sort_charset() returns my_charset_latin1 for numeric columns.

      For example, Field_long reinterprets its buffer, which contains a binary number, as a character string with:

      • length 4
      • collation latin1_swedish_ci

      As a result, the hash is calculated after applying the following transformations:

      • Trailing 0x20 bytes (spaces) are stripped from the "string"
      • Lower-case "characters" are converted to their upper-case counterparts (for example, 0x61 'a' becomes 0x41 'A')

      The new hash algorithms (MDEV-9826) inherit this wrong behavior. This script, using ALGORITHM=CRC32C, demonstrates the problem:

      CREATE OR REPLACE TABLE t1 (c1 INT NOT NULL) PARTITION BY KEY ALGORITHM=CRC32C (c1) PARTITIONS 16;
      INSERT INTO t1 VALUES (0x41),(0x61);
      SELECT table_name, partition_name, table_rows
      FROM information_schema.partitions
      WHERE table_name='t1' AND table_rows>0;
      

      +------------+----------------+------------+
      | table_name | partition_name | table_rows |
      +------------+----------------+------------+
      | t1         | p8             |          2 |
      +------------+----------------+------------+
      

      Notice that both records were written to the same partition: 0x61 ('a') is folded to 0x41 ('A') before hashing, so both values produce the same hash.

      This behavior should be fixed for numeric data types in the new algorithms.

      It seems we'll have to add a new virtual function for hashing numbers to my_hasher_st.
      In the old algorithm (my_hasher_mysql5x), the numeric hashing function should still use my_charset_latin1 for backward compatibility.
      In the new algorithms, the numeric hashing functions should use my_charset_bin.

            People

              ycp Yuchen Pei
              bar Alexander Barkov