Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9826

change the hash algorithm for PARTITION BY KEY

    XMLWordPrintable

Details

    • Q4/2025 Server Development

    Description

      Test case from https://bugs.launchpad.net/percona-server/+bug/1348295 gives the same result (uneven data disctribution) on recent MariaDB 10.1.13:

      [openxs@fc23 maria10.1]$ bin/mysql -uroot test
      Reading table information for completion of table and column names
      You can turn off this feature to get a quicker startup with -A
       
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 3
      Server version: 10.1.13-MariaDB Source distribution
       
      Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [test]> drop table t1;
      Query OK, 0 rows affected (0.18 sec)
       
      MariaDB [test]> create table t1(c1 int, c2 date) partition by key(c2) partitions 8;
      Query OK, 0 rows affected (1.92 sec)
       
      MariaDB [test]> insert into t1 values (1, '2014-04-22');
      Query OK, 1 row affected (0.04 sec)
       
      MariaDB [test]> insert into t1 values (1, '2014-04-23');
      Query OK, 1 row affected (0.04 sec)
       
      MariaDB [test]> insert into t1 values (1, '2014-04-24');
      Query OK, 1 row affected (0.04 sec)
       
      MariaDB [test]> insert into t1 values (1, '2014-04-25');
      Query OK, 1 row affected (0.04 sec)
       
      MariaDB [test]> insert into t1 values (1, '2014-04-26');
      Query OK, 1 row affected (0.15 sec)
       
      MariaDB [test]> insert into t1 values (1, '2014-04-27');
      Query OK, 1 row affected (0.05 sec)
       
      MariaDB [test]> insert into t1 values (1, '2014-04-28');
      Query OK, 1 row affected (0.04 sec)
       
      MariaDB [test]> insert into t1 values (1, '2014-04-29');
      Query OK, 1 row affected (0.12 sec)
       
      MariaDB [test]> insert into t1 values (1, '2014-04-30');
      Query OK, 1 row affected (0.07 sec)
       
      MariaDB [test]> select partition_name, table_rows from information_schema.partitions where table_name='t1';
      +----------------+------------+
      | partition_name | table_rows |
      +----------------+------------+
      | p0             |          3 |
      | p1             |          0 |
      | p2             |          0 |
      | p3             |          0 |
      | p4             |          2 |
      | p5             |          0 |
      | p6             |          4 |
      | p7             |          0 |
      +----------------+------------+
      8 rows in set (0.00 sec)
       
      MariaDB [test]>
      

      Increasing the number of inserts does not result in amortised even distribution, at all:

      --source include/have_sequence.inc
      create table t1(c1 int, c2 date) partition by key  (c2) partitions 8;
      insert into t1 select 1, from_days(seq) from seq_735000_to_736000;
      select partition_name, table_rows from information_schema.partitions where table_name='t1';
      partition_name	table_rows
      p0	297
      p1	0
      p2	0
      p3	0
      p4	428
      p5	0
      p6	276
      p7	0
      drop table t1;
      

      No wonder, as this is upstream bug known also as:

      https://bugs.mysql.com/bug.php?id=72428
      https://bugs.mysql.com/bug.php?id=65112

      Percona set the bug to "Won't fix" and Oracle seems to postpone fixing as the fix requires changing of hash algorithm and may result in backward compatibility issues.

      I wonder if mariaDB can implement new hashing algorithm for KEY partitioning (maybe under some new "name"/partition type) in 10.1 or 10.2, one that distributes data in a more even/expected way?

      See also MDEV-22831 for a proposal and the first comment there for extra information.

      This task does not apply to PARTITION BY HASH, as despite the name, PARTITION BY HASH simply calculate the modulus of user supplied expression over the number of partitions, and no server supplied hash functions are used:

      hash-partitioning-type.md

      To determine which partition to use, the following calculation is performed:\
      MOD(partitioning\_expression, number\_of\_partitions)
      

      Attachments

        Issue Links

          Activity

            People

              ycp Yuchen Pei
              valerii Valerii Kravchuk
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated: