Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30095

Unexpected duplicate entry error for UNIQUE USING HASH + NOPAD

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11
    • 10.4(EOL)
    • Character Sets
    • None

    Description

      This script returns an unexpected duplicate entry error:

      CREATE OR REPLACE TABLE t1 (
        a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
        UNIQUE KEY(a(33000)) USING HASH
      );
      INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
      INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
      

      ERROR 1062 (23000): Duplicate entry 'td                                                           ...' for key 'a'
      

      The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

      The same problem is repeatable with these two strings:

      CREATE OR REPLACE TABLE t1 (
        a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
        UNIQUE KEY(a(33000)) USING HASH
      );
      INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
      INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
      

      ERROR 1062 (23000): Duplicate entry 'xh                                                           ...' for key 'a'
      

      Problem details

      These two data pairs are special - they were found programatically.

      The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:

      • CONCAT('td',SPACE(31696))
      • CONCAT('td',SPACE(31728))

      The problem is that the further duplicate hash resolution (performed to know if the actual values are really equal) erroneously treat these strings as equal using strnncollsp_nchars(). Using this function is correct for the CHAR data type, but it's not correct for VARCHAR/TEXT.

      The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:

      • CONCAT('xh',SPACE(32457))
      • CONCAT('xh',SPACE(32489))

      Attachments

        Issue Links

          Activity

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.