Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30095

Unexpected duplicate entry error for UNIQUE USING HASH + NOPAD

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11
    • 10.4(EOL)
    • Character Sets
    • None

    Description

      This script returns an unexpected duplicate entry error:

      CREATE OR REPLACE TABLE t1 (
        a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
        UNIQUE KEY(a(33000)) USING HASH
      );
      INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
      INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
      

      ERROR 1062 (23000): Duplicate entry 'td                                                           ...' for key 'a'
      

      The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

      The same problem is repeatable with these two strings:

      CREATE OR REPLACE TABLE t1 (
        a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
        UNIQUE KEY(a(33000)) USING HASH
      );
      INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
      INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
      

      ERROR 1062 (23000): Duplicate entry 'xh                                                           ...' for key 'a'
      

      Problem details

      These two data pairs are special - they were found programatically.

      The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:

      • CONCAT('td',SPACE(31696))
      • CONCAT('td',SPACE(31728))

      The problem is that the further duplicate hash resolution (performed to know if the actual values are really equal) erroneously treat these strings as equal using strnncollsp_nchars(). Using this function is correct for the CHAR data type, but it's not correct for VARCHAR/TEXT.

      The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:

      • CONCAT('xh',SPACE(32457))
      • CONCAT('xh',SPACE(32489))

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov created issue -
            bar Alexander Barkov made changes -
            Field Original Value New Value
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            Description This script returns an unexpected duplicate entry errors:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            Note, these two data pairs are special - they were found programmatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}
            The problems is that the further duplicate hash resolution (to know if the actual values are really equal) erroneously treat these strings as equal.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            This script returns an unexpected duplicate entry error:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            Note, these two data pairs are special - they were found programmatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}
            The problems is that the further duplicate hash resolution (to know if the actual values are really equal) erroneously treat these strings as equal.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            bar Alexander Barkov made changes -
            Description This script returns an unexpected duplicate entry error:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            Note, these two data pairs are special - they were found programmatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}
            The problems is that the further duplicate hash resolution (to know if the actual values are really equal) erroneously treat these strings as equal.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            This script returns an unexpected duplicate entry error:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            Note, these two data pairs are special - they were found programmatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}

            The problem is that the further duplicate hash resolution (to know if the actual values are really equal) erroneously treat these strings as equal.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            bar Alexander Barkov made changes -
            Description This script returns an unexpected duplicate entry error:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            Note, these two data pairs are special - they were found programmatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}

            The problem is that the further duplicate hash resolution (to know if the actual values are really equal) erroneously treat these strings as equal.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            This script returns an unexpected duplicate entry error:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            h2. Problem details

            These two data pairs are special - they were found programatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}

            The problem is that the further duplicate hash resolution (to know if the actual values are really equal) erroneously treat these strings as equal.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            bar Alexander Barkov made changes -
            Description This script returns an unexpected duplicate entry error:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            h2. Problem details

            These two data pairs are special - they were found programatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}

            The problem is that the further duplicate hash resolution (to know if the actual values are really equal) erroneously treat these strings as equal.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            This script returns an unexpected duplicate entry error:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            h2. Problem details

            These two data pairs are special - they were found programatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}

            The problem is that the further duplicate hash resolution (to know if the actual values are really equal) erroneously treat these strings as equal using strnncollsp_nchars(). Using this function is correct for the CHAR data type, but it's not correct for VARCHAR/TEXT.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            bar Alexander Barkov made changes -
            Description This script returns an unexpected duplicate entry error:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            h2. Problem details

            These two data pairs are special - they were found programatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}

            The problem is that the further duplicate hash resolution (to know if the actual values are really equal) erroneously treat these strings as equal using strnncollsp_nchars(). Using this function is correct for the CHAR data type, but it's not correct for VARCHAR/TEXT.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            This script returns an unexpected duplicate entry error:

            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
            INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'td ...' for key 'a'
            {noformat}

            The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

            The same problem is repeatable with these two strings:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (
              a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
              UNIQUE KEY(a(33000)) USING HASH
            );
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
            INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));
            {code}
            {noformat}
            ERROR 1062 (23000): Duplicate entry 'xh ...' for key 'a'
            {noformat}

            h2. Problem details

            These two data pairs are special - they were found programatically.

            The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('td',SPACE(31696))}}
            - {{CONCAT('td',SPACE(31728))}}

            The problem is that the further duplicate hash resolution (performed to know if the actual values are really equal) erroneously treat these strings as equal using strnncollsp_nchars(). Using this function is correct for the CHAR data type, but it's not correct for VARCHAR/TEXT.


            The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:
            - {{CONCAT('xh',SPACE(32457))}}
            - {{CONCAT('xh',SPACE(32489))}}
            marko Marko Mäkelä made changes -

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.