MDEV-32012

hash unique corrupts index on virtual blobs

Details

    Description

      create table t1 (
        f1 varchar(25),
        v1 mediumtext generated always as (concat('f1:', f1)) virtual,
        unique key (f1) using hash,
        key (v1(1000))
      );
      insert t1 (f1) values (9599),(94410);
      check table t1 extended;
      drop table t1;
      

      What happens here is the following sequence of events:

      • There's an update that modifies the value of a virtual blob column
        • The column is indexed, so it's recalculated
      • The pointer to the value (it's a blob) is stored in record[0]. As it's a virtual calculated column, the value is owned by the Field_blob and stored in Field_blob::value.
      • The previous value of the field (also calculated) is in Field_blob::read_value, and the pointer to it is in record[1].
      • Now comes check_duplicate_long_entry_key(): it calculates the hash value, saves record[0] in the lookup_buffer, and performs an index_read() using a separate lookup_handler.
      • In this case we have a hash collision, so a row is found and read into record[0]
      • Virtual columns are computed and the new value is stored in Field_blob::value, replacing the old one
      • The row from the hash collision isn't identical to the one in the lookup_buffer, so check_duplicate_long_entry_key() decides it's not a duplicate key, restores record[0] from lookup_buffer and proceeds with the update
      • But the blob value that was in Field_blob::value is lost, so the pointer restored into record[0] no longer refers to the original value
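
      The sequence above can be reduced to a toy model. The following C++ sketch is not MariaDB code: Record, ToyBlobField and lost_blob_demo() are hypothetical names, and a fixed 64-byte buffer stands in for Field_blob::value. It only illustrates why saving and restoring record[0] is not enough when the record holds a pointer and the field owns the bytes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a table record: it stores only a pointer
// to the blob bytes, never the bytes themselves.
struct Record {
    const char *v1 = nullptr;
};

// Hypothetical stand-in for Field_blob: `value` owns the computed bytes.
struct ToyBlobField {
    char value[64] = {0};

    // Recompute the virtual column concat('f1:', f1) into `value`
    // and point the record at it.
    void compute(Record &rec, const char *f1) {
        std::snprintf(value, sizeof(value), "f1:%s", f1);
        rec.v1 = value;
    }
};

// Replays the steps of the list above and returns what record[0]
// ends up pointing at.
std::string lost_blob_demo() {
    Record record0;
    ToyBlobField field;

    field.compute(record0, "94410");   // value for the row being written
    Record lookup_buffer = record0;    // save record[0]: copies the pointer only

    field.compute(record0, "9599");    // colliding row read, v1 recomputed
    record0 = lookup_buffer;           // restore record[0]

    // The restored pointer still aims at the field's only buffer,
    // which now holds the colliding row's value.
    assert(record0.v1 == field.value);
    return std::string(record0.v1);
}
```

      In this model lost_blob_demo() returns "f1:9599" rather than "f1:94410": the pointer was restored, the bytes were not.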

      This isn't exactly a new kind of problem; it has happened a few times before: in partitioning (MDEV-18734, 160d97a4aaac), in InnoDB (MDEV-15114, ab194666564a), and in REPLACE and UPDATE (ea1b25046c81, which is how Field_blob::read_value was born). The fix is to temporarily detach the pointer in the record from Field_blob::value and reattach it later.
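
      A minimal sketch of the detach/reattach idea, as a toy model rather than the actual server patch: Record, ToyBlobField and check_with_detach() are hypothetical names, and a std::unique_ptr<char[]> stands in for the buffer behind Field_blob::value. Moving the buffer out of the field before the lookup forces the recomputation for the colliding row into a different buffer, so the pointer saved with record[0] stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <string>
#include <utility>

constexpr std::size_t kBlobSize = 64;  // arbitrary size for the toy buffer

// Hypothetical stand-in for a table record: holds a pointer only.
struct Record {
    const char *v1 = nullptr;
};

// Hypothetical stand-in for Field_blob: `value` owns the computed bytes.
struct ToyBlobField {
    std::unique_ptr<char[]> value;

    // Recompute concat('f1:', f1), allocating a buffer if the field
    // currently owns none, and point the record at it.
    void compute(Record &rec, const char *f1) {
        if (!value)
            value = std::make_unique<char[]>(kBlobSize);
        std::snprintf(value.get(), kBlobSize, "f1:%s", f1);
        rec.v1 = value.get();
    }
};

std::string check_with_detach() {
    Record record0;
    ToyBlobField field;

    field.compute(record0, "94410");   // value for the row being written
    Record lookup_buffer = record0;    // save record[0]: copies the pointer only

    // Detach: the field gives up ownership, the bytes stay alive in `detached`.
    std::unique_ptr<char[]> detached = std::move(field.value);

    field.compute(record0, "9599");    // colliding row: uses a fresh buffer
    record0 = lookup_buffer;           // restore record[0]

    // Reattach: the field owns the original bytes again and the
    // scratch buffer used for the colliding row is freed.
    field.value = std::move(detached);

    assert(record0.v1 == field.value.get());
    return std::string(record0.v1);
}
```

      Here check_with_detach() returns "f1:94410": the original value survives the lookup because the field did not own it while the colliding row was being processed.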

        Activity

          monty Michael Widenius added a comment - - edited

          This bug only happens in the following context (all conditions have to apply):

          • One has a generated virtual blob/text column
          • One has a unique hash index that includes the virtual blob column.

          In this case the error will happen when there is a duplicate hash value for the generated key.

          Temporary workarounds (any of):

          • Change the blob/text to a varchar
          • Use a persistent instead of a virtual generated column (this will of course take up some more space)

          If the generated blob was only used for unique checks, changing the blob to a varchar will have no effect on any applications using the table.

          monty Michael Widenius added a comment - - edited

          This bug has been around for a long time, but it was not known before this report.
          The bug was normally not noticeable until the hash function was changed some time ago, which caused more hash collisions and in turn exposed this bug.

          As a follow-up improvement we will fix the hash function to have fewer collisions, which will improve performance a bit.

          elenst Elena Stepanova added a comment

          Testing for the fix was performed, as far as it's possible with other existing bugs around virtual columns and unique hash keys. It didn't reveal any issues which hadn't been known before, and the reported problem was resolved, so it should be pushed into main.

          People

            serg Sergei Golubchik