[MDEV-32012] hash unique corrupts index on virtual blobs Created: 2023-08-25  Updated: 2023-11-22  Resolved: 2023-09-07

Status: Closed
Project: MariaDB Server
Component/s: Virtual Columns
Affects Version/s: 10.4, 10.5, 10.6, 10.10, 10.11, 11.0, 11.1
Fix Version/s: 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3

Type: Bug Priority: Blocker
Reporter: Sergei Golubchik Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None

 Description   

create table t1 (
  f1 varchar(25),
  v1 mediumtext generated always as (concat('f1:', f1)) virtual,
  unique key (f1) using hash,
  key (v1(1000))
);
insert t1 (f1) values (9599),(94410);
check table t1 extended;
drop table t1;

What happens here is the following sequence of events:

  • There's an update; it modifies the value of a virtual blob column
    • The column is indexed, so it's recalculated
  • The pointer to the value (it's a blob) is stored in the record[0]. As it's a virtual calculated column, the value is owned by the Field_blob and stored in the Field_blob::value.
  • The previous value of the field (also calculated) is in Field_blob::read_value, and the pointer is in record[1].
  • Now comes check_duplicate_long_entry_key(), it calculates the hash value, saves record[0] in the lookup_buffer, and performs an index_read() using a separate lookup_handler.
  • In this case we have a hash collision, so a row is found, it's read into the record[0]
  • Virtual columns are computed and the new value is stored in Field_blob::value, replacing the old one
  • The row from the hash collision isn't identical to the one in the lookup_buffer, so check_duplicate_long_entry_key() decides it's not a duplicate key, restores record[0] from lookup_buffer and proceeds with the update
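  • but the blob value that was in Field_blob::value is lost: the pointer restored into record[0] refers to a value that was replaced when the virtual columns were recomputed for the colliding row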

This isn't exactly a new kind of problem; it has happened a few times before: in partitioning (MDEV-18734, 160d97a4aaac), in InnoDB (MDEV-15114, ab194666564a), and in REPLACE and UPDATE (ea1b25046c81, which is how Field_blob::read_value was born). The fix is to temporarily detach the pointer in the record from Field_blob::value and reattach it later.



 Comments   
Comment by Michael Widenius [ 2023-08-25 ]

This bug only happens in the following context (all conditions have to apply):

  • One has a generated virtual blob/text column
  • One has a unique hash index that includes the virtual blob column.

In this case the error will happen when there is a duplicate hash value for the generated key.

Temporary workarounds (any of):

  • Change the blob/text to a varchar
  • Use a persistent instead of a virtual generated column (this will of course take up some more space)

If the generated blob was only used for unique checks, changing the blob to a varchar will have no effect on any applications using the table.
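
As a rough sketch of the two workarounds, here is the table from the description rewritten each way. The table names t1_varchar and t1_persistent and the length varchar(28) are assumptions made for illustration (28 is the 3 characters of 'f1:' plus the 25 of f1). Neither variant has been verified here against an affected server.

```sql
-- Workaround 1: make the virtual column a varchar instead of a blob/text type.
-- varchar(28) is an assumed length: 'f1:' (3 chars) + f1 (up to 25 chars).
create table t1_varchar (
  f1 varchar(25),
  v1 varchar(28) generated always as (concat('f1:', f1)) virtual,
  unique key (f1) using hash,
  key (v1)
);

-- Workaround 2: keep mediumtext, but store the generated value (persistent)
-- rather than computing it on the fly; this takes extra space on disk.
create table t1_persistent (
  f1 varchar(25),
  v1 mediumtext generated always as (concat('f1:', f1)) persistent,
  unique key (f1) using hash,
  key (v1(1000))
);
```

The varchar variant only fits when the generated expression has a known, modest maximum length; otherwise the persistent variant is the closer match to the original table.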

Comment by Michael Widenius [ 2023-08-25 ]

This bug has been around for a long time. It was not known before this issue was reported.
The bug was normally not noticeable until the hash function was changed some time ago, which caused more hash collisions, which in turn exposed this bug.

As a follow-up improvement we will fix the hash function to have fewer collisions, which will improve performance a bit.

Comment by Elena Stepanova [ 2023-09-02 ]

Testing of the fix was performed, as far as is possible given the other existing bugs around virtual columns and unique hash keys. It didn't reveal any issues that hadn't been known before, and the reported problem was resolved, so it should be pushed into main.
