[MDEV-371] Unique indexes for blobs Created: 2012-06-28  Updated: 2023-12-12  Resolved: 2019-02-23

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Create Table
Fix Version/s: 10.4.3

Type: Task Priority: Critical
Reporter: roberto spadim Assignee: Sachin Setiya (Inactive)
Resolution: Fixed Votes: 1
Labels: gsoc14, gsoc15, gsoc16

Issue Links:
PartOf
includes MDEV-5795 Full-Column Unique Index Closed
includes MDEV-21606 Improve update handler (long unique k... Closed
Problem/Incident
causes MDEV-18707 Server crash in my_hash_sort_bin, ASA... Closed
causes MDEV-18708 Server crash in Item_field::register_... Closed
causes MDEV-18709 Server crash or ASAN heap-buffer-over... Closed
causes MDEV-18710 Unexpected ER_EXPRESSION_REFERS_TO_UN... Closed
causes MDEV-18711 Assertion `key_info->key_part->field-... Closed
causes MDEV-18712 InnoDB indexes are inconsistent with ... Closed
causes MDEV-18713 Assertion `strcmp(share->unique_file_... Closed
causes MDEV-18720 Assertion `inited==NONE' failed in ha... Closed
causes MDEV-18722 Assertion `templ->mysql_null_bit_mask... Closed
causes MDEV-18725 Assertion failure in file storage/inn... Closed
causes MDEV-18747 InnoDB: Failing assertion: table->get... Closed
causes MDEV-18748 REPLACE doesn't work with unique blob... Closed
causes MDEV-18763 mi_rrnd: Conditional jump or move dep... Closed
causes MDEV-18790 Server crash in fields_in_hash_keyinf... Closed
causes MDEV-18791 Wrong error upon creating Aria table ... Closed
causes MDEV-18792 ASAN unknown-crash in _mi_pack_key up... Closed
causes MDEV-18793 Assertion `0' failed in row_sel_conve... Closed
causes MDEV-18795 InnoDB: Failing assertion: field->pre... Closed
causes MDEV-18798 InnoDB: No matching column for `DB_RO... Closed
causes MDEV-18800 Server crash in instant_alter_column_... Closed
causes MDEV-18801 InnoDB: Failing assertion: field->col... Closed
causes MDEV-18809 Server crash in fields_in_hash_keyinf... Closed
causes MDEV-18820 Assertion `lock_table_has(trx, index-... Closed
causes MDEV-18887 ha_key_cmp: Conditional jump or move ... Closed
causes MDEV-18888 Server crashes in Item_field::registe... Closed
causes MDEV-18889 Long unique on virtual fields crashes... Closed
causes MDEV-18891 ASAN heap-use-after-free in innobase_... Closed
causes MDEV-18897 InnoDB indexes are inconsistent with ... Closed
causes MDEV-18901 Wrong results after ADD UNIQUE INDEX(... Closed
causes MDEV-18904 Assertion `m_part_spec.start_part >= ... Closed
causes MDEV-18910 Hash value unique long column is misc... Closed
causes MDEV-18922 Alter on long unique varchar column m... Closed
causes MDEV-18967 Load data in system version with long... Closed
causes MDEV-19011 Assertion `file->s->base.reclength < ... Closed
causes MDEV-19045 Change in behavior upon creation of u... Closed
causes MDEV-19049 Server crashes in check_duplicate_lon... Closed
causes MDEV-20131 Assertion `!pk->has_virtual ()' faile... Closed
causes MDEV-21540 Initialization of already inited long... Closed
causes MDEV-21624 Unique index length is able to exceed... Closed
causes MDEV-22676 InnoDB: Failing assertion: result != ... Open
causes MDEV-22759 Failing assertion: !cursor->index->is... Confirmed
causes MDEV-22760 Bulk INSERT...ON DUPLICATE KEY UPDATE... Confirmed
causes MDEV-23218 InnoDB: Flagged corruption, Assertion... Open
causes MDEV-23264 Unique blobs allow duplicate values u... Closed
causes MDEV-26020 Server crash on converting table to u... Closed
causes MDEV-26035 Assertion `file->s->base.reclength < ... Confirmed
causes MDEV-26253 ERROR 1032 (HY000): Can't find record... Confirmed
causes MDEV-28190 sql_mode makes MDEV-371 virtual colum... Closed
causes MDEV-28192 ERROR 1901 During ALTER Leading to fr... Closed
causes MDEV-28238 Incorrect information in file: './tes... Closed
causes MDEV-28514 Assertion `file->s->base.reclength < ... Confirmed
causes MDEV-28710 Replication broken after upgrading to... Open
causes MDEV-29199 Unique hash key is ignored upon INSER... Closed
causes MDEV-29203 Incorrect information in file: ... #s... Confirmed
causes MDEV-29345 update case insensitive (large) uniqu... Confirmed
causes MDEV-29954 Unique hash key on column prefix is c... Closed
causes MDEV-30034 UNIQUE USING HASH accepts duplicate e... Closed
causes MDEV-31093 "ON DUPLICATE KEY UPDATE" saves wrong... Open
causes MDEV-32668 tables with UNIQUE blob columns canno... Open
Relates
relates to MDEV-10177 Invisible columns Closed
relates to MDEV-10178 correct nullability for generated col... Open
relates to MDEV-19252 Warning about assertion failure marke... Closed
relates to MDEV-22277 LeakSanitizer: detected memory leaks ... Closed
relates to MDEV-25047 SIGSEGV in mach_read_from_n_little_en... Closed
relates to MDEV-29949 Unique blobs allow duplicate values u... Open
relates to MDEV-6547 select count(*) on hash partitions do... Open
relates to MDEV-13445 Hash-Index Type for InnoDB Open
relates to MDEV-20001 Potential dangerous regression: INSER... Closed
relates to MDEV-20661 Virtual fields are not recalculated o... Closed
relates to MDEV-20918 INSTANT algorithm doesn't work with u... Open
relates to MDEV-22722 Assertion "inited==NONE" failed in h... Closed
relates to MDEV-22756 SQL Error (1364): Field 'DB_ROW_HASH_... Closed
relates to MDEV-23206 SIGSEGV in btr_search_sys_t::get_part... Open
relates to MDEV-23547 InnoDB: Failing assertion: *len in ro... Closed
relates to MDEV-23713 Replication stops with "Index for tab... Stalled
relates to MDEV-24096 Server crash, InnoDB fatal error, Ass... Closed
relates to MDEV-24522 Assertion `inited==NONE' fails upon U... Closed
relates to MDEV-25779 long uniques aren't 32/64-bit portable Closed
relates to MDEV-27160 main.long_unique failed on ppc64el on... Closed
relates to MDEV-27371 REPLACE INTO creates 0 value on autoi... Confirmed
relates to MDEV-27653 long uniques don't work with unicode ... Closed
relates to MDEV-28098 incorrect key in "dup value" error af... Closed
relates to MDEV-30046 wrong row targeted with "insert ... o... Stalled
relates to MDEV-30087 Prefix UNIQUE HASH raises unexpected ... Open
relates to MDEV-30095 Unexpected duplicate entry error for ... Open
relates to MDEV-30441 ASAN heap-use-after-free in Field_blo... Open
relates to MDEV-30588 Failed to update duplicate data when ... Closed
relates to MDEV-31072 InnoDB is USING HASH and Optimizer is... Open
relates to MDEV-31290 Document index lengths, HASH index usage Open
relates to MDEV-32190 Index corruption with unique key and ... Confirmed

 Description   

Allow a user to create unique constraints of arbitrary length. This will be done on the upper layer, in the server, not in the engine. The server will create the invisible virtual column with a hash over the to-be-unique columns. And a normal BTREE index over this column. On insert or update it'll check the index for hash collisions and, if needed, will retrieve the actual rows to compare the data.

original bug report:

hi guys, i was reading about index... and i have a interesting problem...
i need to check if a file (ok a big row... no problem...) is inside my table...
what i´m thinking....
create table a(b int not null default '',c longblob not null, primary key b)
...
ok no problems....
the problem is... how to know if a file, let´s sai a file of 16MB is inside my table...
first solution is... MD5 and check each row... OK nice work....
but could be a other nicer solution?!
i was thinking something like:

alter table a
add index some_index(c) using hash;

could this work? since it´s a hash index, i don´t see why should i use a part of c value like c (100) for example...

could check if this is possible? today not... i tryed and it return:
/* SQL Error (1170): Coluna BLOB 'hash_automatico' usada na especificação de chave sem o comprimento da chave */ (in portugues PT_BR)

i think that´s all



 Comments   
Comment by Sergei Golubchik [ 2012-06-28 ]

yes, this was discussed for quite a while, and has a long history in MySQL bug database.
there was even an attempt to implement this, but somehow it failed, don't really know why.
I guess, we can give it another try.

Comment by roberto spadim [ 2012-06-28 ]

humm, nice =)
well i think the problem was sending big quanty of bytes in/out database, but... maybe a function to help index could be nice... i think that many guys implement something like
select count(*) from files where some_hash_field=(length || ';' || some_hash_value_calculated_in_a_script)
if count(*)> 0
select * from files where some_hash_field=(length || ';' || some_hash_value_calculated_in_a_script)
well, the first part is realy nice... but if it return >1 well send file is nicer than read many files with same size...

i don´t know if it could be nice, but implementation is close to hash with any binary char field, since it´s a hash not a btree... (i´m wrong?)
well =) let´s see what happen =)
it´s a feature request, don´t think that´s really needed but a 'formal' or a 'recommended' way to check if a 'file' (blob) is in database could be good, at least in documentation...

Comment by roberto spadim [ 2013-06-09 ]

hi sergey any idea where the patch about this try was saved? i want to see if it's easy to implement but i don't know how to start

Comment by Sergei Golubchik [ 2013-06-10 ]

No, I don't. Any anyway, if I would like to do it, I would rather start
from scratch, than from some old incomplete patch.

Comment by roberto spadim [ 2013-06-10 ]

ok, at least a hash index/unique index could help a lot

Comment by roberto spadim [ 2013-06-10 ]

this should be done in each storage engine, or index is a general "feature" of mariadb?

Comment by Sergei Golubchik [ 2013-06-10 ]

in each storage engine. in particular, MyISAM and Aria almost support this already, and that "attempt" that I was referring to was exactly about making them support it fully.

Comment by roberto spadim [ 2013-06-10 ]

hummm nice, in first step make myisam and aria blob index possible, innodb and others is a second step, right?

Comment by smit hinsu [ 2014-03-17 ]

Hi Sergei,

I am interested in working on this feature as part of GSoC. From Google search it seems that this feature is really important as many people report problem related to having BLOB/TEXT as primary key or creating index for it.

I have good experience with databases but currently I am new to mariadb source code. I would appreciate if you can help me in getting started.

Thanks

Comment by Rick James [ 2022-01-27 ]

How does this handle COLLATION of a large TEXT field? Is there a way to "hash" while honoring complex UTF-* case and accent handling?

Comment by Sergei Golubchik [ 2022-01-27 ]

It should do that automatically. But that part has a bug, though so it doesn't always work. Reported as MDEV-27653

Comment by Marko Mäkelä [ 2023-05-19 ]

For the record, the following patch should disable the MDEV-371 functionality:

diff --git a/sql/sql_table.cc b/sql/sql_table.cc
index 6e8a4795f21..f8f3eefc114 100644
--- a/sql/sql_table.cc
+++ b/sql/sql_table.cc
@@ -2441,6 +2441,8 @@ static inline void make_long_hash_field_name(LEX_CSTRING *buf, uint num)
 static Create_field * add_hash_field(THD * thd, List<Create_field> *create_list,
                                       KEY *key_info)
 {
+  my_error(ER_TOO_LONG_KEY, MYF(0), 1000);
+  return nullptr;
   List_iterator<Create_field> it(*create_list);
   Create_field *dup_field, *cf= new (thd->mem_root) Create_field();
   cf->flags|= UNSIGNED_FLAG | LONG_UNIQUE_HASH_FIELD;

This may be useful for testing, because there are many open bugs related to indexed virtual columns, and MDEV-371 is internally creating hidden indexed virtual columns.

Generated at Thu Feb 08 06:28:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.