Details

    Description

      Allow a user to create unique constraints over columns of arbitrary length. This will be done in the upper layer, in the server, not in the storage engine. The server will create an invisible virtual column containing a hash of the to-be-unique columns, and a normal BTREE index over that column. On insert or update it will check the index for hash collisions and, if needed, retrieve the actual rows to compare the data.
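
      A minimal sketch of the intended usage and of the mechanism described above; the files table, its columns, and the use of CRC32() as the hash function are illustrative assumptions only:

      -- Intended usage: a UNIQUE constraint directly on a column of arbitrary length.
      CREATE TABLE files (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        content LONGBLOB NOT NULL,
        UNIQUE KEY (content)
      );

      -- Roughly what the server builds internally: a hidden virtual column holding a
      -- hash of the long column, plus an ordinary (non-unique) BTREE index over it.
      -- On INSERT/UPDATE the server looks the hash up in this index and, on a match,
      -- compares the actual column values to rule out hash collisions.
      CREATE TABLE files_manual (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        content LONGBLOB NOT NULL,
        content_hash INT UNSIGNED AS (CRC32(content)) VIRTUAL,
        KEY (content_hash)
      );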

      Original bug report:

      Hi guys, I was reading about indexes, and I have an interesting problem.
      I need to check whether a file (OK, a big row, no problem there) is already inside my table.
      What I'm thinking:

      create table a (b int not null, c longblob not null, primary key (b));

      OK, no problems so far.
      The problem is: how do I know whether a file, let's say a 16 MB file, is already inside my table?
      The first solution is to MD5 it and check each row. OK, that works.
      But could there be a nicer solution? I was thinking of something like:

      alter table a
      add index some_index (c) using hash;

      Could this work? Since it's a hash index, I don't see why I should have to use only a prefix of the c value, like c(100), for example.

      Could you check whether this is possible? Today it is not; I tried it and it returned:
      /* SQL Error (1170): BLOB column 'hash_automatico' used in key specification without a key length */
      (the original message is in Portuguese, pt_BR)

      I think that's all.

          Activity

            serg Sergei Golubchik added a comment -

            Yes, this was discussed for quite a while, and it has a long history in the MySQL bug database. There was even an attempt to implement it, but somehow it failed; I don't really know why. I guess we can give it another try.

            rspadim roberto spadim added a comment - edited

            Hmm, nice =)
            Well, I think the problem was sending a big quantity of bytes into/out of the database, but maybe a function to help the index could be nice. I think many people implement something like:
            select count(*) from files where some_hash_field = (length || ';' || some_hash_value_calculated_in_a_script)
            if count(*) > 0:
            select * from files where some_hash_field = (length || ';' || some_hash_value_calculated_in_a_script)
            Well, the first part is really nice, but if it returns more than one row, just sending the file is nicer than reading many files of the same size back.

            I don't know whether it would be nice, but the implementation is close to hashing any binary/char field, since it's a hash and not a btree (am I wrong?).
            Well =) let's see what happens =)
            It's a feature request; I don't think it's really needed, but a 'formal' or 'recommended' way to check whether a 'file' (blob) is in the database could be good, at least in the documentation.
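
            A minimal sketch of the manual pattern described in the comment above. The files table, its id/content columns, and the @file_length/@file_hash placeholders (computed by the client script) are assumptions for illustration; CONCAT() is used instead of ||, which is logical OR under the default sql_mode:

            -- Step 1: cheap existence check against the (indexed) hash column.
            SELECT COUNT(*) FROM files
             WHERE some_hash_field = CONCAT(@file_length, ';', @file_hash);

            -- Step 2: only if the count is greater than 0, fetch the candidate rows and
            -- compare the actual content on the client side, because different files can
            -- still share the same length and hash value (a collision).
            SELECT id, content FROM files
             WHERE some_hash_field = CONCAT(@file_length, ';', @file_hash);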


            rspadim roberto spadim added a comment -

            Hi Sergei, any idea where the patch from that attempt was saved? I want to see if it's easy to implement, but I don't know how to start.

            serg Sergei Golubchik added a comment -

            No, I don't. And anyway, if I were to do it, I would rather start from scratch than from some old incomplete patch.

            rspadim roberto spadim added a comment -

            OK, at least a hash index/unique index could help a lot.

            rspadim roberto spadim added a comment -

            Should this be done in each storage engine, or are indexes a general "feature" of MariaDB?

            serg Sergei Golubchik added a comment -

            In each storage engine. In particular, MyISAM and Aria almost support this already, and the "attempt" I was referring to was exactly about making them support it fully.

            rspadim roberto spadim added a comment -

            Hmm, nice. So the first step is to make MyISAM and Aria blob indexes possible, and InnoDB and the others are a second step, right?

            smit_hinsu smit hinsu added a comment -

            Hi Sergei,

            I am interested in working on this feature as part of GSoC. From a Google search it seems that this feature is really important, as many people report problems related to having BLOB/TEXT as a primary key or creating an index on it.

            I have good experience with databases, but I am currently new to the MariaDB source code. I would appreciate it if you could help me get started.

            Thanks

            rjasdfiii Rick James added a comment -

            How does this handle COLLATION of a large TEXT field? Is there a way to "hash" while honoring complex UTF-* case and accent handling?


            serg Sergei Golubchik added a comment -

            It should do that automatically. But that part has a bug, so it doesn't always work; reported as MDEV-27653.
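
            For illustration, a sketch of the expected collation-aware behaviour (the table and values are hypothetical; see MDEV-27653 for cases where this currently misbehaves):

            CREATE TABLE t1 (
              txt TEXT COLLATE utf8mb4_general_ci,
              UNIQUE KEY (txt)
            );
            INSERT INTO t1 VALUES ('abc');
            -- Expected to fail with a duplicate-key error: the hash should be computed
            -- according to the column collation, under which 'ABC' equals 'abc'.
            INSERT INTO t1 VALUES ('ABC');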

            marko Marko Mäkelä added a comment -

            For the record, the following patch should disable the MDEV-371 functionality:

            diff --git a/sql/sql_table.cc b/sql/sql_table.cc
            index 6e8a4795f21..f8f3eefc114 100644
            --- a/sql/sql_table.cc
            +++ b/sql/sql_table.cc
            @@ -2441,6 +2441,8 @@ static inline void make_long_hash_field_name(LEX_CSTRING *buf, uint num)
             static Create_field * add_hash_field(THD * thd, List<Create_field> *create_list,
                                                   KEY *key_info)
             {
            +  my_error(ER_TOO_LONG_KEY, MYF(0), 1000);
            +  return nullptr;
               List_iterator<Create_field> it(*create_list);
               Create_field *dup_field, *cf= new (thd->mem_root) Create_field();
               cf->flags|= UNSIGNED_FLAG | LONG_UNIQUE_HASH_FIELD;
            

            This may be useful for testing, because there are many open bugs related to indexed virtual columns, and MDEV-371 is internally creating hidden indexed virtual columns.
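
            For example, with the patch above applied, a table definition that would normally get a hidden hash column should instead be rejected (illustrative table name):

            CREATE TABLE t1 (b BLOB, UNIQUE KEY (b));
            -- Expected: ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
            -- (ER_TOO_LONG_KEY with the hard-coded 1000 from the patch), instead of the
            -- hidden indexed virtual hash column being created.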


            People

              Assignee: sachin.setiya.007 Sachin Setiya (Inactive)
              Reporter: rspadim roberto spadim
              Votes: 1
              Watchers: 10

