  1. MariaDB Server
  2. MDEV-36317

Vector search with cosine distance returns results with very low recall

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • 11.8
    • Vector search
    • None

    Description

      Vector search with cosine distance returns results with a very low recall rate.
      abs2 in FVector is an in-memory value, so when we load data from disk we need to initialize abs2 = 1.0f.
      The proposed fix is as follows:

      diff --git a/sql/vector_mhnsw.cc b/sql/vector_mhnsw.cc
      index d8a63a7558c..91256a31910 100644
      --- a/sql/vector_mhnsw.cc
      +++ b/sql/vector_mhnsw.cc
      @@ -820,7 +820,7 @@ int FVectorNode::load_from_record(TABLE *graph)
         FVector *vec_ptr= FVector::align_ptr(tref() + tref_len());
         memcpy(vec_ptr->data(), v->ptr(), v->length());
         vec_ptr->postprocess(ctx->vec_len);
      -
      +  if (ctx->metric == COSINE) vec_ptr->abs2 = 1.0f;
         longlong layer= graph->field[FIELD_LAYER]->val_int();
         if (layer > 100) // 10e30 nodes at M=2, more at larger M's
           return my_errno= HA_ERR_CRASHED;
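
To illustrate why a wrong cached norm could hurt recall, here is a minimal sketch under stated assumptions. It is not MariaDB's code: ToyVector, toy_dot, toy_distance and load_unit_vector are hypothetical names, and the formula |a|^2 + |b|^2 - 2(a.b) is only a simplified model of a distance that relies on a cached squared norm. For unit-length vectors that expression equals 2 - 2*cos(a, b), so it ranks neighbours the same way cosine distance does, but only as long as the cached value really is 1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a stored vector; NOT MariaDB's FVector.
struct ToyVector {
  float abs2;               // cached squared norm, assumed to live only in memory
  std::vector<float> dims;  // components, assumed already normalized to unit length
};

// Plain dot product of two vectors of equal dimension.
inline float toy_dot(const ToyVector &a, const ToyVector &b) {
  assert(a.dims.size() == b.dims.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.dims.size(); i++)
    sum += a.dims[i] * b.dims[i];
  return sum;
}

// Squared Euclidean distance computed from the cached norms:
// |a|^2 + |b|^2 - 2 * (a . b).
// With unit vectors this is 2 - 2 * cos(a, b).
inline float toy_distance(const ToyVector &a, const ToyVector &b) {
  return a.abs2 + b.abs2 - 2.0f * toy_dot(a, b);
}

// Hypothetical loader: the components are assumed to be unit length,
// so the cached squared norm is set to 1 explicitly on load.
inline ToyVector load_unit_vector(std::vector<float> dims) {
  ToyVector v;
  v.abs2 = 1.0f;
  v.dims = std::move(dims);
  return v;
}
```

In this model, a query (1, 0) is closer to (0.6, 0.8) than to (0, 1) when both candidates carry abs2 = 1. If the cached abs2 of the closer candidate is instead left at some other value, for example 4, the computed distance grows and the ordering flips, which is the kind of ranking error that would show up as low recall.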
      

          Activity

            serg Sergei Golubchik added a comment -

            Do you have a test case? Normally vec_ptr->abs2 should be 1.0f here already. What values of vec_ptr->abs2 do you see?

            myx myx added a comment -

            I ran ann-benchmarks with the nytimes-256-angular dataset.

            serg Sergei Golubchik added a comment -

            Yes, I've run ann-benchmarks with nytimes-256-angular. Is this patch your way of fixing zero-length vectors? Does it help to improve recall?

            myx myx added a comment -

            Yes, it significantly increased the recall rate.


            People

              serg Sergei Golubchik
              myx myx