  1. MariaDB Server
  2. MDEV-36313

VEC_DISTANCE_COSINE returns a wrong value when one argument is the zero vector

Details

    Description

      SQL: SELECT VEC_DISTANCE_COSINE(VEC_FROMTEXT('[1,2,3]'), VEC_FROMTEXT('[0,0,0]'));
      The return value is 0. The distance is computed by this function:

      static double calc_distance_cosine(float *v1, float *v2, size_t v_len)
      {
        double dotp=0, abs1=0, abs2=0;
        for (size_t i= 0; i < v_len; i++, v1++, v2++)
        {
          float f1= get_float(v1), f2= get_float(v2);
          abs1+= f1 * f1;
          abs2+= f2 * f2;
          dotp+= f1 * f2;
        }
        return 1 - dotp/sqrt(abs1*abs2);
      }
      

      abs1 or abs2 may be zero (when either argument is the zero vector), so in this case 1 should be returned.
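
To make the proposal concrete, here is a minimal standalone sketch, not the server's code. It assumes plain native floats are read directly (the `get_float` helper from the quoted function is left out), and the names `cosine_distance_unguarded`, `cosine_distance_guarded` and the parameter `zero_vector_result` are hypothetical. The first function repeats the quoted arithmetic; evaluated on its own with IEEE 754 doubles, it computes 0/0 for a zero vector and yields NaN. How the server arrives at the 0 seen at the SQL level is not shown in this issue. The second function adds the check the reporter asks for, with the fallback value left to the caller because the distance is mathematically undefined in that case.

```cpp
#include <cmath>
#include <cstddef>

// Same arithmetic as the quoted function, without get_float()
// (assumption: the inputs are plain native floats).
static double cosine_distance_unguarded(const float *v1, const float *v2,
                                        size_t v_len)
{
  double dotp= 0, abs1= 0, abs2= 0;
  for (size_t i= 0; i < v_len; i++)
  {
    float f1= v1[i], f2= v2[i];
    abs1+= f1 * f1;
    abs2+= f2 * f2;
    dotp+= f1 * f2;
  }
  // If either vector is all zeros this is 1 - 0/0, i.e. NaN.
  return 1 - dotp / std::sqrt(abs1 * abs2);
}

// Hypothetical guarded variant: zero_vector_result is returned when
// either squared norm is zero. Which value to use (1, 0, or something
// else) is a policy choice, not something the math dictates.
static double cosine_distance_guarded(const float *v1, const float *v2,
                                      size_t v_len, double zero_vector_result)
{
  double dotp= 0, abs1= 0, abs2= 0;
  for (size_t i= 0; i < v_len; i++)
  {
    float f1= v1[i], f2= v2[i];
    abs1+= f1 * f1;
    abs2+= f2 * f2;
    dotp+= f1 * f2;
  }
  if (abs1 == 0 || abs2 == 0)
    return zero_vector_result;
  return 1 - dotp / std::sqrt(abs1 * abs2);
}
```

Calling `cosine_distance_guarded(a, z, 3, 1.0)` with `a = {1,2,3}` and `z = {0,0,0}` returns the requested 1, while non-zero inputs take the same path as the unguarded version.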

          Activity

            HNOONa-0 Ahmed Hani added a comment -

            This operation is undefined: you can't calculate the cosine distance when either vector is the zero vector.

            myx myx added a comment -

            I ran ann-benchmarks with the nytimes-256-angular dataset, and that dataset contains zero vectors.

            serg Sergei Golubchik added a comment - - edited

            Right, the operation is undefined, so 1 is no better than 0 here.

            And, right, ann-benchmarks has 239 such vectors. But, again, because the distance for them isn't defined, one cannot say whether they're close to anything or not. They shouldn't cause bugs, though, such as data corruption, wrong results (for other vectors), or crashes.

            serg Sergei Golubchik added a comment -

            MDEV-36317 claims that returning a particular value significantly improves recall on the nytimes-256-angular data set. This will be investigated in MDEV-36317.


            People

              serg Sergei Golubchik
              myx myx