  1. MariaDB Server
  2. MDEV-38284

Add VEC_DISTANCE_DOT_PRODUCT Vector Function

Details

    Description

      MariaDB already provides the VEC_DISTANCE_COSINE vector function (https://mariadb.com/docs/server/reference/sql-functions/vector-functions/vec_distance_cosine) to compute the distance between two (not necessarily normalized) vectors from the cosine of the angle between them. This function is often used with text embeddings, i.e. vector representations of texts. The cosine distance equals 1 minus the cosine similarity, where the cosine similarity is the dot product of the two vectors divided by the product of their norms (equivalently, divided by the square root of the product of their squared norms). The current implementation can be found here: https://github.com/MariaDB/server/blob/main/sql/item_vectorfunc.cc (calc_distance_cosine).
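
      For reference, the cosine distance can be written out as follows. This is the standard textbook definition; the symbols u, v, and n (the vector length) are notation introduced here, and the second form reflects my reading of how calc_distance_cosine accumulates the dot product and the two squared norms in a single loop.

```latex
\[
  d_{\cos}(u, v)
  \;=\; 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
  \;=\; 1 - \frac{\sum_{i=1}^{n} u_i v_i}
                 {\sqrt{\left(\sum_{i=1}^{n} u_i^2\right)\left(\sum_{i=1}^{n} v_i^2\right)}}
\]
```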

      However, many, if not most, embedding providers and models, such as OpenAI, Cohere, and Qwen3, already return normalized vectors, which means that computing the full cosine distance is unnecessarily complex. Computing the dot product is sufficient in this case, since cosine similarity(u,v) = dot product(u,v), and therefore cosine distance(u,v) = 1 - dot product(u,v), for any two normalized vectors u, v.
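
      The equivalence for normalized vectors can be checked numerically. The following is a minimal standalone sketch, not server code: normalize, cosine_distance, and dot_product_distance are hypothetical names chosen for illustration, and std::vector is used instead of the server's raw buffers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: scale v to unit Euclidean length, in place.
static void normalize(std::vector<float> &v)
{
  double sq= 0;
  for (float x : v)
    sq+= double(x) * x;
  const double norm= std::sqrt(sq);
  assert(norm > 0 && "cannot normalize the zero vector");
  for (float &x : v)
    x= float(x / norm);
}

// Cosine distance for arbitrary non-zero vectors of equal length:
// 1 - (u . v) / (|u| * |v|)
static double cosine_distance(const std::vector<float> &u,
                              const std::vector<float> &v)
{
  double dotp= 0, abs1= 0, abs2= 0;
  for (std::size_t i= 0; i < u.size(); i++)
  {
    abs1+= double(u[i]) * u[i];
    abs2+= double(v[i]) * v[i];
    dotp+= double(u[i]) * v[i];
  }
  return 1 - dotp / std::sqrt(abs1 * abs2);
}

// Dot-product distance: 1 - (u . v).
// Agrees with cosine_distance only when both inputs have unit length.
static double dot_product_distance(const std::vector<float> &u,
                                   const std::vector<float> &v)
{
  double dotp= 0;
  for (std::size_t i= 0; i < u.size(); i++)
    dotp+= double(u[i]) * v[i];
  return 1 - dotp;
}
```

      For unnormalized inputs the two functions generally disagree; after calling normalize on both vectors they agree up to floating-point rounding.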

      Computing the dot product should be substantially faster than computing the cosine distance (about 3-5x faster), since the cosine distance requires ~3x as many multiplications, ~2x as many additions, one expensive sqrt() operation, and one division (plus more memory accesses and higher memory consumption).

      Hence, I propose implementing VEC_DISTANCE_DOT_PRODUCT as an additional vector function to substantially speed up distance computation in most real-world applications. The implementation is fairly simple, since computing the dot product uses a subset of the operations needed for the cosine distance. The corresponding function may look like this (very similar to calc_distance_cosine as linked above):

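      /* Dot-product distance; assumes both input vectors are normalized. */
      static double calc_distance_dot_product(float *v1, float *v2, size_t v_len)
      {
        double dotp= 0;
        for (size_t i= 0; i < v_len; i++, v1++, v2++)
        {
          float f1= get_float(v1), f2= get_float(v2);
          dotp+= f1 * f2;
        }
        /* For unit vectors this equals the cosine distance: smaller is closer. */
        return 1 - dotp;
      }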
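
      To illustrate how the proposed function would behave, here is a self-contained sketch that can be compiled outside the server. The get_float below is only a stand-in (an assumption: it reads one float from possibly unaligned memory) so that the snippet builds on its own, and nearest_index is a hypothetical helper that is not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using std::size_t;

// Stand-in (assumption) for the server's get_float(): read one float
// from storage that may not be suitably aligned.
static float get_float(const void *p)
{
  float f;
  std::memcpy(&f, p, sizeof(f));
  return f;
}

// The function as proposed in this issue.
static double calc_distance_dot_product(float *v1, float *v2, size_t v_len)
{
  double dotp= 0;
  for (size_t i= 0; i < v_len; i++, v1++, v2++)
  {
    float f1= get_float(v1), f2= get_float(v2);
    dotp+= f1 * f2;
  }
  return 1 - dotp;
}

// Hypothetical helper: index of the candidate closest to the query.
// `candidates` holds n vectors of length v_len, stored back to back.
static size_t nearest_index(float *query, float *candidates,
                            size_t n, size_t v_len)
{
  assert(n > 0);
  size_t best= 0;
  double best_dist= calc_distance_dot_product(query, candidates, v_len);
  for (size_t i= 1; i < n; i++)
  {
    double d= calc_distance_dot_product(query, candidates + i * v_len, v_len);
    if (d < best_dist)
    {
      best_dist= d;
      best= i;
    }
  }
  return best;
}
```

      For unit vectors the result lies between 0 (same direction) and 2 (opposite direction), with 1 for orthogonal vectors, so ordering by ascending distance puts the most similar vector first.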
      

            People

              Assignee: Unassigned
              Reporter: Markus Zopf (MarkusZopf)
