Details
- Type: New Feature
- Status: Closed
- Priority: Critical
- Resolution: Fixed
Description
Add a VECTOR(N) data type: an array of N floating-point numbers stored in some unspecified internal format (IEEE 754 or something else).
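To make the idea concrete, here is a minimal Python sketch of one possible storage layout. It assumes little-endian IEEE 754 single-precision floats, which is only one of the options the description leaves open; the function names `pack_vector` and `unpack_vector` are hypothetical and do not come from the server code.

```python
import struct


def pack_vector(values, n):
    """Pack exactly n numbers as little-endian IEEE 754 single-precision floats.

    Assumption: 4 bytes per element; the ticket leaves the real format open.
    """
    if len(values) != n:
        raise ValueError(f"VECTOR({n}) expects {n} elements, got {len(values)}")
    return struct.pack(f"<{n}f", *values)


def unpack_vector(blob, n):
    """Inverse of pack_vector: decode a blob of 4 * n bytes into a list of floats."""
    if len(blob) != 4 * n:
        raise ValueError(f"VECTOR({n}) expects {4 * n} bytes, got {len(blob)}")
    return list(struct.unpack(f"<{n}f", blob))


blob = pack_vector([1.0, 0.5, -2.0], 3)
print(len(blob), unpack_vector(blob, 3))
```

With a fixed N, every value has the same byte length, so a dimension mismatch can be rejected from the length alone.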
Questions:
Should we allow any particular operators on these?
Issue Links
- blocks
  - MDEV-35031 Update on vector column returns error but modifies the value, results in further ER_KEY_NOT_FOUND (Closed)
  - MDEV-35831 MYSQL_TYPE_VECTOR (Open)
- relates to
  - MDEV-31053 UUID(size) should be disallowed (Confirmed)
  - MDEV-32885 VEC_DISTANCE() function (Closed)
  - MDEV-32886 VEC_FromText() and VEC_ToText() functions (Closed)
  - MDEV-35063 Assertion `v->distance_to_target >= threshold' fails upon adding certain values to vector key (Closed)
  - MDEV-32887 vector search (Stalled)
I'm new to this, but it seems to me that a column should be defined as something like VECTOR(3072, 32) (3072 dimensions of 32-bit floats), where the bit depth is an optional parameter that defaults to 32.
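As a rough illustration of what the optional bit depth would mean for storage, the sketch below computes the size of one value for a few depths. The helper name `vector_byte_size` and the set of supported depths (IEEE 754 half, single, and double precision) are assumptions made for this example, not part of the proposal.

```python
import struct

# struct format codes for IEEE 754 half, single and double precision.
# Assumption: these three depths are the only ones considered here.
_FORMAT_BY_BIT_DEPTH = {16: "e", 32: "f", 64: "d"}


def vector_byte_size(dimensions, bit_depth=32):
    """Bytes needed to store one VECTOR(dimensions, bit_depth) value, unpadded."""
    if dimensions < 1:
        raise ValueError("dimensions must be at least 1")
    if bit_depth not in _FORMAT_BY_BIT_DEPTH:
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    # "<" selects standard sizes with no alignment padding.
    return struct.calcsize(f"<{dimensions}{_FORMAT_BY_BIT_DEPTH[bit_depth]}")


for depth in (16, 32, 64):
    print(depth, vector_byte_size(3072, depth))
```

For the 3072-dimension example, the default 32-bit depth works out to 12288 bytes per value, and halving the depth halves that.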
It should be possible to input vector values (INSERT/UPDATE) using JSON syntax, and when you SELECT a vector the default output should probably be JSON as well, so that it renders in human-readable form.
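A small Python sketch of the text conversion suggested here, assuming a vector is written as a flat JSON array of numbers. The names `vector_from_json` and `vector_to_json` are hypothetical; this is not how the VEC_FromText() and VEC_ToText() functions from the linked MDEV-32886 are implemented.

```python
import json


def vector_from_json(text, n):
    """Parse a JSON array of exactly n numbers into a list of floats."""
    values = json.loads(text)
    if not isinstance(values, list) or len(values) != n:
        raise ValueError(f"expected a JSON array of {n} numbers")
    for x in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise ValueError(f"not a number: {x!r}")
    return [float(x) for x in values]


def vector_to_json(values):
    """Render a vector as a human-readable JSON array."""
    return json.dumps([float(v) for v in values])


vec = vector_from_json("[1, 0.5, -2]", 3)
print(vector_to_json(vec))
```

Validating the element count at parse time means a value with the wrong number of dimensions is rejected before it reaches storage.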
Also, it might be helpful if you could specify a column as VECTOR(dim, bit_depth, model_name), so that the column definition carries an optional user-defined model name (maybe a VARCHAR(60)). The vector is so tightly tied to the model that produced it (say openai-text-embedding-3-large) that the model is conceptually part of the data type, i.e. you have to generate embeddings with the same model to match against the vectors stored in the db. Things are evolving so quickly and models keep changing, so it would be nice to be able to track which model applies to those vectors. It is up to the user to keep that accurate, of course. I can see using different embedding models for different use cases and over time, and having this as part of the data type would reduce confusion.
Please forgive me if any of what I've said is stupid, as I'm on a learning curve with all this stuff.
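The model-name idea from this comment could look roughly like the Python sketch below. The class `VectorColumn` and the function `check_query` are hypothetical names invented for this illustration; nothing here reflects an actual server feature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VectorColumn:
    """Hypothetical column definition: VECTOR(dim, bit_depth, model_name)."""
    dim: int
    bit_depth: int = 32
    model_name: Optional[str] = None  # optional, user-maintained label


def check_query(column, query_dim, query_model=None):
    """Raise ValueError if a query vector cannot be compared with the column."""
    if query_dim != column.dim:
        raise ValueError(f"dimension mismatch: {query_dim} != {column.dim}")
    # The model name is advisory: it is only checked when both sides declare one.
    if column.model_name and query_model and query_model != column.model_name:
        raise ValueError(
            f"model mismatch: {query_model!r} != {column.model_name!r}"
        )


col = VectorColumn(3072, 32, "openai-text-embedding-3-large")
check_query(col, 3072, "openai-text-embedding-3-large")  # passes silently
```

Because the name is optional and maintained by the user, the check is skipped when either side omits it, which matches the "up to the user" caveat above.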