Details
Type: New Feature
Status: Closed
Priority: Critical
Resolution: Fixed
Description
VECTOR(N) data type, that is, an array of N floating-point numbers in some unspecified internal format (IEEE 754 or something else)
Questions:
Should we allow any particular operators on these?
Issue Links
- blocks
  - MDEV-35031 Update on vector column returns error but modifies the value, results in further ER_KEY_NOT_FOUND (Closed)
  - MDEV-35831 MYSQL_TYPE_VECTOR (Open)
- relates to
  - MDEV-31053 UUID(size) should be disallowed (Confirmed)
  - MDEV-32885 VEC_DISTANCE() function (Closed)
  - MDEV-32886 VEC_FromText() and VEC_ToText() functions (Closed)
  - MDEV-35063 Assertion `v->distance_to_target >= threshold' fails upon adding certain values to vector key (Closed)
  - MDEV-32887 vector search (Stalled)
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue relates to MDEV-32887 [ MDEV-32887 ] |
Link |
This issue relates to |
Link |
This issue relates to |
Description | VECTOR(N) data type, that is, an array of N floating-point numbers in some unspecified internal format (IEEE 754 or something else) | VECTOR(N) data type, that is, an array of N floating-point numbers in some unspecified internal format (IEEE 754 or something else). *Questions:* Should we allow any particular operators on these? |
Priority | Major [ 3 ] | Critical [ 2 ] |
Issue Type | Task [ 3 ] | New Feature [ 2 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Labels | vector |
Link |
This issue blocks |
Priority | Major [ 3 ] | Critical [ 2 ] |
Assignee | Sergei Golubchik [ serg ] |
Link |
This issue relates to |
Fix Version/s | 11.7 [ 29815 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | In Testing [ 10301 ] |
Component/s | Data types [ 13906 ] | |
Component/s | Vector search [ 20205 ] | |
Fix Version/s | 11.7.1 [ 29913 ] | |
Fix Version/s | 11.7 [ 29815 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Testing [ 10301 ] | Closed [ 6 ] |
Labels | vector |
Link | This issue relates to MDEV-31053 [ MDEV-31053 ] |
Link | This issue blocks MDEV-35831 [ MDEV-35831 ] |
Two proposals related to this:
First, we should standardize on a table-creation syntax. PlanetScale is using CREATE TABLE t(val VECTOR(4)), with the dimension being a required part of the type. We consider a vector a strong type, with a mandatory dimension, so any attempt to mix string data or vectors of different dimensions can be caught immediately.
Second, the storage format should have a version so it is self-describing. This is what keeps people from accidentally inserting a binary blob for a vector of the wrong format. PlanetScale's chosen format is:
For version==1, the "data" field is an array of 32-bit IEEE 754 floats. Other versions could support other number formats such as bfloat16, compression and packing schemes, and/or dimensions larger than 65535. Every vector serialization format will start with a 16-bit version, but the other fields can vary by version.
Why an explicit dimension, when we know that the dimension is generally (blob_size-4)/4? First, it acts as a redundancy check against truncated data. Second, it allows for formats where the blob size alone does not determine the dimension: compression, sparse storage, or float formats shorter than 8 bits. It's also a productive use of the otherwise empty space if you want the "data" field to be 32-bit aligned.
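To make the version 1 layout concrete, here is a minimal Python sketch of how such a blob could be written and checked. It is an illustration only, not PlanetScale's or MariaDB's code. Several things are assumptions: the header is taken to be a 16-bit version followed by a 16-bit dimension (inferred from the 4-byte header and the 65535 limit mentioned above), the byte order is assumed little-endian, and the names `pack_vector` and `unpack_vector` are hypothetical.

```python
import struct

# Assumed header: 16-bit version, then 16-bit dimension, little-endian.
# Field order after the version and the byte order are assumptions.
HEADER = struct.Struct("<HH")
VERSION_FLOAT32 = 1


def pack_vector(values):
    """Serialize floats as version 1: 4-byte header + 32-bit IEEE 754 floats."""
    dim = len(values)
    if dim > 0xFFFF:
        raise ValueError("a 16-bit dimension cannot exceed 65535")
    return HEADER.pack(VERSION_FLOAT32, dim) + struct.pack(f"<{dim}f", *values)


def unpack_vector(blob, expected_dim=None):
    """Parse a blob, rejecting unknown versions and truncated or mismatched data."""
    if len(blob) < HEADER.size:
        raise ValueError("blob too short to contain a header")
    version, dim = HEADER.unpack_from(blob, 0)
    if version != VERSION_FLOAT32:
        raise ValueError(f"unsupported vector format version {version}")
    # Redundancy check: for version 1, dim must equal (blob_size - 4) / 4.
    if len(blob) - HEADER.size != 4 * dim:
        raise ValueError("payload size does not match the stored dimension")
    # Strong typing: a VECTOR(N) column only accepts N-dimensional values.
    if expected_dim is not None and dim != expected_dim:
        raise ValueError(f"expected VECTOR({expected_dim}), got {dim} dimensions")
    return list(struct.unpack_from(f"<{dim}f", blob, HEADER.size))
```

With this sketch, `unpack_vector(pack_vector([1.0, 2.0, 3.0, 4.0]), expected_dim=4)` round-trips the values, while a truncated blob, a blob with another version number, or a 4-dimensional value offered to a `VECTOR(3)` column raises `ValueError` instead of being stored.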