We discussed the possibility of renaming the variables before. If it is to be done, it should be done before the release; after that it gets more complicated.
Currently we have 4 system variables:
7fce19bd215ac0671855044520092aa4210049d1
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| mhnsw_cache_size         | 16777216  |
| mhnsw_distance_function  | euclidean |
| mhnsw_max_edges_per_node | 6         |
| mhnsw_min_limit          | 20        |
+--------------------------+-----------+
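The listing above has the shape of SHOW VARIABLES output. As a minimal sketch, assuming a server built from the commit named above (so that the mhnsw_* names exist), it could probably be reproduced as follows. The LIKE pattern is an assumption, not something stated in this issue.

```sql
-- Assumption: run against a build where the four mhnsw_* variables exist.
-- 'mhnsw%' matches every variable whose name starts with "mhnsw".
SHOW VARIABLES LIKE 'mhnsw%';

-- The cache size value in the listing is 16 MiB expressed in bytes.
SELECT 16 * 1024 * 1024 AS cache_size_bytes;  -- 16777216
```

If a common vector_ prefix were adopted, the same kind of pattern match would list all vector-related variables in one place, which is the grouping benefit discussed below.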
Considerations:
The presence of HNSW in the name suggests there may be other algorithms in the future. If so, I think it would be more user-friendly to group all vector-related variables together by giving them a common prefix. vector_ is the first that comes to mind, for further use as vector_mhnsw_xxx, vector_lsh_xxx, etc., but maybe there are better ideas.
I don't know whether there is already a vision of how it will be configured when there are alternative algorithms, e.g. whether it would make sense to have a separate distance_function variable for each algorithm. That seems too cumbersome, given that the function can also be set in the table definition. If, however, any of the options will be shared among different algorithms, they should lose the algorithm prefix now, i.e. be not [vector_]mhnsw_distance_function or some future [vector_]lsh_distance_function, but just vector_distance_function.
It was also discussed that min_limit and max_edges_per_node as such are not very meaningful names and could be improved. I don't have suggestions for better names, though.
Sergei Golubchik
added a comment - also, perhaps mhnsw_cache_size should be max cache size? It doesn't allocate all that memory at once; instead, memory usage grows until it reaches that value.
Elena Stepanova
added a comment - If you think it's more consistent with other similar variables, sure.
I thought when "max" is used for such purposes in MariaDB, it usually means that a query which hits it will actually fail complaining that it cannot be executed. But I don't actually have any statistics to support it, it was just a subjective impression.
Sergei Golubchik
added a comment - I think these are better at least, although the term "quality" is very subjective. For some, the main "quality" of an approximate search will be correctness, for others performance. Maybe mhnsw_search_precision_level? For the index, "quality" may even be all right.
mhnsw_search_precision_level is a bit too long for my taste, but, of course, that's not a deciding factor.
Having "quality" in both highlights that they're complementary: both improve results when increased and improve speed when decreased, and one can increase one and compensate by decreasing the other. So I'd suggest having the same suffix for both.
search/index precision level? Or maybe "accuracy"? So:
mhnsw_index_precision_level — mhnsw_search_precision_level
mhnsw_index_precision — mhnsw_search_precision
mhnsw_index_quality — mhnsw_search_quality
mhnsw_index_accuracy — mhnsw_search_accuracy
Elena Stepanova
added a comment - Right, we can lose "level", it doesn't mean anything anyway, and given the allowed ranges (e.g. no "level 1" for the index) it can even be confusing.
From the above, "accuracy" sounds most universal to me, but I rarely represent the majority in such matters.
Sergei Golubchik
added a comment (edited) - Another consideration (from cvicentiu) is that users mainly use vector stores through an AI framework; hardly anyone does it directly. This means it matters much less whether these variables are intuitively understandable to end users than whether they are intuitively understandable to people writing vector store connectors for AI frameworks. And for that we should use the same names as everyone else, that is, "ef" (or "ef_search") and "M", and maybe simply "distance" without the "_function" part, for brevity.
Thus, an alternative proposal is
SET @@mhnsw_default_m=16;
SET @@mhnsw_default_distance=euclidean;
SET @@mhnsw_ef_search=30;
CREATE TABLE t1 (
v VECTOR(10),
VECTOR INDEX (v) M=24 DISTANCE=COSINE
);