Details

    Description

      Let's keep the graph in memory, e.g. in TABLE_SHARE.
      Add nodes there as they're read from the table, e.g. on searches.
      Update on inserts.

      When it grows beyond some (configurable) limit, start removing nodes that are close to each other, or remove the bottom level of the HNSW graph. The goal is to keep enough in memory to find any row with no more than one table access (see the sketch below).

      It's not yet clear how to handle transaction isolation.
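
      A minimal standalone sketch of the idea (all names here are hypothetical, e.g. VectorGraphCache and trim_bottom_layer; this is not the actual server code): a per-table cache with a configurable byte limit that evicts bottom-level-only nodes when the limit is exceeded.

        #include <cstddef>
        #include <unordered_map>
        #include <utility>
        #include <vector>

        // One HNSW node: its row id, its top layer, per-layer neighbor lists.
        struct GraphNode {
          size_t row_id;
          int max_layer;                               // highest HNSW layer
          std::vector<std::vector<size_t>> neighbors;  // neighbors[layer]
        };

        // Shared per-table cache, conceptually living in TABLE_SHARE.
        class VectorGraphCache {
         public:
          explicit VectorGraphCache(size_t limit_bytes) : limit_(limit_bytes) {}

          // Called when a node is read from the table (searches) or inserted.
          void put(GraphNode node) {
            size_t id = node.row_id;
            used_ += estimate_size(node);   // overcounts re-inserts; fine for a sketch
            nodes_[id] = std::move(node);
            if (used_ > limit_)
              trim_bottom_layer();
          }

          const GraphNode *get(size_t row_id) const {
            auto it = nodes_.find(row_id);
            return it == nodes_.end() ? nullptr : &it->second;
          }

         private:
          static size_t estimate_size(const GraphNode &n) {
            size_t s = sizeof(GraphNode);
            for (const auto &layer : n.neighbors)
              s += layer.size() * sizeof(size_t);
            return s;
          }

          // Evict nodes that exist only on layer 0; the upper layers stay
          // resident, so any search ends at most one table access away from
          // the target row.
          void trim_bottom_layer() {
            for (auto it = nodes_.begin(); it != nodes_.end() && used_ > limit_;) {
              if (it->second.max_layer == 0) {
                used_ -= estimate_size(it->second);
                it = nodes_.erase(it);
              } else {
                ++it;
              }
            }
          }

          std::unordered_map<size_t, GraphNode> nodes_;
          size_t limit_;
          size_t used_ = 0;
        };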

    Activity

            Sergei Golubchik added a comment (edited)

            Update:

            • cache works
            • it's shared
            • concurrent searches (SELECT) work
            • readers can populate the cache (from disk) concurrently (see the sketch after this comment)
            • inserts are not concurrent, MyISAM-only (protected by the table lock)

            Next step: InnoDB. Problems:

            1. rollbacks: a transaction adds a few nodes, then rolls back
            2. transaction isolation: a long-running transaction won't see nodes added after it started
            3. concurrent writes: two connections adding new nodes to the graph
            4. savepoints

            Note that two concurrent transactions cannot modify the same node; InnoDB row locks guarantee that this doesn't happen.
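
            To illustrate the concurrent read path above, a simplified standalone sketch (SharedGraph, find_or_load and load_from_table are made-up names; it assumes a reader-writer lock, which may differ from what the server actually uses): readers share the lock for lookups, and a cache miss takes the exclusive lock to load the node from the table.

              #include <cstddef>
              #include <memory>
              #include <mutex>
              #include <shared_mutex>
              #include <unordered_map>

              struct GraphNode { size_t row_id; /* vector, neighbor lists, ... */ };

              class SharedGraph {
               public:
                const GraphNode *find_or_load(size_t row_id) {
                  {
                    std::shared_lock lk(mtx_);      // many readers in parallel
                    auto it = nodes_.find(row_id);
                    if (it != nodes_.end()) return it->second.get();
                  }
                  std::unique_lock lk(mtx_);        // miss: exclusive lock
                  auto &slot = nodes_[row_id];
                  if (!slot)                        // re-check under the lock
                    slot = load_from_table(row_id);
                  return slot.get();
                }

               private:
                // Stand-in for the single table (disk) access.
                static std::unique_ptr<GraphNode> load_from_table(size_t row_id) {
                  return std::make_unique<GraphNode>(GraphNode{row_id});
                }
                std::shared_mutex mtx_;
                std::unordered_map<size_t, std::unique_ptr<GraphNode>> nodes_;
              };

            Eviction is omitted here; with eviction, returning raw pointers would need a pin or refcount scheme.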


            InnoDB solutions:

            1. we'll use a local cache
              • it must be the same for all instances of the same table in a transaction, and flushed on commit
              • so it'll be a pseudo-hton, registered for this transaction, and the cache will be in thd_ha_data (see the sketch after this list)
              • on rollback the cache is discarded, on commit it's merged into the shared cache
              • for simplicity, the first version will only support one vector index per transaction
              • for simplicity, the first version won't support savepoints
            2. this will make the shared graph slightly more "approximate"
              • in the first version it's fine, if the benchmark agrees
              • later we can extend the handler API to force read-committed for a specific table
            3. solved by 1
            4. see 1
              • if a savepoint rollback was used, the local cache will be deducted from the global one on commit, instead of being merged in
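
            A rough standalone model of solution 1 (TrxVectorCache and the function names are hypothetical; in the real design the cache would be stored in thd_ha_data of the pseudo-hton, whose commit/rollback callbacks would invoke this logic):

              #include <cstddef>
              #include <unordered_map>

              // Stand-in for a graph node; the real cache stores HNSW nodes.
              struct Node { /* vector data, neighbor lists, ... */ };

              // The shared cache, conceptually in TABLE_SHARE.
              using SharedCache = std::unordered_map<size_t, Node>;

              // Per-transaction cache: one per transaction, shared by all
              // instances of the same table within it.
              struct TrxVectorCache {
                std::unordered_map<size_t, Node> nodes;  // nodes added by this trx

                // commit: publish the transaction's nodes to the shared cache
                void on_commit(SharedCache &shared) {
                  for (auto &kv : nodes) shared[kv.first] = kv.second;
                  nodes.clear();
                }
                // rollback: the shared cache never saw these nodes, just drop them
                void on_rollback() { nodes.clear(); }
              };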

            Sergei Golubchik added a comment (edited)
            • savepoints work
            • many vector indexes per transaction work
            • on commit the trx cache is not merged into the ctx cache but deducted from it, as for savepoints (see the sketch below)
              • because ann_benchmarks inserts into an empty table, merging causes a slowdown
              • we might want to add merging back later, protected by a heuristic and supported by an adequate benchmark
              • the same goes for the optimization where the trx cache would load nodes from the ctx cache instead of from the table
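
            A sketch of the "deducted, not merged" commit (hypothetical names again; this only models the container operations):

              #include <cstddef>
              #include <unordered_map>

              struct Node { /* ... */ };
              using SharedCache = std::unordered_map<size_t, Node>;
              using TrxCache    = std::unordered_map<size_t, Node>;

              // On commit, every node the transaction touched is evicted from
              // the shared cache and will be re-read from the table on the
              // next search. For a bulk load into an empty table (the
              // ann_benchmarks case) this avoids paying for a merge of nodes
              // nobody has searched for yet.
              void commit_by_deduction(SharedCache &shared, TrxCache &trx) {
                for (auto &kv : trx)
                  shared.erase(kv.first);
                trx.clear();
              }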

