Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35897

Questions about memory allocation done for vector index search

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 11.7.1
    • 11.8
    • Vector search
    • None

    Description

      I started to look at PMP and perf profiles while running ann-benchmarks and when ef_search is large I see much time in memory allocation functions, most from VisitedSet code, during search_layer.

      I am curious about two things:
      1) Why is such a large value for est_size passed to the VisitedSet ctor here? This creates huge bloom filters as each uses about 2*est_size bytes, and with ef_search=400 (see example below) that is ~32MB. Such large alloc / dealloc cycles are slow when frequent.

      2) Why does "bv" use longlong which is 8 bytes while (see here) while the num_blocks math (see here) appears to assume the bv vector is 4 bytes per entry. For one example:

      • n = 15,044,334
      • m = 143,931,441 (~= 10 * n)
      • log_num_blocks = 22
      • num_blocks = 4,194,304
      • bv.resize(4194304) is called, which provides about 33,554,432 bits

      The est_size variable is: est_heuristic * (ef ^ ef_power), from here. This ends up creating huge bloom filters when used by the VisitedSet ctor.

      All of these are from queries with LIMIT 10

      With ef_search=120 the math is: 39.x * (120 ^ 2) = 1135647 and visited.count is <= 2000 at the end of search_layer

      With ef_search=200 the math is: 39.x * (200 ^ 2) = 3398947 and visited.count is <= 2000 at the end of search_layer()

      With ef_search=300 the math is: 39.x * (300 ^ 2) = 8114222 and visited.count is <= 3000 at the end of search_layer()

      With ef_search=400 the math is: 39.x * (400 ^ 2) = 15044334 and visited.count is <= 3000 at the end of search_layer()

      Attachments

        Issue Links

          Activity

            People

              serg Sergei Golubchik
              mdcallag Mark Callaghan
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.