  1. MariaDB Server
  2. MDEV-36568

Research: Enhancing MariaDB Full Text Search

Details

    • Q1/2026 Server Development

    Description

      Objective

      To determine the feasibility and performance benefits of implementing advanced full-text search in MariaDB. The options under consideration are BM25 ranking implemented with a sparse vector approach, BM25 implemented without sparse vectors, or another algorithm.

      1. Background & Hypothesis
      MariaDB's current TF-IDF-based full-text search can be improved. BM25 adds term-frequency saturation and tunable document-length normalization on top of TF-IDF-style weighting. We hypothesize that BM25, whether implemented via sparse vectors (inspired by modern search solutions such as Milvus) or implemented directly, or another ranking method, will provide better search relevance with acceptable performance overhead. This research will test that hypothesis by evaluating each option.
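To make the sparse vector idea concrete, here is a minimal Python sketch of one possible decomposition: each document becomes a sparse vector of length-normalized term-frequency weights, each query becomes a sparse vector of IDF weights, and the BM25 score is their dot product. Everything in it is an assumption for illustration, including the whitespace `tokenize` helper, the `k1 = 1.2` and `b = 0.75` defaults, and the always-positive IDF variant. It is not a description of how Milvus, Tantivy, or MariaDB compute scores.

```python
import math
from collections import Counter

K1 = 1.2   # term-frequency saturation (common default, assumed here)
B = 0.75   # length normalization strength (common default, assumed here)


def tokenize(text):
    """Hypothetical whitespace tokenizer; a real PoC would reuse an FTS parser."""
    return text.lower().split()


def build_index(docs):
    """Return (vocab, idf, doc_vectors); each doc vector is {term_id: weight}."""
    tokenized = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    vocab = {}
    df = Counter()
    for tokens in tokenized:
        for term in set(tokens):
            term_id = vocab.setdefault(term, len(vocab))
            df[term_id] += 1
    n = len(docs)
    # IDF variant with +1 inside the log so weights never go negative.
    idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}
    doc_vectors = []
    for tokens in tokenized:
        tf = Counter(vocab[t] for t in tokens)
        norm = K1 * (1 - B + B * len(tokens) / avgdl)
        # Saturating, length-normalized term-frequency component of BM25.
        doc_vectors.append({t: f * (K1 + 1) / (f + norm) for t, f in tf.items()})
    return vocab, idf, doc_vectors


def query_vector(query, vocab, idf):
    """Sparse query vector: the IDF weight of each known query term."""
    return {vocab[t]: idf[vocab[t]] for t in set(tokenize(query)) if t in vocab}


def score(qvec, dvec):
    """BM25 score expressed as a sparse dot product."""
    return sum(w * dvec[t] for t, w in qvec.items() if t in dvec)


def search(query, vocab, idf, doc_vectors, top_k=3):
    """Return indexes of the best-scoring documents, highest score first."""
    qvec = query_vector(query, vocab, idf)
    scored = [(score(qvec, d), i) for i, d in enumerate(doc_vectors)]
    scored = [(s, i) for s, i in scored if s > 0]
    return [i for s, i in sorted(scored, key=lambda x: (-x[0], x[1]))[:top_k]]
```

Because the document-side weights depend on the average document length, and the query-side weights depend on document frequencies, a real implementation would have to decide how to keep both current as rows are inserted and deleted. The sketch computes them once over a fixed corpus and ignores that question.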

      2. Key Research Questions
      This investigation will answer:

      Integration: What are the most viable paths for integrating a BM25 ranking function into MariaDB's query engine and FTS syntax, either via sparse vectors or directly? Are there other options?
      Performance: How do prototypes for BM25 sparse vectors and Tantivy (a search library written in Rust that ranks with BM25) compare against the existing InnoDB FTS in terms of query latency, relevance, and resource usage? Are there other algorithms worth evaluating?
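The latency and relevance comparison could be measured with a small harness like the Python sketch below. The choice of MRR@10 as the relevance metric and of median and 95th-percentile latency as the timing summary are assumptions, since the issue does not name specific metrics. `search_fn` is a hypothetical callable standing in for whichever engine is under test, and resource usage is not covered.

```python
import statistics
import time


def mrr_at_k(ranked, relevant, k=10):
    """Mean reciprocal rank of the first relevant hit within the top k.

    ranked:   {query_id: [doc_id, ...]} in ranked order
    relevant: {query_id: {doc_id, ...}} relevance judgments
    """
    total = 0.0
    for qid, rel in relevant.items():
        for rank, doc_id in enumerate(ranked.get(qid, [])[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(relevant) if relevant else 0.0


def time_queries(search_fn, queries):
    """Run search_fn once per query; return (results, latency summary in ms).

    search_fn is a hypothetical callable: query text -> ranked list of doc ids.
    """
    results, latencies = {}, []
    for qid, text in queries.items():
        start = time.perf_counter()
        results[qid] = search_fn(text)
        latencies.append((time.perf_counter() - start) * 1000.0)
    if not latencies:
        return results, {}
    latencies.sort()
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return results, {"median_ms": statistics.median(latencies), "p95_ms": p95}
```

Running every engine through the same `time_queries` call with the same query set keeps the comparison consistent, although client-side timing like this also includes connection and driver overhead.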

      3. Research Plan & Deliverables

      Analyze: Review existing implementations of BM25 sparse embeddings (Milvus, Postgres) and Tantivy, along with MariaDB's FTS architecture, to identify integration strategies for each option. Other options are welcome too.
      Prototype: Develop minimal Proofs of Concept (PoCs) to model the BM25 algorithm with sparse vector data storage.
      Benchmark: Compare the PoCs' performance and relevance against MariaDB's native FTS using the MS MARCO Passage Ranking v1 dataset (8.8M passages, 1M queries).
      Subsample: 100K passages, 5K dev queries (for iteration speed).
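One way to draw the subsample is sketched below in Python. It assumes the passages, queries, and relevance judgments have already been loaded into dictionaries. The `subsample` function and its parameters are hypothetical, and the issue does not prescribe a sampling method. The sketch keeps every judged-relevant passage for the sampled queries, then pads the corpus with random distractor passages up to the target size.

```python
import random


def subsample(passages, queries, qrels, n_passages, n_queries, seed=0):
    """Draw a smaller benchmark in which every sampled query stays answerable.

    passages: {passage_id: text}
    queries:  {query_id: text}
    qrels:    {query_id: set of relevant passage_ids}
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    judged = sorted(q for q in queries if qrels.get(q))
    picked_q = rng.sample(judged, min(n_queries, len(judged)))

    # Always keep the passages judged relevant to the sampled queries.
    keep = set()
    for q in picked_q:
        keep.update(p for p in qrels[q] if p in passages)

    # Pad with random distractor passages up to the target corpus size.
    pool = sorted(p for p in passages if p not in keep)
    extra = max(0, min(n_passages - len(keep), len(pool)))
    keep.update(rng.sample(pool, extra))

    return (
        {p: passages[p] for p in keep},
        {q: queries[q] for q in picked_q},
        {q: qrels[q] & keep for q in picked_q},
    )
```

For the sizes in the plan, the call would be `subsample(passages, queries, qrels, 100_000, 5_000)`. A smaller corpus contains fewer distractors, so relevance scores on the subsample are comparable between engines but not with numbers measured on the full 8.8M-passage collection.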

      Key Deliverable: A technical write-up containing:

      A clear recommendation on whether to proceed with a full implementation, which option (BM25, BM25 sparse vectors, or another option) is preferred, and whether any alternatives should be considered.
      Comparative performance and relevance data across the options.
      A proposed high-level design for the recommended approach if the project is deemed feasible.

      Benefits

      Better results: Deliver more relevant outcomes across varied queries and document sizes.
      Market strength: Position MariaDB as a stronger full-text search option.
      User control: Enable customization through configurable settings.
      Performance: Leverage efficient approaches for quick searches with minimal resource use.

          People

            Assignee: Unassigned
            Reporter: Adam Luciano (adamluciano)