  1. MariaDB Server
  2. MDEV-38485

Pluggable Full-Text Search Framework with BM25 Ranking

Details

    • Type: New Feature
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • ROADMAP
    • None
    • None

    Description

      Introduce a pluggable full-text search framework in MariaDB with native BM25 relevance scoring, enabling extensible text search with more accurate ranking than the existing natural language and boolean modes.

      Problem Statement

      MariaDB’s current full-text search capabilities rely on basic natural language and boolean modes that:

      • Do not account for document length normalization or inverse document frequency
      • Are difficult to extend or customize

      As a result:

      • Search results are often poorly ranked for content-heavy applications
      • Users must integrate external search engines for acceptable relevance
      • MariaDB is less competitive for applications requiring high-quality text search

      User & Use Case

      Primary Users

      • MariaDB developers
      • Database administrators
      • Platform engineers building content-driven applications

      Primary Use Cases

      • Ranking blog posts, documentation, or articles by relevance
      • Searching user-generated content (comments, reviews, messages)
      • Enabling in-database search for applications that cannot rely on external search services

      Secondary Use Cases

      • Hybrid relational + search workloads
      • AI-assisted search pipelines that require deterministic relevance scoring at the database layer

      Competitive Research & Market Context

      PostgreSQL

      • Built-in ranking via ts_rank and ts_rank_cd is frequency-based rather than BM25; BM25 ranking requires extensions
      • Strong extensibility through custom ranking functions

      Limitations: complexity of configuration, fragmented ecosystem, limited pluggability at the index engine level

      MySQL

      • Supports full-text search with basic ranking
      • No native BM25 implementation
      • Limited extensibility and weak relevance tuning

      Elasticsearch / OpenSearch

      • Native BM25 with advanced relevance tuning
      • Highly configurable and scalable

      Limitations: operational complexity, separate infrastructure, eventual consistency, cost

      Key Market Gaps

      • In-database BM25 with first-class support
      • Pluggable architecture without requiring external search systems
      • Simpler operational model compared to dedicated search engines

      Feature Behavior & Scope

      In Scope

      • Pluggable full-text index framework
      • Native BM25 ranking implementation
      • Configurable scoring parameters (e.g., k1, b)
      • SQL-level configuration and usage
      • Compatibility with existing full-text index syntax where feasible
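
      For reference, the standard Okapi BM25 formula shows where the configurable parameters k1 and b enter the score. This ticket does not specify which IDF variant would be used; the non-negative form below is an assumption chosen for illustration.

```latex
\[
\mathrm{score}(D, Q) \;=\; \sum_{q \in Q} \mathrm{IDF}(q)\,
  \frac{f(q, D)\,(k_1 + 1)}
       {f(q, D) + k_1\left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q) \;=\; \ln\!\left(1 + \frac{N - n(q) + 0.5}{n(q) + 0.5}\right)
\]
```

      Here f(q, D) is the frequency of term q in document D, |D| is the document length, avgdl is the average document length in the index, N is the number of documents, and n(q) is the number of documents containing q. k1 controls term-frequency saturation, and b controls how strongly scores are normalized by document length (b = 0 disables length normalization).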

      Behavior

      • Users can create a full-text index specifying BM25 as the ranking algorithm
      • Query execution uses BM25 scoring by default for supported indexes
      • Framework allows future ranking models or third-party implementations to be plugged in
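
      The behavior above could be modeled roughly as follows. This is a minimal Python sketch, not a proposed server design: the names RANKERS, register_ranker, and search are hypothetical, the registry only stands in for whatever plugin interface the framework would expose, and the defaults k1 = 1.2 and b = 0.75 are common choices rather than values specified in this ticket.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

# A ranker maps (query terms, tokenized documents) to one score per document.
Ranker = Callable[[List[str], List[List[str]]], List[float]]

# Hypothetical registry standing in for the server's plugin table.
RANKERS: Dict[str, Ranker] = {}


def register_ranker(name: str) -> Callable[[Ranker], Ranker]:
    """Register a ranking function under a name (illustrative only)."""
    def decorator(fn: Ranker) -> Ranker:
        RANKERS[name] = fn
        return fn
    return decorator


@register_ranker("bm25")
def bm25(query: List[str], docs: List[List[str]],
         k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 with a non-negative IDF variant (an assumption)."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization factor; equals 1 when b == 0.
        norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for t in query:
            f = tf[t]
            if f:
                idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
                score += idf * f * (k1 + 1) / (f + k1 * norm)
        scores.append(score)
    return scores


@register_ranker("tf")
def raw_term_frequency(query: List[str], docs: List[List[str]]) -> List[float]:
    """A second, deliberately naive ranker to show that rankers are swappable."""
    return [float(sum(d.count(t) for t in query)) for d in docs]


def search(ranker_name: str, query: List[str], docs: List[List[str]]) -> List[int]:
    """Return document indexes ordered by descending score."""
    scores = RANKERS[ranker_name](query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        ["mariadb", "search"],
        ["search", "search", "search", "and", "many", "other",
         "unrelated", "words", "here", "too"],
        ["storage", "engine"],
    ]
    for name in ("tf", "bm25"):
        print(name, search(name, ["search"], docs))
```

      In this toy corpus the raw term-frequency ranker puts the long document first, while BM25 prefers the short, focused one because of length normalization and term-frequency saturation.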

      Acceptance Criteria

      • A pluggable full-text index framework is implemented
      • Native BM25 ranking is available as a first-party implementation
      • Users can configure BM25 parameters at index or global level
      • Query plans (for example, EXPLAIN output) indicate when BM25-based ranking is used
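
      One possible reading of "index or global level" configuration is that index-level settings override global defaults. The sketch below illustrates that precedence only; BM25Params, resolve_params, the default values, and the validation ranges are all assumptions, not part of this ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BM25Params:
    # Common defaults, assumed here; the ticket does not specify any.
    k1: float = 1.2
    b: float = 0.75


def resolve_params(global_params: BM25Params,
                   index_k1: Optional[float] = None,
                   index_b: Optional[float] = None) -> BM25Params:
    """Index-level values, when set, take precedence over global ones."""
    k1 = global_params.k1 if index_k1 is None else index_k1
    b = global_params.b if index_b is None else index_b
    if k1 < 0:
        raise ValueError("k1 must be non-negative")
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must be between 0 and 1")
    return BM25Params(k1=k1, b=b)


if __name__ == "__main__":
    server_defaults = BM25Params()
    # An index that overrides only b inherits k1 from the global setting.
    print(resolve_params(server_defaults, index_b=0.5))
```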

            People

              Assignee: Unassigned
              Reporter: Adam Luciano (adamluciano)