Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Motivation
I would like to enhance MariaDB's full-text search by adding the BM25 ranking algorithm, drawing on how Milvus implements it with sparse embeddings. BM25 is a widely used ranking function in information retrieval. Bringing it to MariaDB should improve search relevance, especially for large text datasets or complex queries, and make MariaDB more competitive with other databases and search solutions.
Why BM25 Matters
BM25 (Best Matching 25) stands out as an advanced ranking algorithm that refines relevance scoring beyond traditional methods like TF-IDF. It excels by:
- Accounting for document length differences to ensure equitable rankings.
- Balancing the influence of repeated terms for more natural results.
- Offering adjustable parameters to tailor its behavior.
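As a rough illustration of the three properties listed above, here is a minimal sketch of BM25 scoring in Python. It is not MariaDB or Milvus code. The function name `bm25_scores`, the pre-tokenized input format, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for the example, and the IDF term uses the common `log(1 + ...)` variant, which is only one of several in use.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    query: list of terms; docs: list of token lists (hypothetical input format).
    k1 controls term-frequency saturation, b controls length normalization.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Longer-than-average documents get a larger denominator (if b > 0).
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term-frequency factor: grows with tf but levels off.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores
```

With `b=0` length normalization is switched off, and raising `k1` lets repeated terms keep adding to the score for longer before it levels off. These are the adjustable parameters mentioned in the list.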
Currently, MariaDB’s full-text search relies on a TF-IDF-based approach within the InnoDB engine. While functional, it doesn’t match BM25’s precision or flexibility. Adding BM25 would improve relevance for applications such as e-commerce, content platforms, and knowledge management systems. It would also benefit hybrid search use cases, where full-text results are combined with dense-embedding (vector) search.
Learning from Milvus
Milvus, an open-source vector database, has effectively integrated BM25 using sparse embeddings—compact text representations where most values are zero. This approach has proven efficient for storage and retrieval, even at scale. For MariaDB, tapping into this concept could pave the way for a faster and smoother integration of BM25.
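To make the idea of a sparse embedding concrete, here is a small Python sketch that stores only the non-zero entries of a text vector and scores a query against a document with a dot product. It is an illustration of the general technique, not Milvus's actual implementation. The helper names `to_sparse` and `sparse_dot` and the dict-based layout are assumptions, and raw term counts stand in for the weights a real system would store.

```python
def to_sparse(tokens, vocab):
    """Map tokens to a {term_id: count} dict; absent terms take no space.

    vocab is a shared {term: term_id} dict that grows as new terms appear.
    """
    vec = {}
    for tok in tokens:
        term_id = vocab.setdefault(tok, len(vocab))
        vec[term_id] = vec.get(term_id, 0) + 1
    return vec


def sparse_dot(query_vec, doc_vec):
    """Dot product of two sparse vectors stored as {dimension: weight} dicts."""
    # Iterate over the smaller vector so the cost depends on its non-zero
    # entries rather than on the vocabulary size.
    if len(query_vec) > len(doc_vec):
        query_vec, doc_vec = doc_vec, query_vec
    return sum(w * doc_vec[dim] for dim, w in query_vec.items() if dim in doc_vec)


vocab = {}
doc = to_sparse(["full", "text", "search", "in", "mariadb"], vocab)
query = to_sparse(["text", "search"], vocab)
overlap = sparse_dot(query, doc)  # counts the terms shared by query and document
```

If the stored weights were BM25 term weights rather than raw counts, the same dot product would yield a BM25-style relevance score, which is why the sparse representation fits this feature.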
High-Level Goals
- Boost search relevance: Introduce BM25 for top-tier result accuracy.
- Optimize with sparse embeddings: Achieve efficiency in storage and performance.
- Simplify usage: Blend the feature naturally into MariaDB’s existing framework.
- Support flexibility: Allow BM25 to enhance, not replace, current full-text search options.
Benefits
- Better results: Deliver more relevant outcomes across varied queries and document sizes.
- Market strength: Position MariaDB as a stronger full-text search option.
- User control: Enable customization through BM25’s configurable settings.
- Performance: Leverage sparse embeddings for quick searches with minimal resource use.
Conclusion
Bringing BM25 into MariaDB via sparse embeddings can transform our full-text search capabilities. This enhancement will deliver sharper relevance, elevate MariaDB’s standing, and delight users with powerful, efficient search tools.
Additional References:
Milvus
Postgres Extension