Basically, get the BM25 results and normalize them to be between 0 and 1, then take the (potentially weighted) average of them and the cosine similarity results (already between 0 and 1) to get the final ranking.
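A minimal sketch of that fusion step, with made-up doc IDs and scores (the helper name `hybrid_scores` and the default weight are my own, not from any library):

```python
def hybrid_scores(bm25, cosine, weight=0.5):
    """Combine BM25 and cosine-similarity scores per document ID.

    bm25, cosine: dicts mapping doc_id -> score.
    weight: contribution of the (min-max normalized) BM25 score.
    """
    lo, hi = min(bm25.values()), max(bm25.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all BM25 scores are equal
    norm = {d: (s - lo) / span for d, s in bm25.items()}  # BM25 -> [0, 1]
    docs = set(norm) | set(cosine)
    # Missing scores default to 0 so docs found by only one retriever still rank.
    return {d: weight * norm.get(d, 0.0) + (1 - weight) * cosine.get(d, 0.0)
            for d in docs}

# Example: doc "b" wins on cosine similarity, "a" wins on BM25.
ranked = sorted(hybrid_scores({"a": 12.0, "b": 3.0}, {"a": 0.4, "b": 0.9}).items(),
                key=lambda kv: kv[1], reverse=True)
```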
This is going to be only marginally helpful since I don't have references, but I think I implemented this in Elasticsearch.
You can do approximate KNN search with ES by adding a setting on the index that enables KNN, then creating mappings for your embedding fields that define their vector dimensionality. Then index your data as you normally would, plus the embeddings.
Once those are in place you can construct your query so that embedding similarity contributes to the score. When a query comes in, you embed it and pass both the embedding and the original query text to ES, which combines the two signals to score the results.
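As a rough illustration of what that looks like in Elasticsearch 8.x (index name, field names, and the tiny 3-dim vector are made up for brevity; real embedding models produce hundreds of dimensions):

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "title_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}

POST my-index/_search
{
  "query": { "match": { "title": "hybrid search" } },
  "knn": {
    "field": "title_vector",
    "query_vector": [0.12, -0.45, 0.31],
    "k": 10,
    "num_candidates": 100
  }
}
```

Supplying both `query` and `knn` in one search request is what lets ES blend the keyword and vector scores into a single ranking.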
TL;DR - doing hybrid vector + keyword search provides more relevant results for text searches than vector search alone. And using sparse vector embeddings for the “keyword” part provides even more relevant results than using BM25.
- Any references for how this hybrid retrieval is done?