ractogateway.rag.page_index._bm25
Pure-Python BM25 index and decision-tree inverted index.
No external dependencies required — everything is implemented with the Python standard library.
Two components work together for two-stage retrieval:
- _DecisionIndex: an inverted keyword index that maps content terms to page entry IDs. Given a tokenised query, it returns the union of candidate entry IDs in O(|query terms|) time. This is the “decision tree” routing layer.
- BM25Index: Okapi BM25 (k1=1.5, b=0.75) that scores the candidates returned by the decision index. Only candidates are scored, so the full corpus is never re-ranked on every query.
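The routing stage described above can be illustrated with a minimal sketch. `ToyDecisionIndex` and its `add` and `candidates` methods are hypothetical names chosen for this example; they are not the library's `_DecisionIndex` API, whose methods are not documented here.

```python
from collections import defaultdict


class ToyDecisionIndex:
    """Hypothetical stand-in for an inverted keyword index (not the library's class)."""

    def __init__(self):
        # term -> set of entry IDs whose text contains that term
        self._postings = defaultdict(set)

    def add(self, entry_id, tokens):
        for token in tokens:
            self._postings[token].add(entry_id)

    def candidates(self, query_tokens):
        # One dictionary lookup per query term; the result is the union of postings.
        found = set()
        for token in query_tokens:
            found |= self._postings.get(token, set())
        return found


index = ToyDecisionIndex()
index.add("p1", ["bm25", "ranking", "okapi"])
index.add("p2", ["inverted", "index", "ranking"])
index.add("p3", ["tokenizer", "stopwords"])

hits = index.candidates(["ranking", "okapi", "missing"])
print(sorted(hits))  # ['p1', 'p2']
```

In this sketch the number of dictionary lookups grows with the number of query terms, not with the corpus size; the cost of building the union also depends on how many entries each term matches.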
- ractogateway.rag.page_index._bm25.extract_keywords(text, top_n=20)
Return the top-n most frequent content tokens from text.
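A rough sketch of frequency-based keyword extraction follows. The tokenisation rule and the stop-word list are assumptions made for illustration; the page does not say how `extract_keywords` tokenises text or which words it treats as non-content, so `toy_extract_keywords` may differ from the real function.

```python
import re
from collections import Counter

# Assumed stop-word list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def toy_extract_keywords(text, top_n=20):
    # Lowercase, split on non-alphanumeric characters, drop stop words.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    content = [t for t in tokens if t not in STOPWORDS]
    # most_common orders by count, highest first.
    return [term for term, _ in Counter(content).most_common(top_n)]


keywords = toy_extract_keywords(
    "The index of the index is an inverted index of terms and terms", top_n=2
)
print(keywords)  # ['index', 'terms']
```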
- class ractogateway.rag.page_index._bm25.BM25Index(k1=1.5, b=0.75)
Bases: object
Okapi BM25 scorer over a corpus of PageEntry texts.
- Parameters:
- score(query, candidate_ids=None)
Score candidates against query and return ranked results.
- Parameters:
- Return type:
- Returns:
list of (entry_id, bm25_score, matched_terms) tuples, sorted in descending order by score, with ties broken by entry_id for stability.
- property entry_count: int
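The scoring and ordering behaviour described for `score` can be sketched as below. This is not the library's implementation: `toy_bm25_score`, the dictionary-based corpus layout, the non-negative IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`, and the choice to drop entries with no matching terms are all assumptions. Only the defaults `k1=1.5`, `b=0.75`, the result tuple shape, and the sort order come from the documentation above.

```python
import math
from collections import Counter


def toy_bm25_score(corpus, query_tokens, candidate_ids=None, k1=1.5, b=0.75):
    """corpus: dict mapping entry_id -> list of tokens (hypothetical layout)."""
    n_docs = len(corpus)
    avg_len = sum(len(toks) for toks in corpus.values()) / n_docs
    # Document frequency: number of entries containing each term.
    df = Counter(term for toks in corpus.values() for term in set(toks))

    ids = corpus.keys() if candidate_ids is None else candidate_ids
    results = []
    for entry_id in ids:
        tokens = corpus[entry_id]
        tf = Counter(tokens)
        score, matched = 0.0, []
        for term in dict.fromkeys(query_tokens):  # de-duplicate, keep order
            if tf[term] == 0:
                continue
            # Assumed IDF variant; always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation relative to the average entry length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
            matched.append(term)
        if matched:
            results.append((entry_id, score, matched))
    # Highest score first; equal scores fall back to entry_id order.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


corpus = {
    "p1": ["bm25", "ranking", "okapi"],
    "p2": ["inverted", "index", "ranking"],
    "p3": ["tokenizer", "stopwords", "list"],
}
ranked = toy_bm25_score(corpus, ["okapi", "ranking"], candidate_ids=["p2", "p1"])
print([entry_id for entry_id, _, _ in ranked])  # ['p1', 'p2']
```

Passing `candidate_ids` restricts scoring to the entries selected by the routing stage, so `p3` is never touched in the scoring loop even though it is part of the corpus statistics.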