ractogateway.rag 

Return type:: list[RetrievalResult]
Returns:: list[RetrievalResult] – Ranked results (rank 1 = most relevant).

async aretrieve(query, top_k=5, filters=None)[source]

Async variant of retrieve().

Return type:: list[RetrievalResult]

query(question, top_k=5, filters=None, prompt=None, temperature=0.0, max_tokens=2048)[source]

Retrieve relevant chunks and generate an answer.

Parameters:

question (str) – The user’s question.
top_k (int) – Number of context chunks to retrieve.
filters (dict[str, Any] | None) – Optional metadata filters for retrieval.
prompt (RactoPrompt | None) – Override the default RACTO prompt for generation.
temperature (float) – LLM temperature (default 0.0 for factual answers).
max_tokens (int) – Maximum tokens in the generated answer.

Return type:

RAGResponse

Returns:

RAGResponse – Contains the generated answer plus the retrieved source chunks.

Raises:

RuntimeError – If no llm_kit was provided.

async aquery(question, top_k=5, filters=None, prompt=None, temperature=0.0, max_tokens=2048)[source]

Async variant of query().

Return type:: RAGResponse

property store: BaseVectorStore: The underlying vector store.

property embedder: BaseEmbedder: The underlying embedder.

count()[source]

Return the total number of indexed chunks.

Return type:: int

clear()[source]

Remove all indexed chunks from the vector store.

Return type:: None

class ractogateway.rag.PageIndexRAG(llm_kit=None, *, processors=None, reader_registry=None, context_template="Use the following retrieved page excerpts to answer the user's question.\nIf the excerpts do not contain enough information, say so clearly.\n\n--- CONTEXT ---\n{context}\n--- END CONTEXT ---\n\nQuestion: {question}", default_prompt=None, page_size=1000, page_overlap=100, k1=1.5, b=0.75, top_keywords=20, ocr_backend=None, ocr_fallback=True, min_ocr_confidence=0.0)[source]

Bases: object

Vectorless RAG pipeline that indexes documents at the page level.

Parameters:

llm_kit (Any) – Any RactoGateway developer kit (OpenAI, Anthropic, Google, Ollama, HuggingFace). Required only for query() / aquery(). Pass None to use the pipeline in retrieve-only mode.
processors (Sequence[BaseProcessor] | None) – Text processors applied to each page before indexing. Defaults to [TextCleaner()].
reader_registry (FileReaderRegistry | None) – File reader registry used to load non-PDF documents. Defaults to a FileReaderRegistry with all built-in readers registered.
context_template (str) – Template string with {context} and {question} placeholders, used when building the LLM prompt.
default_prompt (RactoPrompt | None) – RactoPrompt used for generation. Defaults to a built-in factual Q&A prompt.
page_size (int) – Maximum character length of each page window for non-PDF files (default 1000).
page_overlap (int) – Character overlap between consecutive windows (default 100).
k1 (float) – BM25 term-frequency saturation parameter (default 1.5).
b (float) – BM25 length-normalisation parameter (default 0.75).
top_keywords (int) – Number of top TF-weighted keywords to extract per page for the decision index (default 20).
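
Example (illustrative sketch, not the library's implementation): one way the {context} and {question} placeholders of context_template could be filled. It assumes str.format-style substitution, which this reference does not confirm; build_prompt and the sample excerpts are hypothetical.

```python
# Default template text, copied from the PageIndexRAG signature.
CONTEXT_TEMPLATE = (
    "Use the following retrieved page excerpts to answer the user's question.\n"
    "If the excerpts do not contain enough information, say so clearly.\n\n"
    "--- CONTEXT ---\n{context}\n--- END CONTEXT ---\n\n"
    "Question: {question}"
)


def build_prompt(excerpts: list[str], question: str) -> str:
    """Join the excerpts and substitute them into the template (hypothetical helper)."""
    context = "\n\n".join(excerpts)
    return CONTEXT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(["Page 1 text.", "Page 2 text."], "What is on page 2?")
```

A custom template passed as context_template would need to keep both placeholders for a substitution like this to work.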

retrieve(query, top_k=5)[source]

Retrieve the most relevant pages for query.

Uses two-stage retrieval: decision index (candidate selection) → BM25 scoring (ranking).

Parameters:

query (str) – Natural-language question or keyword string.
top_k (int) – Maximum number of results to return.

Return type:: list[PageIndexResult]
Returns:: list[PageIndexResult] – Pages ranked by BM25 score (most relevant first).
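
Example (illustrative sketch, not the library's implementation): the two-stage idea described above, written with the standard library only. The decision index is approximated as the top TF keywords per page, and ranking uses the common Okapi BM25 formula with a non-negative IDF; the real index layout and BM25 variant may differ. tokenize and bm25_rank are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(pages, query, top_k=5, k1=1.5, b=0.75, top_keywords=20):
    docs = [tokenize(p) for p in pages]
    tfs = [Counter(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Number of pages containing each term (document frequency).
    df = Counter(term for tf in tfs for term in tf)

    # Stage 1: candidate selection via per-page keyword sets.
    keywords = [{t for t, _ in tf.most_common(top_keywords)} for tf in tfs]
    q_terms = set(tokenize(query))
    candidates = [i for i, kw in enumerate(keywords) if kw & q_terms]

    # Stage 2: BM25 scoring of the candidates only.
    scored = []
    for i in candidates:
        score = 0.0
        for term in q_terms:
            f = tfs[i].get(term, 0)
            if f == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = f + k1 * (1 - b + b * len(docs[i]) / avg_len)
            score += idf * f * (k1 + 1) / norm
        scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


pages = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]
results = bm25_rank(pages, "cat mat")
```

Raising k1 lets repeated terms keep adding to the score for longer; raising b penalises long pages more strongly.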

async aretrieve(query, top_k=5)[source]

Async variant of retrieve().

Return type:: list[PageIndexResult]

ingest(path, **metadata)[source]

Read a file and add its pages to the index.

PDFs are split page-by-page; all other file types are split into fixed-size character windows.

Parameters:

path (str) – Absolute or relative path to the file.
**metadata (Any) – Arbitrary key/value pairs stored in PageEntry.extra.

Return type:: list[PageEntry]
Returns:: list[PageEntry] – All page entries created from this file.

async aingest(path, **metadata)[source]

Async variant of ingest().

Return type:: list[PageEntry]

ingest_text(text, source='manual', **metadata)[source]

Index raw text directly (no file I/O).

Parameters:

text (str) – Plain text to index.
source (str) – Descriptive label stored in PageEntry.source.
**metadata (Any) – Arbitrary key/value pairs stored in PageEntry.extra.

Return type:: list[PageEntry]

async aingest_text(text, source='manual', **metadata)[source]

Async variant of ingest_text().

Return type:: list[PageEntry]

ingest_dir(directory, pattern='**/*', *, on_progress=None, **metadata)[source]

Ingest all files matching pattern inside directory.

Files that cannot be read are logged and skipped; the rest are indexed normally.

Parameters:

directory (str) – Root directory to search.
pattern (str) – Glob pattern relative to directory (default "**/*").
on_progress (Callable[[int, int], None] | None) – Optional callback (done, total) -> None called after each file is processed (or skipped). Useful for progress bars.
**metadata (Any) – Forwarded to every ingest() call.

Return type:

async aingest_dir(directory, pattern='**/*', *, max_concurrent=4, on_progress=None, **metadata)[source]

Async parallel variant of ingest_dir().

Parameters:

directory (str) – Root directory to search.
pattern (str) – Glob pattern relative to directory (default "**/*").
max_concurrent (int) – Maximum number of files ingested concurrently (default 4).
on_progress (Callable[[int, int], None] | None) – Optional callback (done, total) -> None called after each file finishes (thread-safe; called from the event loop).
**metadata (Any) – Forwarded to every aingest() call.
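
Example (illustrative sketch, not the library's implementation): bounded-concurrency ingestion with a progress callback, using asyncio.Semaphore. ingest_all and fake_ingest are hypothetical stand-ins; how aingest_dir() limits concurrency internally is not documented here.

```python
import asyncio


async def ingest_all(paths, ingest_one, max_concurrent=4, on_progress=None):
    """Run ingest_one(path) for every path, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)
    total = len(paths)
    done = 0

    async def worker(path):
        nonlocal done
        async with semaphore:
            try:
                result = await ingest_one(path)
            except OSError:
                result = []  # unreadable file: skip it, keep going
        done += 1
        if on_progress is not None:
            on_progress(done, total)  # called for processed and skipped files
        return result

    per_file = await asyncio.gather(*(worker(p) for p in paths))
    return [entry for entries in per_file for entry in entries]


async def fake_ingest(path):
    """Stand-in for a real per-file ingest coroutine."""
    await asyncio.sleep(0)
    if path.endswith(".bad"):
        raise OSError("unreadable")
    return [f"{path}#page1"]


progress = []
entries = asyncio.run(
    ingest_all(
        ["a.txt", "b.bad", "c.txt"],
        fake_ingest,
        max_concurrent=2,
        on_progress=lambda d, t: progress.append((d, t)),
    )
)
```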

Return type:

add_document(path, **metadata)[source]

Alias for ingest().

Return type:: list[PageEntry]

add_texts(texts, source='manual', **metadata)[source]

Ingest a list of text strings.

Return type:: list[PageEntry]

search(query, *, top_k=5, prompt=None, temperature=0.0, max_tokens=2048)[source]

Alias for query().

Return type:: PageIndexResponse

query(question, *, top_k=5, prompt=None, temperature=0.0, max_tokens=2048)[source]

Retrieve relevant pages and generate an answer with the LLM kit.

Parameters:

question (str) – Natural-language question to answer.
top_k (int) – Number of pages to retrieve.
prompt (RactoPrompt | None) – Override the kit’s default prompt for this call.
temperature (float) – Sampling temperature for generation.
max_tokens (int) – Maximum generation tokens.

Return type:

PageIndexResponse

Returns:

PageIndexResponse – Contains the generated answer, ranked sources, and the context string that was supplied to the model.

Raises:

ValueError – If no llm_kit was provided and generation is requested.

async aquery(question, *, top_k=5, prompt=None, temperature=0.0, max_tokens=2048)[source]

Async variant of query().

Return type:: PageIndexResponse

remove_document(doc_id)[source]

Remove all pages belonging to doc_id from the index.

Parameters:: doc_id (str) – The doc_id value from any PageEntry returned during ingestion.
Return type:: int
Returns:: int – Number of page entries removed.

clear()[source]

Remove all indexed entries and reset the pipeline to empty state.

Return type:: None

save(path)[source]

Serialise the full index to a JSON file.

The saved file contains all PageEntry records, BM25 term weights, and deduplication hashes. Reload with load().

Parameters:: path (str) – Destination file path (will be created or overwritten).
Return type:: None

classmethod load(path, **kwargs)[source]

Load a previously saved index from path.

Parameters:

path (str) – JSON file written by save().
**kwargs (Any) – Forwarded to the constructor (e.g. llm_kit=kit).

Return type:

PageIndexRAG

Returns:

PageIndexRAG – A new instance with the index fully restored.
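
Example (illustrative sketch, not the library's implementation): a JSON save/load round trip for index data. The file layout, save_index, and load_index are hypothetical; the actual format written by save() is not specified in this reference.

```python
import json
import os
import tempfile


def save_index(path, entries, hashes):
    """Write entries and deduplication hashes to one JSON file (hypothetical layout)."""
    payload = {"entries": entries, "hashes": sorted(hashes)}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False)


def load_index(path):
    """Read the file back and restore the hash set."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    return payload["entries"], set(payload["hashes"])


entries = [{"entry_id": "e1", "doc_id": "d1", "content": "Hello", "page_number": 1}]
hashes = {"abc123"}
with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "index.json")
    save_index(index_path, entries, hashes)
    restored_entries, restored_hashes = load_index(index_path)
```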

property entry_count: int: Total number of indexed page entries.

property document_count: int: Number of distinct documents ingested.

class ractogateway.rag.PageEntry(**data)[source]

Bases: BaseModel

A single page (or fixed-size window) extracted from a document.

Produced by PageIndexRAG during ingestion and stored in the in-process index.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

entry_id: str

page_number: int | None

content: str

source: str

section_title: str | None

keywords: list[str]

doc_id: str

char_count: int

extra: dict[str, Any]

ocr_applied: bool

ocr_confidence: float | None

content_hash: str | None

property text: str: Alias for content.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag.PageIndexResult(**data)[source]

Bases: BaseModel

A single retrieved page together with its BM25 relevance score.

entry: PageEntry

score: float

rank: int

matched_terms: list[str]

property content: str: Alias for entry.content.

property text: str: Alias for entry.content.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag.PageIndexResponse(**data)[source]

Bases: BaseModel

Full response from PageIndexRAG.query() / PageIndexRAG.aquery().

answer: LLMResponse | None

sources: list[PageIndexResult]

query: str

context_used: str

property results: list[PageIndexResult]: Alias for sources.

property pages: list[PageIndexResult]: Alias for sources.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag.Chunk(**data)[source]

Bases: BaseModel

A single embeddable slice of a document.

Produced by a BaseChunker and enriched with an embedding vector by a BaseEmbedder.

chunk_id: str

doc_id: str

content: str

embedding: list[float] | None

metadata: ChunkMetadata

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag.ChunkMetadata(**data)[source]

Bases: BaseModel

Provenance and positional data attached to every chunk.

source: str

page: int | None

chunk_index: int

total_chunks: int

start_char: int

end_char: int

doc_id: str

extra: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag.Document(**data)[source]

Bases: BaseModel

A raw document loaded from a file or supplied as plain text.

Parameters:

content (str) – The full extracted text of the document.
source (str) – Absolute file path, URL, or a descriptive label (e.g. "manual").
metadata (dict[str, Any]) – Free-form key/value pairs (file size, author, MIME type, …).
doc_id (str) – Auto-generated UUID; override only when you need stable IDs.

doc_id: str

content: str

source: str

metadata: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag.RAGResponse(**data)[source]

Bases: BaseModel

Combined output from a RAG query (retrieval + generation).

answer: LLMResponse

sources: list[RetrievalResult]

query: str

context_used: str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag.RetrievalConfig(**data)[source]

Bases: BaseModel

Input parameters for a vector-store search.

query: str

top_k: int

filters: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag.RetrievalResult(**data)[source]

Bases: BaseModel

A single retrieved chunk together with its relevance score.

chunk: Chunk

score: float

rank: int

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag.FixedChunker(chunk_size=512, overlap=50)[source]

Bases: BaseChunker

Split text into fixed-size character windows with overlap.

Parameters:

chunk_size (int) – Maximum number of characters per chunk.
overlap (int) – Number of characters to repeat at the start of the next chunk. Must be less than chunk_size.

chunk(document)[source]

Split document into chunks.

Parameters:: document (Document) – The fully-loaded document to split.
Return type:: list[Chunk]
Returns:: list[Chunk] – Ordered list of non-overlapping (or slightly overlapping) chunks.
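
Example (illustrative sketch, not the library's implementation): fixed-size character windows with overlap. fixed_windows is a hypothetical helper; the returned offsets correspond in spirit to the start_char and end_char fields of ChunkMetadata.

```python
def fixed_windows(text, chunk_size=512, overlap=50):
    """Return (start_char, end_char, content) tuples covering text."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    spans = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        spans.append((start, end, text[start:end]))
        if end == len(text):
            break  # last window reached the end of the text
        start += step
    return spans


spans = fixed_windows("abcdefghij", chunk_size=4, overlap=1)
```

Here each window repeats the final character of the previous one, which is why overlap must stay below chunk_size: otherwise the window would never advance.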

class ractogateway.rag.RecursiveChunker(chunk_size=512, overlap=50, separators=None)[source]

Bases: BaseChunker

Split text recursively using a priority list of separators.

Parameters:

chunk_size (int) – Maximum number of characters per chunk.
overlap (int) – Number of characters of overlap between consecutive chunks.
separators (list[str] | None) – Ordered list of separator strings to try. The first separator that produces pieces within chunk_size is used.

chunk(document)[source]

Split document into chunks.

Parameters:: document (Document) – The fully-loaded document to split.
Return type:: list[Chunk]
Returns:: list[Chunk] – Ordered list of non-overlapping (or slightly overlapping) chunks.
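
Example (illustrative sketch, not the library's implementation): recursive splitting with a priority list of separators. This variant splits on the first separator found in the text, merges pieces greedily up to chunk_size, and recurses with the remaining separators for oversized pieces. Overlap is omitted for brevity, and the default separator list and recursive_split itself are assumptions.

```python
def recursive_split(text, chunk_size=512, separators=None):
    """Split text into pieces of at most chunk_size characters."""
    if separators is None:
        separators = ["\n\n", "\n", ". ", " "]  # assumed priority order
    if len(text) <= chunk_size:
        return [text] if text else []
    for idx, sep in enumerate(separators):
        if sep not in text:
            continue
        rest = separators[idx + 1:]
        chunks, current = [], ""
        for piece in text.split(sep):
            candidate = piece if not current else current + sep + piece
            if len(candidate) <= chunk_size:
                current = candidate  # keep merging into the current chunk
                continue
            if current:
                chunks.append(current)
            if len(piece) <= chunk_size:
                current = piece
            else:
                # Piece is still too large: try the finer separators.
                chunks.extend(recursive_split(piece, chunk_size, rest))
                current = ""
        if current:
            chunks.append(current)
        return chunks
    # No separator applies: fall back to a hard character cut.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = recursive_split("alpha beta\n\ngamma delta epsilon", chunk_size=12)
```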

class ractogateway.rag.SemanticChunker(embedder, threshold=0.5, min_chunk_size=2, language='english')[source]

Bases: BaseChunker

Split documents where the semantic similarity between adjacent sentences drops below a threshold.

Parameters:

embedder (BaseEmbedder) – Any BaseEmbedder instance.
threshold (float) – Cosine similarity below which a split is inserted (default: 0.5).
min_chunk_size (int) – Minimum number of sentences per chunk (prevents ultra-fine splits).
language (str) – NLTK sentence tokenizer language.

chunk(document)[source]

Split document into chunks.

Parameters:: document (Document) – The fully-loaded document to split.
Return type:: list[Chunk]
Returns:: list[Chunk] – Ordered list of non-overlapping (or slightly overlapping) chunks.
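
Example (illustrative sketch, not the library's implementation): splitting where the cosine similarity between adjacent sentences drops below a threshold. The toy two-dimensional vectors stand in for real embedder output, and semantic_groups is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_groups(sentences, vectors, threshold=0.5, min_chunk_size=2):
    """Group consecutive sentences, splitting at low-similarity boundaries."""
    groups, current = [], []
    for i, sentence in enumerate(sentences):
        big_enough = len(current) >= min_chunk_size
        if big_enough and cosine(vectors[i - 1], vectors[i]) < threshold:
            groups.append(current)
            current = []
        current.append(sentence)
    if current:
        groups.append(current)
    return groups


sentences = ["Cats purr.", "Cats nap.", "Stocks fell.", "Bonds rose."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy embeddings
groups = semantic_groups(sentences, vectors)
```

In this sketch min_chunk_size suppresses a split until the current group holds at least that many sentences, which is the "prevents ultra-fine splits" behaviour described above.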

class ractogateway.rag.SentenceChunker(sentences_per_chunk=5, overlap_sentences=1, language='english')[source]

Bases: BaseChunker

Split text into groups of sentences using NLTK.

Parameters:

sentences_per_chunk (int) – Number of sentences per chunk.
overlap_sentences (int) – Number of sentences to repeat at the start of the next chunk.
language (str) – Language for the NLTK sentence tokenizer (default: "english").

chunk(document)[source]

Split document into chunks.

Parameters:: document (Document) – The fully-loaded document to split.
Return type:: list[Chunk]
Returns:: list[Chunk] – Ordered list of non-overlapping (or slightly overlapping) chunks.

class ractogateway.rag.GoogleEmbedder(model='text-embedding-004', *, api_key=None, task_type=None, batch_size=100)[source]

Bases: BaseEmbedder

Embed texts using the Google Gemini Embeddings API.

Parameters:

model (str) – Gemini embedding model (default "text-embedding-004").
api_key (str | None) – Gemini API key. Falls back to GEMINI_API_KEY env var.
task_type (str | None) – Gemini task type hint (e.g. "RETRIEVAL_DOCUMENT", "RETRIEVAL_QUERY"). None lets the API decide.
batch_size (int) – Maximum number of texts per API call.

property dimension: int

Dimensionality of the embedding vectors.

Returns -1 if not known until after the first call.

embed(texts)[source]

Embed texts synchronously.

Parameters:: texts (list[str]) – Non-empty list of strings to embed.
Return type:: list[list[float]]
Returns:: list[list[float]] – One embedding vector per input text, in the same order.
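
Example (illustrative sketch, not the library's implementation): how a batch_size limit can be applied while keeping output order aligned with input order. batched, embed_in_batches, and fake_embed are hypothetical; fake_embed stands in for a real API call.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(texts, embed_batch, batch_size=100):
    """Call embed_batch once per batch and concatenate results in order."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


calls = []


def fake_embed(batch):
    """Stand-in for an embeddings API call: one 1-d vector per text."""
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches([f"text {i}" for i in range(250)], fake_embed)
```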

async aembed(texts)[source]

Async variant of embed().

Return type:: list[list[float]]

class ractogateway.rag.OpenAIEmbedder(model='text-embedding-3-small', *, api_key=None, base_url=None, dimensions=None, batch_size=256)[source]

Bases: BaseEmbedder

Embed texts using the OpenAI Embeddings API.

Parameters:

model (str) – OpenAI embedding model (default "text-embedding-3-small").
api_key (str | None) – OpenAI API key. Falls back to OPENAI_API_KEY env var.
base_url (str | None) – Custom base URL (Azure OpenAI or proxy).
dimensions (int | None) – Override output dimensionality (supported for text-embedding-3-*).
batch_size (int) – Maximum number of texts per API call.

property dimension: int

Dimensionality of the embedding vectors.

Returns -1 if not known until after the first call.

embed(texts)[source]

Embed texts synchronously.

Parameters:: texts (list[str]) – Non-empty list of strings to embed.
Return type:: list[list[float]]
Returns:: list[list[float]] – One embedding vector per input text, in the same order.

async aembed(texts)[source]

Async variant of embed().

Return type:: list[list[float]]

class ractogateway.rag.VoyageEmbedder(model='voyage-3', *, api_key=None, input_type='document', batch_size=128)[source]

Bases: BaseEmbedder

Embed texts using the Voyage AI API.

Voyage AI embeddings are optimised for Anthropic Claude RAG pipelines and are the recommended choice when using Claude as the generation LLM.

Parameters:

model (str) – Voyage model name (default "voyage-3").
api_key (str | None) – Voyage API key. Falls back to VOYAGE_API_KEY env var.
input_type (str | None) – "query" for queries, "document" for documents to index. Using the correct type improves retrieval quality.
batch_size (int) – Maximum texts per API call.

property dimension: int

Dimensionality of the embedding vectors.

Returns -1 if not known until after the first call.

embed(texts)[source]

Embed texts synchronously.

Parameters:: texts (list[str]) – Non-empty list of strings to embed.
Return type:: list[list[float]]
Returns:: list[list[float]] – One embedding vector per input text, in the same order.

async aembed(texts)[source]

Async variant of embed().

Return type:: list[list[float]]

class ractogateway.rag.Lemmatizer(use_pos_tagging=True)[source]

Bases: BaseProcessor

Reduce words to their base (lemma) form using NLTK WordNet.

Parameters:: use_pos_tagging (bool) – If True, use POS tagging to improve lemmatization accuracy. Slightly slower but produces better results.

process(text)[source]

Process text and return the transformed string.

Parameters:: text (str) – Input text (chunk content or raw document content).
Return type:: str
Returns:: str – Processed text. Must be a non-empty string when input is non-empty.

class ractogateway.rag.ProcessingPipeline(processors)[source]

Bases: BaseProcessor

Apply a sequence of BaseProcessor objects to text.

Example:

from ractogateway.rag import Lemmatizer, ProcessingPipeline, TextCleaner

pipeline = ProcessingPipeline([TextCleaner(), Lemmatizer()])
processed = pipeline.process("  Hello,   worlds!  ")

Parameters:: processors (list[BaseProcessor]) – Ordered list of processors to apply. Each processor receives the output of the previous one.

process(text)[source]

Process text and return the transformed string.

Parameters:: text (str) – Input text (chunk content or raw document content).
Return type:: str
Returns:: str – Processed text. Must be a non-empty string when input is non-empty.

class ractogateway.rag.TextCleaner(normalize_unicode=True, strip_html=True, strip_control_chars=True, collapse_whitespace=True, collapse_blank_lines=True)[source]

Bases: BaseProcessor

Normalise text for embedding and retrieval.

Steps applied (all optional via constructor flags):

Unicode normalisation (NFC)
Strip residual HTML tags
Remove control characters
Collapse multiple spaces to one
Collapse runs of blank lines to at most two newlines
Strip leading/trailing whitespace

Parameters:

normalize_unicode (bool) – Apply unicodedata.normalize("NFC", text).
strip_html (bool) – Remove <tag> patterns.
strip_control_chars (bool) – Remove non-printable control characters.
collapse_whitespace (bool) – Collapse sequences of spaces/tabs to a single space.
collapse_blank_lines (bool) – Collapse 3+ consecutive newlines to 2.

process(text)[source]

Process text and return the transformed string.

Parameters:: text (str) – Input text (chunk content or raw document content).
Return type:: str
Returns:: str – Processed text. Must be a non-empty string when input is non-empty.
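
Example (illustrative sketch, not the library's implementation): the cleaning steps listed above, written with the standard library. The regular expressions are assumptions; the patterns TextCleaner actually uses are not shown in this reference.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Apply the six cleaning steps in order (hypothetical helper)."""
    text = unicodedata.normalize("NFC", text)  # Unicode normalisation
    text = re.sub(r"<[^>]+>", "", text)  # strip residual HTML tags
    # Remove control characters, keeping tab, newline and carriage return.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    text = re.sub(r"[ \t]+", " ", text)  # collapse spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most two consecutive newlines
    return text.strip()  # strip leading/trailing whitespace


cleaned = clean_text("  <b>Hello</b>\t\tworld\x00!\n\n\n\nBye  ")
```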

class ractogateway.rag.FileReaderRegistry(readers=None)[source]

Bases: object

Registry that maps file extensions to BaseReader instances.

By default all built-in readers are registered. You can add custom readers with register().

Example:

from ractogateway.rag import FileReaderRegistry

registry = FileReaderRegistry()
doc = registry.read("report.pdf")

register(reader)[source]

Add reader to the registry for all its supported extensions.

Return type:: None

get_reader(path)[source]

Return the reader for path’s extension.

Raises:: ValueError – If no reader supports the file’s extension.
Return type:: BaseReader

read(path)[source]

Convenience method: detect reader and return a Document.

Return type:: Document

property supported_extensions: frozenset[str]: All extensions currently registered.
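
Example (illustrative sketch, not the library's implementation): an extension-to-reader registry with the same lookup behaviour. ExtensionRegistry and read_text are hypothetical; unlike register() above, this sketch takes the extensions explicitly instead of asking the reader for them.

```python
from pathlib import Path


class ExtensionRegistry:
    """Map lowercase file extensions to reader callables."""

    def __init__(self):
        self._readers = {}

    def register(self, extensions, reader):
        for ext in extensions:
            self._readers[ext.lower()] = reader

    def get_reader(self, path):
        ext = Path(path).suffix.lower()
        try:
            return self._readers[ext]
        except KeyError:
            raise ValueError(f"No reader registered for extension {ext!r}") from None

    def read(self, path):
        """Detect the reader from the extension and call it."""
        return self.get_reader(path)(path)

    @property
    def supported_extensions(self):
        return frozenset(self._readers)


def read_text(path):
    """Minimal reader: return the file's text content."""
    return Path(path).read_text(encoding="utf-8")


registry = ExtensionRegistry()
registry.register([".txt", ".md"], read_text)
```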

class ractogateway.rag.ChromaStore(collection='ractogateway', *, path=None, host=None, port=8000, distance_function='cosine')[source]

Vector store backed by ChromaDB.

Supports both in-process (path or None for ephemeral) and HTTP-client modes (host + port).

Parameters:

collection (str) – Name of the ChromaDB collection.
path (str | None) – Persist directory for a local persistent client. None = ephemeral.
host (str | None) – ChromaDB server host (enables HTTP client mode).
port (int) – ChromaDB server port (default 8000).
distance_function (str) – "cosine", "l2", or "ip" (inner product).

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:: list[RetrievalResult]
Returns:: list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

class ractogateway.rag.FAISSStore(dimension=None, index_type='flat_ip')[source]

Vector store backed by Facebook AI Similarity Search (FAISS).

Stores embeddings in a flat L2 or cosine (Inner Product) index. All data is in-memory; call save() / load() to persist.

Parameters:

dimension (int | None) – Embedding dimension. Inferred from the first add() call if None.
index_type (str) – "flat_l2" or "flat_ip" (inner product / cosine when normalised).
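
Example (illustrative sketch): why an inner-product index behaves like cosine similarity once vectors are L2-normalised. Whether FAISSStore normalises embeddings itself is not stated in this reference, so the sketch normalises explicitly; all helper names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
ip_raw = dot(a, b)  # depends on vector length
ip_normalised = dot(l2_normalize(a), l2_normalize(b))  # equals cosine(a, b)
```

With unnormalised vectors the inner product also rewards longer vectors, so rankings can differ from cosine rankings.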

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:: list[RetrievalResult]
Returns:: list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

save(path)[source]

Persist the FAISS index to path.index and chunks to path.chunks.

Return type:: None

load(path)[source]

Load a previously saved index from path.

Return type:: None

class ractogateway.rag.InMemoryVectorStore(similarity='cosine')[source]

Pure-Python brute-force vector store — no extra dependencies.

This store keeps all chunks and their embeddings in memory. It is not suitable for production-scale corpora but requires no installation.

Parameters:: similarity (str) – Similarity function to use. Currently only "cosine" is supported.

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:: list[RetrievalResult]
Returns:: list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int
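
Example (illustrative sketch, not the library's implementation): brute-force cosine search over an in-memory mapping of chunk IDs to embeddings. Every stored vector is scored against the query, so cost grows linearly with the number of chunks. brute_force_search is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brute_force_search(query, stored, top_k=5):
    """Return (rank, chunk_id, score) tuples, rank 1 = most similar."""
    scored = sorted(
        ((cosine(query, vec), chunk_id) for chunk_id, vec in stored.items()),
        key=lambda pair: (-pair[0], pair[1]),  # best score first, ties by ID
    )
    return [
        (rank, chunk_id, score)
        for rank, (score, chunk_id) in enumerate(scored[:top_k], start=1)
    ]


stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = brute_force_search([1.0, 0.1], stored, top_k=2)
```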

class ractogateway.rag.MilvusStore(collection='ractogateway', *, host='localhost', port=19530, uri=None, token=None, dimension=None, metric_type='IP', batch_size=100)[source]

Vector store backed by Milvus or Zilliz Cloud.

Parameters:

collection (str) – Milvus collection name.
host (str) – Milvus server host (default "localhost").
port (int) – Milvus server port (default 19530).
uri (str | None) – Zilliz Cloud URI (overrides host/port when set).
token (str | None) – Zilliz Cloud API token.
dimension (int | None) – Embedding dimension. Inferred on first add.
metric_type (str) – "IP" (inner product / cosine) or "L2".
batch_size (int) – Vectors per insert batch.

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:: list[RetrievalResult]
Returns:: list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

class ractogateway.rag.PGVectorStore(dsn, *, table='rag_chunks', dimension=None, distance='cosine', batch_size=100)[source]

Vector store backed by PostgreSQL with the pgvector extension.

Parameters:

dsn (str) – PostgreSQL connection string (e.g. "postgresql://user:pass@localhost/mydb").
table (str) – Table name (default "rag_chunks").
dimension (int | None) – Embedding dimension. Inferred on first add.
distance (str) – "cosine", "l2", or "inner".
batch_size (int) – Rows per INSERT batch.

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:: list[RetrievalResult]
Returns:: list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

class ractogateway.rag.PineconeStore(index_name, *, api_key=None, namespace='', batch_size=100)[source]

Vector store backed by Pinecone cloud.

Parameters:

index_name (str) – Name of the Pinecone index (must already exist).
api_key (str | None) – Pinecone API key. Falls back to PINECONE_API_KEY env var.
namespace (str) – Pinecone namespace for logical data isolation.
environment – Deprecated Pinecone environment string (for legacy pod-based indexes).
batch_size (int) – Number of vectors per upsert batch.

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:: list[RetrievalResult]
Returns:: list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

class ractogateway.rag.QdrantStore(collection='ractogateway', *, url=None, api_key=None, distance='cosine', dimension=None, batch_size=100)[source]

Vector store backed by Qdrant.

Parameters:

collection (str) – Qdrant collection name.
url (str | None) – Qdrant server URL. None = in-memory.
api_key (str | None) – Qdrant cloud API key (optional).
distance (str) – "cosine", "euclid", or "dot".
dimension (int | None) – Vector dimension. Inferred on first add if None.
batch_size (int) – Points per upsert batch.

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:: list[RetrievalResult]
Returns:: list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

class ractogateway.rag.WeaviateStore(class_name='RactoChunk', *, url=None, api_key=None, additional_headers=None, distance_metric='cosine', batch_size=100)[source]

Vector store backed by Weaviate.

Supports embedded (local, no server needed), local server, and Weaviate Cloud (WCS) connections.

Parameters:

class_name (str) – Weaviate class (collection) name.
url (str | None) – Weaviate server URL. None = use embedded Weaviate.
api_key (str | None) – Weaviate Cloud API key.
additional_headers (dict[str, str] | None) – Extra HTTP headers (e.g. for OpenAI API key pass-through to Weaviate).
distance_metric (str) – "cosine" or "l2-squared".
batch_size (int) – Objects per batch import.

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type: