RAG Pipeline

Pipeline

class ractogateway.rag.pipeline.RactoRAG(vector_store, embedder, *, chunker=None, processors=None, llm_kit=None, context_template=None, reader_registry=None, default_prompt=None)[source]

Bases: object

Production-grade RAG pipeline for RactoGateway.

Parameters:

vector_store (BaseVectorStore) – Any BaseVectorStore instance.
embedder (BaseEmbedder) – Any BaseEmbedder instance.
chunker (BaseChunker | None) – How to split documents. Defaults to RecursiveChunker with chunk_size=512, overlap=50.
processors (list[BaseProcessor] | None) – List of text processors applied to each chunk before embedding. Defaults to [TextCleaner()].
llm_kit (Any | None) – Any developer kit (OpenAIDeveloperKit, GoogleDeveloperKit, or AnthropicDeveloperKit). Required for query() / aquery().
context_template (str | None) – Template string for injecting retrieved context into the LLM prompt. Must contain {context} and {question} placeholders.
reader_registry (FileReaderRegistry | None) – Custom FileReaderRegistry. Defaults to a registry with all built-in readers.
default_prompt (RactoPrompt | None) – RACTO prompt used for generation. Falls back to a built-in RAG prompt.
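
The placeholder contract for context_template can be illustrated with plain string formatting. This is only a sketch: the template text, the sample chunk, and the build_prompt helper are hypothetical, and joining chunks with blank lines is an assumption rather than documented RactoRAG behaviour.

```python
# Hypothetical template; the built-in default used by RactoRAG is not shown in this reference.
TEMPLATE = "Context:\n{context}\n\nQuestion: {question}"


def build_prompt(template: str, chunks: list[str], question: str) -> str:
    """Fill the two required placeholders with retrieved text and the question."""
    for placeholder in ("{context}", "{question}"):
        if placeholder not in template:
            raise ValueError(f"template is missing {placeholder}")
    # Assumption for this sketch: retrieved chunks are joined with blank lines.
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


prompt = build_prompt(
    TEMPLATE,
    ["Paris is the capital of France."],
    "What is the capital of France?",
)
```

A template missing either placeholder is rejected up front here, which mirrors the "must contain" requirement stated for the parameter.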

async aingest(path, **metadata)[source]

Async variant of ingest().

Return type:: list[Chunk]

async aingest_dir(directory, pattern='**/*', **metadata)[source]

Async variant of ingest_dir().

Return type:: list[Chunk]

async aingest_text(text, source='manual', **metadata)[source]

Async variant of ingest_text().

Return type:: list[Chunk]

async aquery(question, top_k=5, filters=None, prompt=None, temperature=0.0, max_tokens=2048)[source]

Async variant of query().

Return type:: RAGResponse

async aretrieve(query, top_k=5, filters=None)[source]

Async variant of retrieve().

Return type:: list[RetrievalResult]

clear()[source]

Remove all indexed chunks from the vector store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

property embedder: BaseEmbedder: The underlying embedder.

ingest(path, **metadata)[source]

Read, chunk, embed, and store a single file.

Parameters:

path (str | Path) – Path to the file to ingest.
**metadata (Any) – Extra metadata merged into each chunk’s ChunkMetadata.extra.

Return type:

list[Chunk]

Returns:

list[Chunk] – The chunks that were added to the vector store.

ingest_dir(directory, pattern='**/*', **metadata)[source]

Recursively ingest all supported files in a directory.

Parameters:

directory (str | Path) – Root directory to scan.
pattern (str) – Glob pattern relative to directory.
**metadata (Any) – Extra metadata merged into every chunk.

Return type:

list[Chunk]

Returns:

list[Chunk] – All chunks added across all ingested files.

ingest_text(text, source='manual', **metadata)[source]

Ingest a raw text string directly (no file needed).

Parameters:

text (str) – The text content to ingest.
source (str) – A label identifying the source (stored in metadata).
**metadata (Any) – Extra metadata merged into each chunk.

Return type:

list[Chunk]

query(question, top_k=5, filters=None, prompt=None, temperature=0.0, max_tokens=2048)[source]

Retrieve relevant chunks and generate an answer.

Parameters:

question (str) – The user’s question.
top_k (int) – Number of context chunks to retrieve.
filters (dict[str, Any] | None) – Optional metadata filters for retrieval.
prompt (RactoPrompt | None) – Override the default RACTO prompt for generation.
temperature (float) – LLM temperature (default 0.0 for factual answers).
max_tokens (int) – Maximum tokens in the generated answer.

Return type:

RAGResponse

Returns:

RAGResponse – Contains the generated answer plus the retrieved source chunks.

Raises:

RuntimeError – If no llm_kit was provided.

retrieve(query, top_k=5, filters=None)[source]

Embed query and retrieve the top-k most relevant chunks.

Parameters:

query (str) – Natural-language question or search phrase.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked results (rank 1 = most relevant).

property store: BaseVectorStore: The underlying vector store.

Models

Core document and chunk models for RAG.

Every piece of content in the RAG pipeline is represented as a Document (raw, as loaded from a file) or a Chunk (a processed, embeddable slice of a document). Both are strict Pydantic models with no unvalidated fields.

class ractogateway.rag._models.document.Chunk(**data)[source]

Bases: BaseModel

A single embeddable slice of a document.

Produced by a BaseChunker and enriched with an embedding vector by a BaseEmbedder.

chunk_id: str

content: str

doc_id: str

embedding: list[float] | None

metadata: ChunkMetadata

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag._models.document.ChunkMetadata(**data)[source]

Bases: BaseModel

Provenance and positional data attached to every chunk.

chunk_index: int

doc_id: str

end_char: int

extra: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

page: int | None

source: str

start_char: int

total_chunks: int

class ractogateway.rag._models.document.Document(**data)[source]

Bases: BaseModel

A raw document loaded from a file or supplied as plain text.

Parameters:

content (str) – The full extracted text of the document.
source (str) – Absolute file path, URL, or a descriptive label (e.g. "manual").
metadata (dict[str, Any]) – Free-form key/value pairs (file size, author, MIME type, …).
doc_id (str) – Auto-generated UUID; override only when you need stable IDs.

content: str

doc_id: str

metadata: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

source: str

Retrieval and RAG response models.

class ractogateway.rag._models.retrieval.RAGResponse(**data)[source]

Bases: BaseModel

Combined output from a RAG query (retrieval + generation).

answer: LLMResponse

context_used: str

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

query: str

sources: list[RetrievalResult]

class ractogateway.rag._models.retrieval.RetrievalConfig(**data)[source]

Bases: BaseModel

Input parameters for a vector-store search.

filters: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

query: str

top_k: int

class ractogateway.rag._models.retrieval.RetrievalResult(**data)[source]

Bases: BaseModel

A single retrieved chunk together with its relevance score.

chunk: Chunk

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

rank: int

score: float

Readers

Abstract base class for all file readers.

class ractogateway.rag.readers.base.BaseReader[source]

Bases: ABC

Read a file from disk and return a Document.

Concrete subclasses implement read() and declare which file extensions they handle via supported_extensions.

abstractmethod read(path)[source]

Load path and return its text content as a Document.

Parameters:: path (Path) – Absolute path to the file to read.
Return type:: Document
Returns:: Document – A Document whose source is set to str(path) and whose content is the full extracted text.

abstract property supported_extensions: frozenset[str]: Lower-case extensions (with dot) this reader handles, e.g. {".pdf"}.

FileReaderRegistry — auto-detects the right reader for any file extension.

class ractogateway.rag.readers.registry.FileReaderRegistry(readers=None)[source]

Bases: object

Registry that maps file extensions to BaseReader instances.

By default all built-in readers are registered. You can add custom readers with register().

Example:

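from ractogateway.rag.readers.registry import FileReaderRegistry

registry = FileReaderRegistry()  # all built-in readers are registered by default
doc = registry.read("report.pdf")  # picks the reader from the ".pdf" extension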

get_reader(path)[source]

Return the reader for path’s extension.

Raises:: ValueError – If no reader supports the file’s extension.
Return type:: BaseReader

read(path)[source]

Convenience method: detect reader and return a Document.

Return type:: Document

register(reader)[source]

Add reader to the registry for all its supported extensions.

Return type:: None

property supported_extensions: frozenset[str]: All extensions currently registered.

Plain-text reader — handles .txt, .md, .rst, .log and similar files.

class ractogateway.rag.readers.text_reader.TextReader(encoding='utf-8')[source]

Bases: BaseReader

Read any UTF-8 (or latin-1 fallback) plain-text file.

No external dependencies required.

Parameters:: encoding (str) – Primary encoding to try. Falls back to "latin-1" on error.
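
The fallback behaviour can be pictured as a try/except around the primary decode. The helper name read_text_with_fallback is hypothetical, and the sketch assumes the fallback is triggered by a decode error, which is what "on error" suggests.

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(path: Path, encoding: str = "utf-8") -> str:
    """Try the primary encoding, then fall back to latin-1, which accepts any byte."""
    raw = path.read_bytes()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode("latin-1")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "note.txt"
    # The byte 0xE9 ("é" in latin-1) is not a valid standalone UTF-8 sequence.
    sample.write_bytes("café".encode("latin-1"))
    text = read_text_with_fallback(sample)
```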

read(path)[source]

Load path and return its text content as a Document.

Parameters:: path (Path) – Absolute path to the file to read.
Return type:: Document
Returns:: Document – A Document whose source is set to str(path) and whose content is the full extracted text.

property supported_extensions: frozenset[str]: Lower-case extensions (with dot) this reader handles, e.g. {".pdf"}.

PDF reader — uses pypdf (lazy import).

Install with: pip install ractogateway[rag-pdf]

class ractogateway.rag.readers.pdf_reader.PdfReader(extract_images=False)[source]

Bases: BaseReader

Extract text from PDF files using pypdf.

Parameters:: extract_images (bool) – Reserved for future use — image extraction is not yet supported.

read(path)[source]

Load path and return its text content as a Document.

Parameters:: path (Path) – Absolute path to the file to read.
Return type:: Document
Returns:: Document – A Document whose source is set to str(path) and whose content is the full extracted text.

property supported_extensions: frozenset[str]: Lower-case extensions (with dot) this reader handles, e.g. {".pdf"}.

Word document reader — uses python-docx (lazy import).

Install with: pip install ractogateway[rag-word]

class ractogateway.rag.readers.word_reader.WordReader[source]

Bases: BaseReader

Extract text from Microsoft Word (.docx) files using python-docx.

read(path)[source]

Load path and return its text content as a Document.

Parameters:: path (Path) – Absolute path to the file to read.
Return type:: Document
Returns:: Document – A Document whose source is set to str(path) and whose content is the full extracted text.

property supported_extensions: frozenset[str]: Lower-case extensions (with dot) this reader handles, e.g. {".pdf"}.

Spreadsheet reader — handles CSV (stdlib) and XLSX (openpyxl, lazy).

Install xlsx support with: pip install ractogateway[rag-excel]

class ractogateway.rag.readers.spreadsheet_reader.SpreadsheetReader(max_rows=None, include_header=True)[source]

Bases: BaseReader

Read CSV and Excel spreadsheets into plain text.

Each row is rendered as a tab-separated line; an optional header row is prepended. Multiple sheets in an XLSX workbook are separated by a --- Sheet: <name> --- divider.

Parameters:

max_rows (int | None) – Maximum number of rows to read per sheet (None = all).
include_header (bool) – Whether to repeat the header row at the start of each sheet section.
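
A rough sketch of the CSV half of this rendering, using only the standard library. The render_csv helper is hypothetical, and treating max_rows as a cap on data rows (excluding the header) is an assumption; the reference does not say how the header is counted.

```python
import csv
import io
from typing import Optional


def render_csv(text: str, max_rows: Optional[int] = None, include_header: bool = True) -> str:
    """Render CSV text as tab-separated lines, optionally keeping the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    if max_rows is not None:
        body = body[:max_rows]  # assumption: the cap applies to data rows only
    lines = ["\t".join(header)] if include_header else []
    lines.extend("\t".join(row) for row in body)
    return "\n".join(lines)


rendered = render_csv("name,qty\nbolt,4\nnut,9\n", max_rows=1)
```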

read(path)[source]

Load path and return its text content as a Document.

Parameters:: path (Path) – Absolute path to the file to read.
Return type:: Document
Returns:: Document – A Document whose source is set to str(path) and whose content is the full extracted text.

property supported_extensions: frozenset[str]: Lower-case extensions (with dot) this reader handles, e.g. {".pdf"}.

Image reader — uses Pillow (lazy import) to extract metadata.

Images are represented as a textual description of their EXIF/metadata, plus an optional prompt to an LLM for visual description. Pixel data is not stored in the Document; use RactoFile for multimodal vision calls.

Install with: pip install ractogateway[rag-image]

class ractogateway.rag.readers.image_reader.ImageReader(include_exif=True)[source]

Bases: BaseReader

Extract metadata from image files and represent them as text Documents.

The resulting Document.content is a human-readable summary of image properties (size, mode, format, EXIF tags). Pass the image to a vision LLM separately using RactoFile for actual visual understanding.

Parameters:: include_exif (bool) – Whether to extract and include EXIF metadata in the content.

read(path)[source]

Load path and return its text content as a Document.

Parameters:: path (Path) – Absolute path to the file to read.
Return type:: Document
Returns:: Document – A Document whose source is set to str(path) and whose content is the full extracted text.

property supported_extensions: frozenset[str]: Lower-case extensions (with dot) this reader handles, e.g. {".pdf"}.

HTML reader — uses stdlib html.parser (no extra deps).

class ractogateway.rag.readers.html_reader.HtmlReader[source]

Bases: BaseReader

Extract visible text from HTML files using the stdlib HTML parser.

No external dependencies required.
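
Visible-text extraction with the stdlib parser might look like the sketch below. The VisibleText class is hypothetical, and skipping script and style contents is an assumption about what "visible" means here.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


parser = VisibleText()
parser.feed(
    "<html><style>p{}</style><body><p>Hello</p>"
    "<script>x=1</script><p>world</p></body></html>"
)
parser.close()
visible = " ".join(parser.parts)
```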

read(path)[source]

Load path and return its text content as a Document.

Parameters:: path (Path) – Absolute path to the file to read.
Return type:: Document
Returns:: Document – A Document whose source is set to str(path) and whose content is the full extracted text.

property supported_extensions: frozenset[str]: Lower-case extensions (with dot) this reader handles, e.g. {".pdf"}.

Chunkers

Abstract base class for text chunkers.

class ractogateway.rag.chunkers.base.BaseChunker[source]

Bases: ABC

Split a Document into a list of Chunk objects.

Each chunk preserves provenance (doc_id, chunk_index, start_char, end_char) in its ChunkMetadata.

abstractmethod chunk(document)[source]

Split document into chunks.

Parameters:: document (Document) – The fully-loaded document to split.
Return type:: list[Chunk]
Returns:: list[Chunk] – Ordered list of non-overlapping (or slightly overlapping) chunks.

Fixed-size character chunker with configurable overlap.

class ractogateway.rag.chunkers.fixed_chunker.FixedChunker(chunk_size=512, overlap=50)[source]

Bases: BaseChunker

Split text into fixed-size character windows with overlap.

Parameters:

chunk_size (int) – Maximum number of characters per chunk.
overlap (int) – Number of characters to repeat at the start of the next chunk. Must be less than chunk_size.

chunk(document)[source]

Split document into chunks.

Parameters:: document (Document) – The fully-loaded document to split.
Return type:: list[Chunk]
Returns:: list[Chunk] – Ordered list of non-overlapping (or slightly overlapping) chunks.
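
The sliding-window idea behind fixed-size chunking can be sketched as follows. The fixed_windows helper and its tuple output are illustrative only; the real chunker returns Chunk objects, and its handling of the final window is not documented here.

```python
def fixed_windows(text: str, chunk_size: int = 512, overlap: int = 50) -> list[tuple[int, int, str]]:
    """Return (start_char, end_char, content) windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and less than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    windows = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        windows.append((start, end, text[start:end]))
        if end == len(text):
            break  # the last window reached the end of the text
        start += step
    return windows


windows = fixed_windows("abcdefghij", chunk_size=4, overlap=1)
```

The requirement that overlap be less than chunk_size falls out of the step computation: with overlap equal to chunk_size the window would never advance.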

Recursive character text splitter (LangChain-style).

Tries progressively finer separators ("\n\n", "\n", ". ", " " and finally character-by-character) until every piece fits within chunk_size.

class ractogateway.rag.chunkers.recursive_chunker.RecursiveChunker(chunk_size=512, overlap=50, separators=None)[source]

Bases: BaseChunker

Split text recursively using a priority list of separators.

Parameters:

chunk_size (int) – Maximum number of characters per chunk.
overlap (int) – Number of characters of overlap between consecutive chunks.
separators (list[str] | None) – Ordered list of separator strings to try. The first separator that produces pieces within chunk_size is used.

chunk(document)[source]

Split document into chunks.

Parameters:: document (Document) – The fully-loaded document to split.
Return type:: list[Chunk]
Returns:: list[Chunk] – Ordered list of non-overlapping (or slightly overlapping) chunks.
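
A simplified sketch of recursive splitting, assuming the separator order listed above. The recursive_split function is hypothetical, it omits overlap entirely, and it is not the RecursiveChunker implementation; it only shows how coarse separators are tried first and finer ones are used for pieces that are still too long.

```python
def recursive_split(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split on the coarsest separator, recursing into parts that are still too long."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # Last resort: cut into fixed character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    head, rest = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for part in text.split(head):
        candidate = part if not current else current + head + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing parts while they fit
            continue
        if current:
            chunks.append(current)
        current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            chunks.extend(recursive_split(part, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


pieces = recursive_split("alpha beta\n\ngamma delta epsilon", chunk_size=12)
```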

Sentence-aware chunker — uses NLTK sent_tokenize (lazy import).

Install with: pip install ractogateway[rag-nlp]

class ractogateway.rag.chunkers.sentence_chunker.SentenceChunker(sentences_per_chunk=5, overlap_sentences=1, language='english')[source]

Bases: BaseChunker

Split text into groups of sentences using NLTK.

Parameters:

sentences_per_chunk (int) – Number of sentences per chunk.
overlap_sentences (int) – Number of sentences to repeat at the start of the next chunk.
language (str) – Language for the NLTK sentence tokenizer (default: "english").

chunk(document)[source]

Split document into chunks.

Parameters:: document (Document) – The fully-loaded document to split.
Return type:: list[Chunk]
Returns:: list[Chunk] – Ordered list of non-overlapping (or slightly overlapping) chunks.
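
Grouping sentences with overlap can be sketched like this. The regex splitter is a deliberately naive stand-in for NLTK's sent_tokenize so the example has no dependencies, and group_sentences is a hypothetical helper, not the SentenceChunker code.

```python
import re


def group_sentences(text: str, sentences_per_chunk: int = 5, overlap_sentences: int = 1) -> list[str]:
    """Group sentences into chunks that repeat `overlap_sentences` from the previous chunk."""
    if not 0 <= overlap_sentences < sentences_per_chunk:
        raise ValueError("overlap_sentences must be in [0, sentences_per_chunk)")
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    step = sentences_per_chunk - overlap_sentences
    groups = []
    for start in range(0, len(sentences), step):
        groups.append(" ".join(sentences[start:start + sentences_per_chunk]))
        if start + sentences_per_chunk >= len(sentences):
            break  # the final group already covers the last sentence
    return groups


groups = group_sentences("One. Two. Three. Four. Five.", sentences_per_chunk=3, overlap_sentences=1)
```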

Semantic chunker — splits at embedding-space boundaries.

Uses cosine similarity between adjacent sentence embeddings to detect topic shifts. Requires a BaseEmbedder and NLTK sent_tokenize.

Install with: pip install ractogateway[rag-nlp]

class ractogateway.rag.chunkers.semantic_chunker.SemanticChunker(embedder, threshold=0.5, min_chunk_size=2, language='english')[source]

Bases: BaseChunker

Split documents where the semantic similarity between adjacent sentences drops below a threshold.

Parameters:

embedder (BaseEmbedder) – Any BaseEmbedder instance.
threshold (float) – Cosine similarity below which a split is inserted (default: 0.5).
min_chunk_size (int) – Minimum number of sentences per chunk (prevents ultra-fine splits).
language (str) – NLTK sentence tokenizer language.

chunk(document)[source]

Split document into chunks.

Parameters:: document (Document) – The fully-loaded document to split.
Return type:: list[Chunk]
Returns:: list[Chunk] – Ordered list of non-overlapping (or slightly overlapping) chunks.
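
The threshold rule can be illustrated with hand-made vectors. Everything here is a sketch: semantic_groups is a hypothetical helper, the two-dimensional "embeddings" are toy values, and the exact way min_chunk_size interacts with the threshold in SemanticChunker is an assumption.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_groups(
    sentences: list[str],
    vectors: list[list[float]],
    threshold: float = 0.5,
    min_chunk_size: int = 2,
) -> list[list[str]]:
    """Start a new group when adjacent similarity drops below the threshold,
    unless the current group is still shorter than min_chunk_size."""
    groups = [[sentences[0]]] if sentences else []
    for i in range(1, len(sentences)):
        similar = cosine(vectors[i - 1], vectors[i]) >= threshold
        if similar or len(groups[-1]) < min_chunk_size:
            groups[-1].append(sentences[i])
        else:
            groups.append([sentences[i]])
    return groups


sentences = ["Cats purr.", "Cats nap.", "Stocks fell.", "Bonds rose."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy embeddings
groups = semantic_groups(sentences, vectors)
```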

Processors

Abstract base class for text processors.

class ractogateway.rag.processors.base.BaseProcessor[source]

Bases: ABC

Transform a text string and return the processed result.

Processors are applied to chunk content before embedding. They can normalise whitespace, lemmatize tokens, remove stop words, etc.

Chain multiple processors with ProcessingPipeline.

abstractmethod process(text)[source]

Process text and return the transformed string.

Parameters:: text (str) – Input text (chunk content or raw document content).
Return type:: str
Returns:: str – Processed text. Must be a non-empty string when input is non-empty.

Text cleaning processor — no extra dependencies.

class ractogateway.rag.processors.cleaner.TextCleaner(normalize_unicode=True, strip_html=True, strip_control_chars=True, collapse_whitespace=True, collapse_blank_lines=True)[source]

Bases: BaseProcessor

Normalise text for embedding and retrieval.

Steps applied (all optional via constructor flags):

Unicode normalisation (NFC)
Strip residual HTML tags
Remove control characters
Collapse multiple spaces to one
Collapse runs of blank lines to at most two newlines
Strip leading/trailing whitespace

Parameters:

normalize_unicode (bool) – Apply unicodedata.normalize("NFC", text).
strip_html (bool) – Remove <tag> patterns.
strip_control_chars (bool) – Remove non-printable control characters.
collapse_whitespace (bool) – Collapse sequences of spaces/tabs to a single space.
collapse_blank_lines (bool) – Collapse 3+ consecutive newlines to 2.

process(text)[source]

Process text and return the transformed string.

Parameters:: text (str) – Input text (chunk content or raw document content).
Return type:: str
Returns:: str – Processed text. Must be a non-empty string when input is non-empty.
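
The cleaning steps listed above could be approximated with the standard library as follows. The regular expressions are assumptions chosen for this sketch; the patterns TextCleaner actually uses are not given in this reference.

```python
import re
import unicodedata


def clean(text: str) -> str:
    """Apply the documented steps in order, with illustrative patterns."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"<[^>]+>", "", text)  # strip residual HTML tags
    # Remove control characters but keep tab, newline, and carriage return.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    text = re.sub(r"[ \t]+", " ", text)  # collapse spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse 3+ newlines to 2
    return text.strip()


cleaned = clean("  <b>Hello</b>,\t\tworld!\x00\n\n\n\nBye  ")
```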

Lemmatization processor — uses NLTK WordNetLemmatizer (lazy import).

Install with: pip install ractogateway[rag-nlp]

Note: Lemmatization changes the surface form of text and can degrade embedding quality for neural models (which were trained on unmodified text). Use this processor only when building keyword-index pipelines or when explicitly required for your retrieval strategy.

class ractogateway.rag.processors.lemmatizer.Lemmatizer(use_pos_tagging=True)[source]

Bases: BaseProcessor

Reduce words to their base (lemma) form using NLTK WordNet.

Parameters:: use_pos_tagging (bool) – If True, use POS tagging to improve lemmatization accuracy. Slightly slower but produces better results.

process(text)[source]

Process text and return the transformed string.

Parameters:: text (str) – Input text (chunk content or raw document content).
Return type:: str
Returns:: str – Processed text. Must be a non-empty string when input is non-empty.

ProcessingPipeline — chain multiple processors sequentially.

class ractogateway.rag.processors.pipeline.ProcessingPipeline(processors)[source]

Bases: BaseProcessor

Apply a sequence of BaseProcessor objects to text.

Example:

pipeline = ProcessingPipeline([TextCleaner(), Lemmatizer()])
processed = pipeline.process("  Hello,   worlds!  ")

Parameters:: processors (list[BaseProcessor]) – Ordered list of processors to apply. Each processor receives the output of the previous one.

process(text)[source]

Process text and return the transformed string.

Parameters:: text (str) – Input text (chunk content or raw document content).
Return type:: str
Returns:: str – Processed text. Must be a non-empty string when input is non-empty.
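
Sequential chaining, where each step receives the previous step's output, reduces to a fold over the processor list. The sketch uses plain callables instead of BaseProcessor objects, and run_pipeline is a hypothetical name.

```python
from functools import reduce
from typing import Callable


def run_pipeline(text: str, steps: list[Callable[[str], str]]) -> str:
    """Feed the output of each step into the next, in list order."""
    return reduce(lambda acc, step: step(acc), steps, text)


result = run_pipeline(
    "  Hello,   World!  ",
    [str.strip, str.lower, lambda s: " ".join(s.split())],
)
```

Order matters: swapping two steps can change the result whenever one step depends on the normalisation done by another.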

Embedders

Abstract base class for embedding providers.

class ractogateway.rag.embedders.base.BaseEmbedder[source]

Bases: ABC

Embed a list of texts into dense float vectors.

All embedders implement both sync embed() and async aembed() variants. The dimension of returned vectors is declared via the dimension property (-1 if unknown until the first call).

abstractmethod async aembed(texts)[source]

Async variant of embed().

Return type:: list[list[float]]

property dimension: int

Dimensionality of the embedding vectors.

Returns -1 if not known until after the first call.

abstractmethod embed(texts)[source]

Embed texts synchronously.

Parameters:: texts (list[str]) – Non-empty list of strings to embed.
Return type:: list[list[float]]
Returns:: list[list[float]] – One embedding vector per input text, in the same order.
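
To make the interface shape concrete, here is a toy class with the same three members (embed, aembed, dimension). ToyHashEmbedder is hypothetical and does not subclass BaseEmbedder; its hash-derived vectors carry no semantic meaning and exist only so the example runs without an API key.

```python
import asyncio
import hashlib


class ToyHashEmbedder:
    """Deterministic stand-in mirroring the embed()/aembed()/dimension shape."""

    def __init__(self, dimension: int = 8) -> None:
        if not 0 < dimension <= 32:
            raise ValueError("dimension must be between 1 and 32 for a SHA-256 digest")
        self._dimension = dimension

    @property
    def dimension(self) -> int:
        return self._dimension

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dimension` bytes into [0, 1].
            vectors.append([digest[i] / 255.0 for i in range(self._dimension)])
        return vectors

    async def aembed(self, texts: list[str]) -> list[list[float]]:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.embed, texts)


embedder = ToyHashEmbedder()
sync_vectors = embedder.embed(["hello", "world"])
async_vectors = asyncio.run(embedder.aembed(["hello", "world"]))
```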

OpenAI embedding provider.

Install with: pip install ractogateway[openai]

class ractogateway.rag.embedders.openai_embedder.OpenAIEmbedder(model='text-embedding-3-small', *, api_key=None, base_url=None, dimensions=None, batch_size=256)[source]

Bases: BaseEmbedder

Embed texts using the OpenAI Embeddings API.

Parameters:

model (str) – OpenAI embedding model (default "text-embedding-3-small").
api_key (str | None) – OpenAI API key. Falls back to OPENAI_API_KEY env var.
base_url (str | None) – Custom base URL (Azure OpenAI or proxy).
dimensions (int | None) – Override output dimensionality (supported for text-embedding-3-*).
batch_size (int) – Maximum number of texts per API call.
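
The effect of batch_size can be pictured as slicing the input list before each API call. The batched helper below is hypothetical and makes no network requests; it only shows how many calls a given input would need.

```python
def batched(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


batches = batched([f"text-{i}" for i in range(5)], batch_size=2)
```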

async aembed(texts)[source]

Async variant of embed().

Return type:: list[list[float]]

property dimension: int

Dimensionality of the embedding vectors.

Returns -1 if not known until after the first call.

embed(texts)[source]

Embed texts synchronously.

Parameters:: texts (list[str]) – Non-empty list of strings to embed.
Return type:: list[list[float]]
Returns:: list[list[float]] – One embedding vector per input text, in the same order.

Google Gemini embedding provider.

Install with: pip install ractogateway[google]

class ractogateway.rag.embedders.google_embedder.GoogleEmbedder(model='text-embedding-004', *, api_key=None, task_type=None, batch_size=100)[source]

Bases: BaseEmbedder

Embed texts using the Google Gemini Embeddings API.

Parameters:

model (str) – Gemini embedding model (default "text-embedding-004").
api_key (str | None) – Gemini API key. Falls back to GEMINI_API_KEY env var.
task_type (str | None) – Gemini task type hint (e.g. "RETRIEVAL_DOCUMENT", "RETRIEVAL_QUERY"). None lets the API decide.
batch_size (int) – Maximum number of texts per API call.

async aembed(texts)[source]

Async variant of embed().

Return type:: list[list[float]]

property dimension: int

Dimensionality of the embedding vectors.

Returns -1 if not known until after the first call.

embed(texts)[source]

Embed texts synchronously.

Parameters:: texts (list[str]) – Non-empty list of strings to embed.
Return type:: list[list[float]]
Returns:: list[list[float]] – One embedding vector per input text, in the same order.

Voyage AI embedding provider (Anthropic-aligned, best for Claude RAG).

Install with: pip install ractogateway[rag-voyage]

class ractogateway.rag.embedders.voyage_embedder.VoyageEmbedder(model='voyage-3', *, api_key=None, input_type='document', batch_size=128)[source]

Bases: BaseEmbedder

Embed texts using the Voyage AI API.

Voyage AI embeddings are optimised for Anthropic Claude RAG pipelines and are the recommended choice when using Claude as the generation LLM.

Parameters:

model (str) – Voyage model name (default "voyage-3").
api_key (str | None) – Voyage API key. Falls back to VOYAGE_API_KEY env var.
input_type (str | None) – "query" for queries, "document" for documents to index. Using the correct type improves retrieval quality.
batch_size (int) – Maximum texts per API call.

async aembed(texts)[source]

Async variant of embed().

Return type:: list[list[float]]

property dimension: int

Dimensionality of the embedding vectors.

Returns -1 if not known until after the first call.

embed(texts)[source]

Embed texts synchronously.

Parameters:: texts (list[str]) – Non-empty list of strings to embed.
Return type:: list[list[float]]
Returns:: list[list[float]] – One embedding vector per input text, in the same order.

Stores

Abstract base class for vector stores.

class ractogateway.rag.stores.base.BaseVectorStore[source]

Bases: ABC

Persist and search embedding vectors.

All vector stores share the same interface: add(), search(), delete(), clear(), and count(). The underlying storage backend is determined by the concrete subclass.

abstractmethod add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

abstractmethod clear()[source]

Remove all chunks from the store.

Return type:: None

abstractmethod count()[source]

Return the total number of indexed chunks.

Return type:: int

abstractmethod delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

abstractmethod search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

In-memory vector store — pure Python, zero extra dependencies.

Uses brute-force cosine similarity over a list of stored vectors. Suitable for development, testing, and small corpora (< 10k chunks).

class ractogateway.rag.stores.in_memory_store.InMemoryVectorStore(similarity='cosine')[source]

Bases: BaseVectorStore

Pure-Python brute-force vector store — no extra dependencies.

This store keeps all chunks and their embeddings in memory. It is not suitable for production-scale corpora but requires no installation.

Parameters:: similarity (str) – Similarity function to use. Currently only "cosine" is supported.

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked list of results (rank 1 = most similar).
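
Brute-force cosine search amounts to scoring every stored vector and sorting. The sketch below uses a plain dict of chunk IDs to vectors and returns (rank, chunk_id, score) tuples; both are simplifications chosen for illustration, not the store's internal layout.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], stored: dict[str, list[float]], k: int = 5) -> list[tuple[int, str, float]]:
    """Score every stored vector against the query; rank 1 is the most similar."""
    scored = sorted(((cosine(query, vec), cid) for cid, vec in stored.items()), reverse=True)
    return [(rank, cid, score) for rank, (score, cid) in enumerate(scored[:k], start=1)]


stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
results = top_k([1.0, 0.1], stored, k=2)
```

Each query touches every stored vector, so cost grows linearly with the corpus, which is why this approach suits small collections.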

ChromaDB vector store (lazy import).

Install with: pip install ractogateway[rag-chroma]

class ractogateway.rag.stores.chroma_store.ChromaStore(collection='ractogateway', *, path=None, host=None, port=8000, distance_function='cosine')[source]

Bases: BaseVectorStore

Vector store backed by ChromaDB.

Supports both in-process (path or None for ephemeral) and HTTP-client modes (host + port).

Parameters:

collection (str) – Name of the ChromaDB collection.
path (str | None) – Persist directory for a local persistent client. None = ephemeral.
host (str | None) – ChromaDB server host (enables HTTP client mode).
port (int) – ChromaDB server port (default 8000).
distance_function (str) – "cosine", "l2", or "ip" (inner product).

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

FAISS vector store (lazy import).

Install with: pip install ractogateway[rag-faiss]

class ractogateway.rag.stores.faiss_store.FAISSStore(dimension=None, index_type='flat_ip')[source]

Bases: BaseVectorStore

Vector store backed by Facebook AI Similarity Search (FAISS).

Stores embeddings in a flat L2 or inner-product index; inner product is equivalent to cosine similarity when the vectors are normalised. All data is in-memory; call save() / load() to persist.

Parameters:

dimension (int | None) – Embedding dimension. Inferred from the first add() call if None.
index_type (str) – "flat_l2" or "flat_ip" (inner product / cosine when normalised).
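
The "cosine when normalised" remark can be checked numerically: the inner product of two L2-normalised vectors equals their cosine similarity. The helpers below are plain Python written for this check and do not use FAISS; whether FAISSStore normalises vectors for you is not stated in this reference.

```python
import math


def l2_normalise(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
cosine_similarity = inner_product(a, b) / (math.hypot(*a) * math.hypot(*b))
ip_after_normalising = inner_product(l2_normalise(a), l2_normalise(b))
```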

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

load(path)[source]

Load a previously saved index from path.

Return type:: None

save(path)[source]

Persist the FAISS index to path.index and chunks to path.chunks.

Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

Pinecone vector store (lazy import).

Install with: pip install ractogateway[rag-pinecone]

class ractogateway.rag.stores.pinecone_store.PineconeStore(index_name, *, api_key=None, namespace='', batch_size=100)[source]

Bases: BaseVectorStore

Vector store backed by Pinecone cloud.

Parameters:

index_name (str) – Name of the Pinecone index (must already exist).
api_key (str | None) – Pinecone API key. Falls back to PINECONE_API_KEY env var.
namespace (str) – Pinecone namespace for logical data isolation.
environment – Deprecated Pinecone environment string (for legacy pod-based indexes). Not listed in the constructor signature above.
batch_size (int) – Number of vectors per upsert batch.
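The batch_size parameter implies that a large list of vectors is split into fixed-size groups before being sent. A minimal sketch of that splitting, assuming a hypothetical helper named batched (not part of RactoGateway or the Pinecone client):

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


vector_ids = [f"chunk-{i}" for i in range(250)]

# 250 vectors with batch_size=100 give two full batches and one partial one.
batch_lengths = [len(batch) for batch in batched(vector_ids, batch_size=100)]
```

A smaller batch_size keeps individual requests small; a larger one reduces the number of round trips.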

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

Qdrant vector store (lazy import).

Install with: pip install ractogateway[rag-qdrant]

class ractogateway.rag.stores.qdrant_store.QdrantStore(collection='ractogateway', *, url=None, api_key=None, distance='cosine', dimension=None, batch_size=100)[source]

Bases: BaseVectorStore

Vector store backed by Qdrant.

Parameters:

collection (str) – Qdrant collection name.
url (str | None) – Qdrant server URL. If None, an in-memory instance is used.
api_key (str | None) – Qdrant cloud API key (optional).
distance (str) – "cosine", "euclid", or "dot".
dimension (int | None) – Vector dimension. Inferred from the first add() call if None.
batch_size (int) – Points per upsert batch.
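The three distance options can rank the same vectors differently. The sketch below computes each metric in plain Python and orders a small set of vectors by it. It does not call Qdrant; the METRICS table and the rank_by helper are hypothetical names used only for this illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# metric name -> (function, whether a higher value means "more similar")
METRICS = {
    "cosine": (cosine, True),
    "euclid": (euclid, False),
    "dot": (dot, True),
}


def rank_by(query, vectors, distance="cosine"):
    """Return vector indices ordered from most to least similar."""
    metric, higher_is_better = METRICS[distance]
    scored = [(metric(query, vec), idx) for idx, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=higher_is_better)
    return [idx for _, idx in scored]


query = [1.0, 0.0]
vectors = [[5.0, 5.0], [3.0, 0.3], [2.0, 1.0]]

by_cosine = rank_by(query, vectors, "cosine")  # direction only
by_euclid = rank_by(query, vectors, "euclid")  # absolute position
by_dot = rank_by(query, vectors, "dot")        # direction and magnitude
```

Cosine ignores vector length, dot product rewards it, and Euclidean distance favours vectors that sit close to the query in absolute terms.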

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

Weaviate vector store (lazy import).

Install with: pip install ractogateway[rag-weaviate]

class ractogateway.rag.stores.weaviate_store.WeaviateStore(class_name='RactoChunk', *, url=None, api_key=None, additional_headers=None, distance_metric='cosine', batch_size=100)[source]

Bases: BaseVectorStore

Vector store backed by Weaviate.

Supports embedded (local, no server needed), local server, and Weaviate Cloud (WCS) connections.

Parameters:

class_name (str) – Weaviate class (collection) name.
url (str | None) – Weaviate server URL. If None, embedded Weaviate is used.
api_key (str | None) – Weaviate Cloud API key.
additional_headers (dict[str, str] | None) – Extra HTTP headers (e.g. for OpenAI API key pass-through to Weaviate).
distance_metric (str) – "cosine" or "l2-squared".
batch_size (int) – Objects per batch import.
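The three connection styles mentioned above could be distinguished from the constructor arguments roughly as follows. This is only a sketch: connection_mode is a hypothetical helper, and treating "url plus api_key" as the Weaviate Cloud case is an assumption, since the reference does not spell out how the store chooses between a local server and the cloud.

```python
def connection_mode(url=None, api_key=None):
    """Guess which kind of Weaviate connection the arguments describe.

    Assumption for this sketch: no URL means embedded, a URL with an API
    key means Weaviate Cloud, and a URL alone means a local server.
    """
    if url is None:
        return "embedded"
    if api_key is not None:
        return "cloud"
    return "local"


embedded = connection_mode()
local = connection_mode(url="http://localhost:8080")
cloud = connection_mode(url="https://example.weaviate.network", api_key="secret")
```

The URLs and the key are placeholders, not real endpoints.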

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

Milvus / Zilliz vector store (lazy import).

Install with: pip install ractogateway[rag-milvus]

class ractogateway.rag.stores.milvus_store.MilvusStore(collection='ractogateway', *, host='localhost', port=19530, uri=None, token=None, dimension=None, metric_type='IP', batch_size=100)[source]

Bases: BaseVectorStore

Vector store backed by Milvus or Zilliz Cloud.

Parameters:

collection (str) – Milvus collection name.
host (str) – Milvus server host (default "localhost").
port (int) – Milvus server port (default 19530).
uri (str | None) – Zilliz Cloud URI (overrides host/port when set).
token (str | None) – Zilliz Cloud API token.
dimension (int | None) – Embedding dimension. Inferred from the first add() call if None.
metric_type (str) – "IP" (inner product; equivalent to cosine when vectors are normalised) or "L2".
batch_size (int) – Vectors per insert batch.
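The rule that uri overrides host and port can be sketched as a small resolution step. The helper below is hypothetical and is not MilvusStore's implementation; in particular, building an "http://host:port" address for the self-hosted case is an assumption of this sketch.

```python
def resolve_endpoint(host="localhost", port=19530, uri=None, token=None):
    """Return the connection settings implied by the arguments.

    When uri is set it takes precedence and host/port are ignored;
    otherwise an address is built from host and port (an assumption).
    """
    if uri:
        settings = {"uri": uri}
        if token is not None:
            settings["token"] = token
        return settings
    return {"uri": f"http://{host}:{port}"}


self_hosted = resolve_endpoint()
# Placeholder URI and token, not real credentials.
managed = resolve_endpoint(
    host="ignored.example",
    uri="https://example.zillizcloud.com",
    token="secret",
)
```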

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked list of results (rank 1 = most similar).

PostgreSQL + pgvector store (lazy import).

Install with: pip install ractogateway[rag-pgvector]

class ractogateway.rag.stores.pgvector_store.PGVectorStore(dsn, *, table='rag_chunks', dimension=None, distance='cosine', batch_size=100)[source]

Bases: BaseVectorStore

Vector store backed by PostgreSQL with the pgvector extension.

Parameters:

dsn (str) – PostgreSQL connection string (e.g. "postgresql://user:pass@localhost/mydb").
table (str) – Table name (default "rag_chunks").
dimension (int | None) – Embedding dimension. Inferred from the first add() call if None.
distance (str) – "cosine", "l2", or "inner".
batch_size (int) – Rows per INSERT batch.
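pgvector exposes one SQL operator per distance: <=> for cosine distance, <-> for L2 distance, and <#> for negative inner product. How the three distance values might map onto those operators is sketched below. The mapping, the build_search_sql helper, and the column names id, text, and embedding are assumptions for illustration; the actual table schema and queries used by PGVectorStore are not shown in this reference.

```python
# Assumed mapping from the documented distance names to pgvector operators.
OPERATORS = {
    "cosine": "<=>",  # cosine distance
    "l2": "<->",      # Euclidean (L2) distance
    "inner": "<#>",   # negative inner product
}


def build_search_sql(table="rag_chunks", distance="cosine"):
    """Build a parameterised nearest-neighbour query string.

    The two %s placeholders are for the query vector and the row limit,
    to be bound by the database driver rather than formatted in.
    """
    if distance not in OPERATORS:
        raise ValueError(f"unknown distance: {distance!r}")
    if not table.isidentifier():
        # Table names cannot be bound as parameters, so validate them.
        raise ValueError(f"invalid table name: {table!r}")
    op = OPERATORS[distance]
    return (
        f"SELECT id, text, embedding {op} %s::vector AS distance "
        f"FROM {table} ORDER BY distance LIMIT %s"
    )


cosine_sql = build_search_sql()
inner_sql = build_search_sql(distance="inner")
```

All three operators return a value where smaller means closer, so an ascending ORDER BY works in each case; <#> returns the negated inner product for exactly that reason.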

add(chunks)[source]

Add chunks (with embeddings) to the store.

Parameters:: chunks (list[Chunk]) – Chunks to index. Each chunk must have a non-None embedding.
Raises:: ValueError – If any chunk has embedding=None.
Return type:: None

clear()[source]

Remove all chunks from the store.

Return type:: None

count()[source]

Return the total number of indexed chunks.

Return type:: int

delete(chunk_ids)[source]

Remove chunks with the given IDs from the store.

Return type:: None

search(embedding, top_k=5, filters=None)[source]

Search for the top_k most similar chunks.

Parameters:

embedding (list[float]) – Query embedding vector.
top_k (int) – Number of results to return.
filters (dict[str, Any] | None) – Optional metadata filters (store-specific format).

Return type:

list[RetrievalResult]

Returns:

list[RetrievalResult] – Ranked list of results (rank 1 = most similar).