ractogateway.rag._models.document

Core document and chunk models for RAG.

Every piece of content in the RAG pipeline is represented as a Document (raw, as loaded from a file) or a Chunk (a processed, embeddable slice of a document). Both are strict Pydantic models with no unvalidated fields.

class ractogateway.rag._models.document.ChunkMetadata(**data)

Bases: BaseModel

Provenance and positional data attached to every chunk.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

source: str
page: int | None
chunk_index: int
total_chunks: int
start_char: int
end_char: int
doc_id: str
extra: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
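
To make the positional fields concrete, here is a minimal sketch of how they could be filled in for fixed-size slices of a text. It uses plain dictionaries rather than the real model, and `positional_metadata` is a hypothetical helper, not part of ractogateway. The half-open `[start_char, end_char)` convention and the 0-based `chunk_index` are assumptions; the reference above does not state either.

```python
from typing import Any


def positional_metadata(
    text: str, size: int, source: str, doc_id: str
) -> list[dict[str, Any]]:
    """Build one metadata dict per fixed-size slice of ``text``.

    Assumptions (not confirmed by the reference): offsets are half-open,
    i.e. ``text[start_char:end_char]`` is the slice, and ``chunk_index``
    starts at 0.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    starts = list(range(0, len(text), size))
    return [
        {
            "source": source,
            "page": None,  # plain text carries no page information
            "chunk_index": index,
            "total_chunks": len(starts),
            "start_char": start,
            "end_char": min(start + size, len(text)),
            "doc_id": doc_id,
            "extra": {},
        }
        for index, start in enumerate(starts)
    ]


spans = positional_metadata("abcdefghij", size=4, source="manual", doc_id="doc-1")
print([(m["start_char"], m["end_char"]) for m in spans])
```

Each dictionary has the same keys as the fields listed above, so under these assumptions it could be passed to the model as keyword arguments.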

class ractogateway.rag._models.document.Document(**data)

Bases: BaseModel

A raw document loaded from a file or supplied as plain text.

Parameters:
  • content (str) – The full extracted text of the document.

  • source (str) – Absolute file path, URL, or a descriptive label (e.g. "manual").

  • metadata (dict[str, Any]) – Free-form key/value pairs (file size, author, MIME type, …).

  • doc_id (str) – Auto-generated UUID; override only when you need stable IDs.
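
The note on `doc_id` distinguishes an auto-generated ID from a stable one. The sketch below shows one way to produce each using only the standard library. Both helpers are hypothetical and not part of ractogateway; the reference does not say which UUID version the library generates, so `uuid4` here is an assumption.

```python
import uuid


def random_doc_id() -> str:
    """A fresh random ID on every call (assumed to resemble the default)."""
    return str(uuid.uuid4())


def stable_doc_id(source: str, content: str) -> str:
    """A deterministic ID: the same source and content always map to the same UUID."""
    # uuid5 hashes a namespace plus a name, so re-ingesting an unchanged
    # document yields the same identifier.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}\n{content}"))


text = "Install the pump before priming."

# Keyword arguments matching the parameters listed above.
doc_kwargs = {
    "content": text,
    "source": "manual",
    "metadata": {"author": "unknown"},
    "doc_id": stable_doc_id("manual", text),
}
print(doc_kwargs["doc_id"])
```

A stable ID is mainly useful when the same file is ingested more than once and downstream storage should treat the result as the same document rather than a new one.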

doc_id: str
content: str
source: str
metadata: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.rag._models.document.Chunk(**data)

Bases: BaseModel

A single embeddable slice of a document.

Produced by a BaseChunker and enriched with an embedding vector by a BaseEmbedder.

chunk_id: str
doc_id: str
content: str
embedding: list[float] | None
metadata: ChunkMetadata
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
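
The `embedding: list[float] | None` field suggests a two-stage lifecycle: a chunk starts without a vector and gains one later. The following sketch illustrates that idea with plain dictionaries. `new_chunk`, `toy_embed`, and `with_embedding` are hypothetical stand-ins, not the BaseChunker or BaseEmbedder interfaces, whose methods are not documented here.

```python
import uuid
from typing import Any, Callable


def new_chunk(doc_id: str, content: str, metadata: dict[str, Any]) -> dict[str, Any]:
    """Create a chunk record as it might look straight after chunking."""
    return {
        "chunk_id": str(uuid.uuid4()),
        "doc_id": doc_id,
        "content": content,
        "embedding": None,  # not embedded yet
        "metadata": metadata,
    }


def toy_embed(text: str) -> list[float]:
    """Placeholder embedder: two hand-picked features, not a real model."""
    return [float(len(text)), float(text.count(" "))]


def with_embedding(
    chunk: dict[str, Any], embed: Callable[[str], list[float]]
) -> dict[str, Any]:
    """Return a copy of ``chunk`` with its embedding filled in."""
    return {**chunk, "embedding": embed(chunk["content"])}


chunk = new_chunk("doc-1", "hello world", {"chunk_index": 0, "total_chunks": 1})
embedded = with_embedding(chunk, toy_embed)
print(chunk["embedding"], embedded["embedding"])
```

Returning a copy rather than mutating the input keeps the unembedded record available for retries; whether the library mutates chunks in place is not stated in this reference.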