ractogateway.pipelines.video_processor._models

Models for VideoProcessorPipeline.

class ractogateway.pipelines.video_processor._models.DeduplicationMethod(*values)[source]

Bases: str, Enum

Frame similarity algorithm used for deduplication.

PHASH = 'phash'
SSIM = 'ssim'
class ractogateway.pipelines.video_processor._models.FrameAnalysisMode(*values)[source]

Bases: str, Enum

How frames are sent to the vision LLM.

INDIVIDUAL = 'individual'
GRID = 'grid'
class ractogateway.pipelines.video_processor._models.VideoProcessingMode(*values)[source]

Bases: str, Enum

How much of the video should be processed.

ACTIVE = 'active'
PASSIVE = 'passive'
class ractogateway.pipelines.video_processor._models.TranscriberBackend(*values)[source]

Bases: str, Enum

Audio transcription backend.

FASTER_WHISPER = 'faster-whisper'
OPENAI_WHISPER = 'openai-whisper'
HUGGINGFACE_LOCAL = 'huggingface-local'
OPENAI_API = 'openai-api'
GOOGLE_API = 'google-api'
HUGGINGFACE_API = 'huggingface-api'
GROQ_API = 'groq-api'
DEEPGRAM_API = 'deepgram-api'
OLLAMA = 'ollama'
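
Because the enums above derive from both str and Enum, their members should behave like plain strings in comparisons and serialisation. The following is a minimal sketch of that behaviour using a stand-in class that mirrors the documented DeduplicationMethod members; it is defined locally for illustration and is not imported from the library.

```python
from enum import Enum


class DeduplicationMethod(str, Enum):
    """Stand-in mirroring the documented members; not the library class."""

    PHASH = "phash"
    SSIM = "ssim"


# Lookup by value returns the matching member.
method = DeduplicationMethod("ssim")

# Because str is a base class, members compare equal to their raw string value.
is_plain_string_equal = method == "ssim"

# .value is the plain string, convenient for JSON or config files.
raw_value = method.value
```

This is why a raw string such as 'phash' can usually be supplied wherever one of these enums is expected: Pydantic validates enum fields by value.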
class ractogateway.pipelines.video_processor._models.VideoConfig(**data)[source]

Bases: BaseModel

All tunable hyperparameters for VideoProcessorPipeline.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

fps: float

Frames to sample per second of video.

similarity_threshold: float

Discard a frame whose similarity to the previous kept frame is greater than or equal to this percentage (range 0-100). Under this rule, a higher value keeps more frames and a lower value discards more.

max_frames: int | None

Hard cap on frames kept (None = no cap).

dedup_method: DeduplicationMethod

Algorithm used to compare frame similarity.
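
The interaction of similarity_threshold, max_frames, and the comparison against the previous kept frame can be sketched as follows. This is an illustration only: the Hamming-distance similarity over 64-bit hashes is an assumed stand-in for a perceptual-hash comparison, and hash_similarity and deduplicate are hypothetical helper names, not part of the library.

```python
def hash_similarity(a: int, b: int, bits: int = 64) -> float:
    """Similarity in percent between two `bits`-wide hashes (assumed metric)."""
    differing = bin((a ^ b) & ((1 << bits) - 1)).count("1")
    return 100.0 * (1 - differing / bits)


def deduplicate(hashes, similarity_threshold, max_frames=None):
    """Return indices of kept frames, comparing each frame to the last kept one."""
    kept = []
    for index, frame_hash in enumerate(hashes):
        if max_frames is not None and len(kept) >= max_frames:
            break  # hard cap reached
        if kept and hash_similarity(hashes[kept[-1]], frame_hash) >= similarity_threshold:
            continue  # too similar to the last kept frame: discard
        kept.append(index)
    return kept


# Toy hashes: frame 1 differs from frame 0 by 1 bit, frame 2 by 32 bits,
# and frame 3 is identical to frame 2.
hashes = [0x0, 0x1, 0xFFFFFFFF, 0xFFFFFFFF]
kept_indices = deduplicate(hashes, similarity_threshold=90.0)
capped_indices = deduplicate(hashes, similarity_threshold=90.0, max_frames=1)
```

Note that each frame is compared with the last frame that was kept, not with its immediate predecessor, so slow gradual drift still eventually produces a new kept frame.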

frame_format: str

Image format for kept frames: ‘JPEG’ (smaller) or ‘PNG’ (lossless).

analyze_frames: bool

Pass kept frames to the vision LLM for content extraction.

frame_analysis_mode: FrameAnalysisMode

Individual = one LLM call per frame; Grid = stitch frames into a collage.

grid_size: int

Number of frames per grid collage (used when frame_analysis_mode=’grid’).

batch_size: int

How many frames to submit to the LLM concurrently per batch.

max_workers: int

Thread-pool size for concurrent LLM frame analysis calls.

max_process_workers: int

Process-pool size for CPU-bound frame extraction / hashing.
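
How grid_size, batch_size, and max_workers might fit together is sketched below. Everything here is an assumption made for illustration: fake_analyze stands in for a vision-LLM call, chunked and analyze_all are hypothetical helpers, and treating batch_size as the number of calls submitted per batch in grid mode is one plausible reading of the field description.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fake_analyze(frame_ids):
    """Hypothetical stand-in for one vision-LLM call."""
    return f"analysis of frames {frame_ids}"


def analyze_all(frame_ids, grid_size, batch_size, max_workers):
    # Grid mode: one call per collage of up to grid_size frames.
    grids = chunked(frame_ids, grid_size)
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit at most batch_size calls at a time; map preserves input order.
        for batch in chunked(grids, batch_size):
            results.extend(pool.map(fake_analyze, batch))
    return results


results = analyze_all(list(range(10)), grid_size=4, batch_size=2, max_workers=2)
```

With ten frames and a grid size of four, the sketch issues three calls instead of ten, which is the main reason to prefer grid mode when per-call cost dominates.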

transcribe_audio: bool

Extract and transcribe the video’s audio track.

transcriber_backend: TranscriberBackend

Which transcription engine to use.

transcriber_model: str

Model name / size — interpretation is backend-specific.

Examples:

faster-whisper / openai-whisper : “tiny”, “base”, “small”, “medium”, “large-v3”
huggingface-local / huggingface-api : HF model ID, e.g. “openai/whisper-large-v3”
openai-api : “whisper-1”
google-api : “long”, “short”, “latest_long”
groq-api : “whisper-large-v3”, “whisper-large-v3-turbo”
deepgram-api : “nova-3”, “nova-2”, “enhanced”, “base”
ollama : model name on the server, e.g. “whisper”

transcriber_api_key: str | None

API key for cloud transcription backends (falls back to env vars).

transcriber_base_url: str | None

Base URL for self-hosted / Ollama transcription endpoints.

language: str | None

BCP-47 language code (e.g. ‘en’, ‘fr’). None = auto-detect.

generate_summary: bool

Generate a comprehensive textual summary at the end.

store_in_rag: bool

Push all extracted content into the supplied rag_pipeline for Q&A.

processing_mode: VideoProcessingMode

‘active’ processes the full video; ‘passive’ processes only a time window around focus_time_seconds.

focus_time_seconds: float | None

Center timestamp in seconds for passive mode (e.g. 130 for 02:10).

window_seconds: float

Passive-mode half-window size in seconds (focus ± window_seconds).
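
The passive-mode window (focus ± window_seconds) can be worked through with a small helper. This is a sketch under stated assumptions: passive_window is a hypothetical function, and clamping the result to the start and end of the video is assumed here rather than documented.

```python
def passive_window(focus_time_seconds, window_seconds, duration_seconds=None):
    """Return (start, end) of the window focus ± window_seconds.

    Clamping to [0, duration_seconds] is an assumption of this sketch.
    """
    start = max(0.0, focus_time_seconds - window_seconds)
    end = focus_time_seconds + window_seconds
    if duration_seconds is not None:
        end = min(end, duration_seconds)
    return start, end


window = passive_window(130.0, 15.0)        # 02:10 with a 15 s half-window
clamped = passive_window(5.0, 15.0, 12.0)   # near the start of a 12 s clip
```

Because window_seconds is a half-window, the processed span is up to twice that value in length.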

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.pipelines.video_processor._models.FrameEntry(**data)[source]

Bases: BaseModel

One video frame, after extraction and optional analysis.

frame_id: int

Zero-based sequential frame identifier.

timestamp: float

Position in the video in seconds.

similarity_to_prev: float | None

Similarity percentage to the previous kept frame (None for first frame).

kept: bool

False if discarded by the deduplication step.

analysis: str | None

LLM-generated description of visual content (whiteboard, screen, etc.).

image_data: bytes | None

Raw image bytes for kept + analyzed frames.

image_format: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.pipelines.video_processor._models.TranscriptSegment(**data)[source]

Bases: BaseModel

A time-bounded transcription segment aligned to frame IDs.

start: float

Segment start time in seconds.

end: float

Segment end time in seconds.

text: str

Transcribed text for this segment.

frame_ids: list[int]

IDs of kept frames whose timestamps fall within [start, end].
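
The alignment rule for frame_ids (kept frames whose timestamps fall within the closed interval [start, end]) can be sketched as below. The helper frames_in_segment and the dictionary layout are hypothetical and chosen only to keep the example self-contained.

```python
def frames_in_segment(frame_timestamps, start, end):
    """IDs of kept frames with start <= timestamp <= end.

    `frame_timestamps` maps frame_id -> (timestamp_seconds, kept).
    """
    return [
        frame_id
        for frame_id, (timestamp, kept) in sorted(frame_timestamps.items())
        if kept and start <= timestamp <= end
    ]


frames = {0: (0.0, True), 1: (1.0, False), 2: (2.0, True), 3: (5.0, True)}
segment_frame_ids = frames_in_segment(frames, start=0.0, end=2.0)
```

Frame 1 is excluded because it was discarded by deduplication, and frame 2 is included because the interval is closed at both ends.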

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.pipelines.video_processor._models.VideoSection(**data)[source]

Bases: BaseModel

A merged time section combining visual analysis + audio transcript.

timestamp_start: float
timestamp_end: float
frame_ids: list[int]
visual_content: str

Combined LLM analyses for all frames in this section.

audio_content: str

Concatenated transcript text for this section’s time range.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.pipelines.video_processor._models.VideoProcessorUsage(**data)[source]

Bases: BaseModel

Accounting of tokens and frame counts across the full pipeline.

frames_extracted: int
frames_kept: int
frames_discarded: int
analysis_input_tokens: int
analysis_output_tokens: int
summary_input_tokens: int
summary_output_tokens: int
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

audio_duration_seconds: float
property total_analysis_tokens: int
property total_summary_tokens: int
property total_tokens: int
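
The three total_* properties are not described further here. A plausible reading, sketched below with a stand-in dataclass, is that each total is the sum of its input and output counters and total_tokens is the sum of both. UsageSketch is hypothetical and these formulas are an assumption, not confirmed behaviour of VideoProcessorUsage.

```python
from dataclasses import dataclass


@dataclass
class UsageSketch:
    """Stand-in with the documented token fields (not the library class)."""

    analysis_input_tokens: int = 0
    analysis_output_tokens: int = 0
    summary_input_tokens: int = 0
    summary_output_tokens: int = 0

    @property
    def total_analysis_tokens(self) -> int:
        # Assumed: input + output tokens of the frame-analysis stage.
        return self.analysis_input_tokens + self.analysis_output_tokens

    @property
    def total_summary_tokens(self) -> int:
        # Assumed: input + output tokens of the summary stage.
        return self.summary_input_tokens + self.summary_output_tokens

    @property
    def total_tokens(self) -> int:
        # Assumed: grand total across both stages.
        return self.total_analysis_tokens + self.total_summary_tokens


usage = UsageSketch(100, 20, 300, 50)
```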
class ractogateway.pipelines.video_processor._models.StageError(**data)[source]

Bases: BaseModel

Structured record of a failure in one pipeline stage.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

stage: str

Name of the pipeline stage that failed (e.g. ‘extract’, ‘transcribe’).

error_type: str

Exception class name (e.g. ‘ImportError’, ‘RuntimeError’).

message: str

str(exc) — the error message.

traceback: str | None

Full Python traceback as a string (available in safe_mode).
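
The four StageError fields map naturally onto a caught exception. The sketch below shows one way such a record could be assembled with the standard library; stage_error_fields is a hypothetical helper that returns a plain dict, and it does not reproduce how the pipeline itself builds these records.

```python
import traceback


def stage_error_fields(stage, exc, include_traceback=True):
    """Build a dict with the four documented StageError fields from an exception."""
    formatted = None
    if include_traceback:
        formatted = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return {
        "stage": stage,
        "error_type": type(exc).__name__,  # exception class name
        "message": str(exc),               # str(exc)
        "traceback": formatted,            # full traceback text or None
    }


try:
    raise ImportError("faster_whisper is not installed")
except ImportError as exc:
    record = stage_error_fields("transcribe", exc)
```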

class ractogateway.pipelines.video_processor._models.VideoProcessorResult(**data)[source]

Bases: BaseModel

Full output of a VideoProcessorPipeline run.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

video_path: str

Original source identifier (path, URL, or ‘<bytes>’ for buffer input).

frames: list[FrameEntry]

All extracted frames (kept and discarded).

transcript: list[TranscriptSegment]

Audio transcript segmented by timestamp.

sections: list[VideoSection]

Merged visual + audio sections ordered by time.

summary: str | None

Comprehensive LLM-generated summary of the entire video.

rag_stored: bool
rag_chunk_count: int
usage: VideoProcessorUsage
error: str | None

Short description of the first fatal error (backward-compatible).

failed_stage: str | None

Name of the stage that caused a fatal pipeline abort, if any.

stage_errors: list[StageError]

All per-stage errors collected during the run (fatal + non-fatal).

processing_mode: VideoProcessingMode

Whether this run processed full video (active) or a window (passive).

window_start_seconds: float | None

Passive-mode window start timestamp in source-video seconds.

window_end_seconds: float | None

Passive-mode window end timestamp in source-video seconds.

question: str | None

Optional user question answered from this run.

answer: str | None

Answer generated for question, when question-answer mode is used.

property has_errors: bool

True if any stage encountered an error.

property is_failed: bool

True if the pipeline aborted early due to a fatal stage error.
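
The difference between has_errors and is_failed matters when a stage fails without aborting the run. The sketch below illustrates that distinction with a stand-in dataclass; deriving has_errors from stage_errors and is_failed from failed_stage is an assumption based on the field descriptions, and ResultSketch is not the library class.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSketch:
    """Stand-in carrying only the error-related fields documented above."""

    failed_stage: "str | None" = None
    stage_errors: list = field(default_factory=list)

    @property
    def has_errors(self) -> bool:
        # Assumed: any recorded stage error, fatal or not.
        return len(self.stage_errors) > 0

    @property
    def is_failed(self) -> bool:
        # Assumed: a fatal abort always names the stage that caused it.
        return self.failed_stage is not None


# A non-fatal transcription error: the run completed, but with a recorded error.
degraded = ResultSketch(stage_errors=["transcribe: ImportError"])

# A fatal extraction error: the run aborted early.
aborted = ResultSketch(failed_stage="extract", stage_errors=["extract: RuntimeError"])
```

Under this reading, callers would check is_failed before trusting the result at all, and has_errors to decide whether parts of it (for example the transcript) may be missing.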

get_transcript_text()[source]

Full transcript as a single string.

Return type:

str

get_all_visual_content()[source]

All frame analyses concatenated in timestamp order.

Return type:

str

to_json(path=None, *, indent=2)[source]

Serialise result to JSON. Returns JSON string if path is None.

Return type:

str | None

to_markdown(path=None)[source]

Build a structured Markdown report. Returns string if path is None.

Return type:

str | None
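
Both to_json and to_markdown follow the same convention: with path=None the text is returned, and the str | None return type suggests that nothing is returned when a path is given. The sketch below reproduces that convention for a plain dict. to_json_sketch is a hypothetical function, and writing the file as UTF-8 and returning None when a path is supplied are assumptions.

```python
import json
import tempfile
from pathlib import Path


def to_json_sketch(data, path=None, *, indent=2):
    """Return JSON text when path is None; otherwise write it and return None."""
    text = json.dumps(data, indent=indent)
    if path is None:
        return text
    Path(path).write_text(text, encoding="utf-8")
    return None


payload = {"video_path": "<bytes>", "summary": None}

# No path: the JSON string comes back to the caller.
json_text = to_json_sketch(payload)

# With a path: the file is written and nothing is returned.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "result.json"
    file_result = to_json_sketch(payload, target)
    round_tripped = json.loads(target.read_text(encoding="utf-8"))
```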

exception ractogateway.pipelines.video_processor._models.VideoRateLimitExceededError[source]

Bases: RuntimeError

Raised when a rate_limiter denies a VideoProcessorPipeline request.
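
Because this exception derives from RuntimeError, a handler for the specific class has to come before any generic RuntimeError handler. The sketch below demonstrates the ordering with a locally defined stand-in exception; run_guarded and the two example callables are hypothetical and do not call the real pipeline.

```python
class VideoRateLimitExceededError(RuntimeError):
    """Stand-in with the documented base class (not imported from the library)."""


def run_guarded(run):
    """Call `run` and classify the outcome; the specific handler comes first."""
    try:
        return ("ok", run())
    except VideoRateLimitExceededError as exc:
        return ("rate_limited", str(exc))
    except RuntimeError as exc:
        return ("runtime_error", str(exc))


def denied():
    raise VideoRateLimitExceededError("quota exhausted")


def broken():
    raise RuntimeError("decoder crashed")


outcomes = [run_guarded(denied), run_guarded(broken), run_guarded(lambda: 42)]
```

If the two except clauses were swapped, the rate-limit case would be reported as a generic runtime error.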