ractogateway.pipelines.video_processor._transcriber
Audio extraction and transcription for VideoProcessorPipeline.
Supports 9 transcription backends via a pluggable BaseTranscriber:
- Local / open-source:
  - faster-whisper: faster-whisper library (default)
  - openai-whisper: openai-whisper library
  - huggingface-local: HuggingFace transformers ASR pipeline
- Cloud APIs:
  - openai-api: OpenAI Whisper API
  - google-api: Google Cloud Speech-to-Text v2
  - huggingface-api: HuggingFace Inference API
  - groq-api: Groq Whisper (ultra-fast cloud)
  - deepgram-api: Deepgram Nova
- Self-hosted:
  - ollama: Ollama server (audio-capable models)
- ractogateway.pipelines.video_processor._transcriber.extract_audio(video_path, *, start_time_seconds=None, end_time_seconds=None)[source]
Extract audio from video_path to a WAV temp file via ffmpeg-python.
When start/end bounds are provided, only that time window is extracted.
- Return type:
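The time-window behaviour described above can be sketched with the plain ffmpeg command line. This is an illustration only, not the library's implementation (which goes through ffmpeg-python): the helper name `build_ffmpeg_args` and the mono 16 kHz PCM output settings are assumptions chosen for the example.

```python
from __future__ import annotations


def build_ffmpeg_args(
    video_path: str,
    wav_path: str,
    *,
    start_time_seconds: float | None = None,
    end_time_seconds: float | None = None,
) -> list[str]:
    """Build an ffmpeg CLI invocation that writes WAV audio (hypothetical helper)."""
    start = start_time_seconds or 0.0
    if end_time_seconds is not None and end_time_seconds <= start:
        raise ValueError("end_time_seconds must be greater than the start time")

    args = ["ffmpeg", "-y"]
    if start_time_seconds is not None:
        # Input-side seek: jump to the start of the window before decoding.
        args += ["-ss", str(start_time_seconds)]
    args += ["-i", video_path, "-vn"]  # -vn drops the video stream
    if end_time_seconds is not None:
        # -t takes a duration, so convert the end bound to a window length.
        args += ["-t", str(end_time_seconds - start)]
    # Assumed output format: 16-bit PCM, 16 kHz, mono.
    args += ["-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", wav_path]
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`, with `wav_path` pointing at a file created through `tempfile`.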
- ractogateway.pipelines.video_processor._transcriber.get_audio_duration(audio_path)[source]
Return audio duration in seconds using ffmpeg probe.
- Return type:
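As a rough illustration of what a probe-based duration lookup involves: ffprobe can emit container metadata as JSON (for example with `ffprobe -v error -show_format -of json audio.wav`), and the duration appears as a string under `format.duration`. The helper name `duration_from_probe` below is hypothetical and is not part of this module.

```python
import json


def duration_from_probe(probe_json: str) -> float:
    """Read the container duration in seconds from ffprobe's JSON output."""
    info = json.loads(probe_json)
    # ffprobe reports the duration as a decimal string, e.g. "12.500000".
    return float(info["format"]["duration"])
```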
- ractogateway.pipelines.video_processor._transcriber.align_frames_to_transcript(frames, segments)[source]
Assign frame IDs to transcript segments by timestamp overlap.
- Return type:
list[TranscriptSegment]
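A minimal sketch of timestamp-overlap alignment, under stated assumptions: the `Frame` and `Segment` dataclasses and their field names are hypothetical stand-ins (the real frame and TranscriptSegment types are not shown here), and the half-open interval rule is one reasonable choice rather than the documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:  # hypothetical stand-in for the pipeline's frame type
    frame_id: int
    timestamp: float  # seconds from the start of the video


@dataclass
class Segment:  # hypothetical stand-in for TranscriptSegment
    start: float
    end: float
    text: str
    frame_ids: list[int] = field(default_factory=list)


def align_sketch(frames: list[Frame], segments: list[Segment]) -> list[Segment]:
    """Attach to each segment the IDs of frames whose timestamp falls inside it."""
    for seg in segments:
        # Half-open interval [start, end) so a frame on a boundary
        # lands in exactly one segment.
        seg.frame_ids = [
            f.frame_id for f in frames if seg.start <= f.timestamp < seg.end
        ]
    return segments
```

Frames that fall outside every segment (for example during silence) are left unassigned in this sketch.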
- class ractogateway.pipelines.video_processor._transcriber.BaseTranscriber[source]
Bases: ABC
Abstract interface for all transcription backends.
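The pluggable-backend idea can be sketched with Python's `abc` module. The abstract method is not listed on this page, so the `transcribe` name, its signature, and the dict-based return shape below are assumptions made for illustration; the class names are hypothetical as well.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SketchTranscriber(ABC):
    """Hypothetical mirror of an abstract transcription interface."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> list[dict]:
        """Return transcript segments for the audio file (assumed signature)."""


class EchoTranscriber(SketchTranscriber):
    """Toy backend that returns a single placeholder segment."""

    def transcribe(self, audio_path: str) -> list[dict]:
        return [{"start": 0.0, "end": 0.0, "text": f"stub for {audio_path}"}]
```

Because the base class declares an abstract method, instantiating it directly raises `TypeError`; only concrete subclasses can be created.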
- class ractogateway.pipelines.video_processor._transcriber.FasterWhisperTranscriber(model_size='base')[source]
Bases: BaseTranscriber
Local transcription using the faster-whisper library.
- class ractogateway.pipelines.video_processor._transcriber.OpenAIWhisperTranscriber(model_size='base')[source]
Bases: BaseTranscriber
Local transcription using the openai-whisper library.
- class ractogateway.pipelines.video_processor._transcriber.HuggingFaceLocalTranscriber(model_id='openai/whisper-base')[source]
Bases: BaseTranscriber
Local ASR transcription via HuggingFace transformers pipeline.
- class ractogateway.pipelines.video_processor._transcriber.OpenAIAPITranscriber(model='whisper-1', api_key=None)[source]
Bases: BaseTranscriber
Cloud transcription via OpenAI Whisper API.
- class ractogateway.pipelines.video_processor._transcriber.GoogleAPITranscriber(model='long', api_key=None)[source]
Bases: BaseTranscriber
Cloud transcription via Google Cloud Speech-to-Text v2.
- class ractogateway.pipelines.video_processor._transcriber.HuggingFaceAPITranscriber(model_id='openai/whisper-large-v3', api_key=None)[source]
Bases: BaseTranscriber
Cloud transcription via HuggingFace Inference API.
- class ractogateway.pipelines.video_processor._transcriber.GroqTranscriber(model='whisper-large-v3', api_key=None)[source]
Bases: BaseTranscriber
Cloud transcription via Groq Whisper API (ultra-fast).
- class ractogateway.pipelines.video_processor._transcriber.DeepgramTranscriber(model='nova-3', api_key=None)[source]
Bases: BaseTranscriber
Cloud transcription via Deepgram Nova.
- class ractogateway.pipelines.video_processor._transcriber.OllamaTranscriber(model='whisper', base_url=None)[source]
Bases: BaseTranscriber
Self-hosted transcription via Ollama server (audio-capable models).
- ractogateway.pipelines.video_processor._transcriber.get_transcriber(backend, model, api_key, base_url)[source]
Return the concrete BaseTranscriber for the given backend.
- Return type:
BaseTranscriber
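A backend-name dispatch of this kind is often written as a small registry. The sketch below is not the module's code: the stub classes stand in for the real transcribers, only three of the nine backend names are wired up, and the fallback defaults and the `ValueError` on an unknown name are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LocalStub:  # stands in for a local backend such as faster-whisper
    model_size: str = "base"


@dataclass
class CloudStub:  # stands in for a cloud backend that needs an API key
    model: str = "whisper-1"
    api_key: str | None = None


@dataclass
class SelfHostedStub:  # stands in for a server-based backend such as Ollama
    model: str = "whisper"
    base_url: str | None = None


def get_transcriber_sketch(backend, model=None, api_key=None, base_url=None):
    """Map a backend name to a configured instance (illustrative only)."""
    # Lambdas defer construction so only the requested backend is built.
    factories = {
        "faster-whisper": lambda: LocalStub(model or "base"),
        "openai-api": lambda: CloudStub(model or "whisper-1", api_key),
        "ollama": lambda: SelfHostedStub(model or "whisper", base_url),
    }
    try:
        return factories[backend]()
    except KeyError:
        known = ", ".join(sorted(factories))
        raise ValueError(f"Unknown backend {backend!r}; expected one of: {known}") from None
```

Each backend receives only the arguments it needs, which matches the constructor signatures listed above: local classes take a model name, cloud classes add `api_key`, and the Ollama class takes `base_url`.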