RactoGateway

One Python package for production-grade AI development.

RactoGateway is a unified AI SDK that gives you one clean interface for OpenAI, Google Gemini, Anthropic Claude, Ollama (local), and HuggingFace. It combines prompt engineering, strict Pydantic validation, tool calling, streaming, embeddings, fine-tuning, RAG, and production infrastructure in one library.

Why RactoGateway?

Every LLM provider has a different SDK, request format, response structure, and tool-calling schema. Production AI systems often turn into glue code and brittle parsers.

RactoGateway solves this by providing:

RactoPrompt (RACTO) for structured prompting with anti-hallucination guardrails.
Five unified developer kits: OpenAI (gpt), Google (gemini), Anthropic (claude), Ollama (local), HuggingFace (hf).
Strict typed models for input/output and robust response validation.
Unified tool calling via ToolRegistry.
Typed streaming chunks and async support across providers.
End-to-end retrieval with RactoRAG plus vectorless PageIndexRAG.
Turn-key workflows: SQLAnalystPipeline, ListClassifierPipeline, VideoProcessorPipeline, AgentPipeline.
Production controls: exact cache, semantic cache, routing, truncation, batch.
Ops modules for Redis, Celery, Kafka, MCP, and telemetry.

Use-Case Map

Use case	Typical friction	How RactoGateway helps
Build chat/API assistants	Provider SDK drift and response shape mismatch	One `ChatConfig` + one `LLMResponse` model across providers
Return strict JSON for automation	Markdown fenced JSON and schema drift	`RactoPrompt(output_format=YourModel)` embeds schema and enforces shape
Add tools into workflows	Different function-calling formats per vendor	Register Python tools once with `ToolRegistry`
Build RAG assistants	Stitching readers/chunkers/embedders/stores manually	`RactoRAG` handles ingest -> retrieve -> generate
Keep costs predictable	Duplicate calls and oversized model usage	Cache + routing + truncation + batch controls
Operate on multiple servers	In-memory cache/memory does not scale	Redis modules for distributed cache, memory, and rate limits
Run long jobs safely	Request-thread failures and retries	`RactoCeleryWorker` for retries and background execution

Why It Stands Different

Dimension	Typical approach	RactoGateway approach	Practical impact
Provider support	Rebuild when switching SDK	Same mental model across providers	Faster migration and multi-provider strategy
Prompt reliability	Ad-hoc prompt strings	Structured RACTO prompts	More consistent outputs
Output safety	Manual `json.loads` parsing	Typed validation + normalized responses	Fewer runtime failures
Tool integration	Vendor-specific tool payloads	Single `ToolRegistry` abstraction	Less integration code
RAG delivery	Many separate libraries	One orchestrator with pluggable parts	Faster production rollout
Scale and operations	Infra bolted on later	Redis/Celery/Kafka/MCP first-class modules	Better reliability and throughput

Platform Architecture

RactoGateway is designed as one composable stack rather than disconnected helper utilities:

Layer	Core modules	What you get
Prompt and output control	`RactoPrompt`, `RactoFile`	Structured prompts (RACTO), anti-hallucination guardrails, deterministic output shape
Multi-provider chat	`openai_developer_kit`, `google_developer_kit`, `anthropic_developer_kit`, `ollama_developer_kit`, `huggingface_developer_kit`	One mental model across cloud and local LLM providers
Tool execution	`ToolRegistry`, `tool` decorator	Define Python tools once and execute them through a provider-agnostic interface
Structured response safety	`response_model` support + strict validation	Typed results instead of brittle raw JSON parsing
Retrieval pipeline	`RactoRAG`, `PageIndexRAG`, readers/chunkers/embedders/stores	Ingest -> retrieve -> generate for document-grounded answers
Turn-key workflows	`SQLAnalystPipeline`, `ListClassifierPipeline`, `VideoProcessorPipeline`, `AgentPipeline`	Complete domain workflows with sync and async variants
Cost and performance controls	exact cache, semantic cache, routing, truncation, batch	Lower spend, lower latency, and better throughput
Production operations	Redis, Celery, Kafka, MCP, telemetry	Distributed memory/cache/rate-limits, background jobs, streaming, and observability

End-to-End Pipeline in Practice

Use the library as a composable delivery pipeline instead of isolated API calls:

Define behavior with RactoPrompt (role, aim, constraints, tone, output).
Choose any provider kit (gpt, gemini, claude, local, or hf).
Call chat() / stream() / embed() with typed config models.
Optionally attach tools via ToolRegistry for function execution.
Optionally add retrieval with RactoRAG or PageIndexRAG.
Optionally move to prebuilt pipelines for SQL analytics, classification, video intelligence, or agentic loops.
Add production controls (cache, routing, truncation, batch, Redis, Celery).
Observe and operate with telemetry, Kafka integration, and MCP interoperability.

Pipeline Catalog

Pipeline	Input	Output	Typical use case
`SQLAnalystPipeline`	Natural language question + DB connection	SQL, result tables, narrative answer, optional chart	BI copilots, operations reporting, analytics assistants
`ListClassifierPipeline`	User text + controlled options list	Single/multi label, confidence, optional reasoning	Ticket routing, intent detection, workflow triage
`VideoProcessorPipeline`	Video path/URL/YouTube/bytes	Transcript, frame analysis, section summaries, optional RAG storage	Lecture indexing, training content QA, media intelligence
`AgentPipeline`	Goal + tools	Multi-step tool traces + final answer	ReAct-style automation, tool-driven agents, research workflows

Real-World Use Cases (Implementation Blueprints)

The examples below show how teams use RactoGateway as a full delivery stack, not just a chat wrapper.

Scenario	Primary modules	What ships to production
Customer support copilot	`ListClassifierPipeline`, `ToolRegistry`, `RactoRAG`, Redis modules	Auto-routing, grounded answers, strict response schema, low-latency cached replies
BI and data analyst assistant	`SQLAnalystPipeline`, `RactoPrompt`, typed models	Natural language to SQL, safe query execution, markdown answer, optional charts
Internal knowledge assistant	`RactoRAG` or `PageIndexRAG`, `RactoPrompt`	Policy and SOP answers grounded on private docs with source-aware retrieval
Video intelligence pipeline	`VideoProcessorPipeline`, optional RAG store	Transcript + frame analysis + summary, then searchable knowledge base from videos
Agentic back-office automation	`AgentPipeline`, `ToolRegistry`, `MCP`, Celery	Multi-step tool execution with bounded steps, retries, and background orchestration

1) Customer Support Copilot (SaaS)

Business goal: Reduce first-response time while keeping answers accurate and auditable.

Typical stack:

Route incoming tickets with ListClassifierPipeline.
Use ToolRegistry for account lookup, billing state, and CRM actions.
Ground responses with RactoRAG over help-center and policy docs.
Enforce structured outputs with response_model (no free-form drift).
Run Redis cache + memory + rate limit for multi-server deployments.

from pydantic import BaseModel
from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import ListClassifierPipeline
from ractogateway.redis import RedisExactCache

class SupportReply(BaseModel):
    route: str
    reply: str
    escalate: bool

classifier = ListClassifierPipeline(
    kit=gpt.Chat(model="gpt-4o-mini"),
    options=["Billing", "Technical Support", "Account", "Sales"],
)
ticket = "My invoice is wrong and payment failed"
route = classifier.run(ticket).first or "Billing"
print("Predicted route:", route)

kit = gpt.Chat(
    model="gpt-4o",
    exact_cache=RedisExactCache(url="redis://localhost:6379/0", ttl_seconds=3600),
)
result = kit.chat(
    gpt.ChatConfig(
        user_message=(
            f"Customer ticket: {ticket}\n"
            f"Predicted team route: {route}\n"
            "Resolve this ticket with account-safe steps."
        ),
        response_model=SupportReply,
    )
)
parsed = result.parsed
print("Final route:", parsed.route)
print("Reply:", parsed.reply)
print("Escalate:", parsed.escalate)

Predicted route: Billing
Final route: Billing
Reply: I can help with this billing issue. I will verify invoice line-items and retry payment safely.
Escalate: False

2) BI Analyst Copilot

Business goal: Let business teams ask plain-English data questions and get reliable answers.

Typical stack:

SQLAnalystPipeline for NL -> SQL -> execution -> analysis.
Read-only guardrails + safe_mode=True for operational safety.
Optional chart generation for dashboard-ready output.

from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import SQLAnalystPipeline

pipeline = SQLAnalystPipeline(kit=gpt.Chat(model="gpt-4o"), safe_mode=True)
result = pipeline.run(
    user_query="Top 10 products by revenue growth this quarter",
    connection_string="postgresql://user:pass@localhost:5432/warehouse",
)
print("SQL:", result.sql_query)
print("Answer:", result.answer)

SQL: SELECT product_name, growth_pct FROM quarterly_growth ORDER BY growth_pct DESC LIMIT 10;
Answer: The top growth products this quarter are Product A, Product B, and Product C, led by strong repeat purchases.

3) Internal Knowledge Assistant (Policies, SOPs, Engineering Docs)

Business goal: Replace document hunt with grounded Q&A over private content.

Typical stack:

Ingest docs with RactoRAG (pdf, docx, xlsx, html, text).
Pick embedder + vector store based on your environment.
Use a strict prompt and retrieval filters for domain-safe answers.

from ractogateway import openai_developer_kit as gpt
from ractogateway.rag import RactoRAG
from ractogateway.rag.embedders import OpenAIEmbedder
from ractogateway.rag.stores import ChromaStore

rag = RactoRAG(
    vector_store=ChromaStore(collection="internal_docs", persist_directory="./db"),
    embedder=OpenAIEmbedder(model="text-embedding-3-large"),
    llm_kit=gpt.Chat(model="gpt-4o"),
)
rag.ingest_dir("./knowledge_base", pattern="**/*")
response = rag.query("What is our production incident escalation policy?", top_k=5)
print(response.answer)

P1 incidents must be acknowledged within 5 minutes, incident commander assigned immediately, and stakeholder updates posted every 15 minutes until resolution.

4) Video Learning Intelligence

Business goal: Convert training recordings into searchable, reusable knowledge.

Typical stack:

VideoProcessorPipeline for frame dedup + transcription + visual analysis.
Generate sectioned summaries for quick comprehension.
Store outputs in RAG for downstream question answering.

from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import VideoProcessorPipeline, TranscriberBackend

video = VideoProcessorPipeline(
    kit=gpt.Chat(model="gpt-4o"),
    transcriber=TranscriberBackend.FASTER_WHISPER,
    generate_summary=True,
    safe_mode=True,
)
report = video.run("onboarding_session.mp4")
print(report.summary)
print("Sections:", len(report.sections))

This onboarding covers architecture basics, deployment flow, and incident response ownership.
Sections: 6

5) Agentic Operations Automation

Business goal: Automate multi-step tasks (fetch data, reason, call tools, return final action plan).

Typical stack:

AgentPipeline with approved tool set (SQL, HTTP, RAG, custom Python tools).
max_steps and safe_mode for bounded execution.
Optional Celery for background and retry-safe execution.

from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import AgentPipeline

def get_inventory(sku: str) -> str:
    return f"SKU {sku}: 42 units in stock"

agent = AgentPipeline(
    kit=gpt.Chat(model="gpt-4o-mini"),
    tools=[get_inventory],
    max_steps=6,
    safe_mode=True,
)
result = agent.run("Check stock for SKU-4481 and suggest reorder action.")
print(result.final_answer)
print("Stop reason:", result.stop_reason)

SKU-4481 has 42 units in stock. Recommend reorder trigger at 20 units with a purchase order draft prepared now.
Stop reason: finish_tool

6) Cost-Controlled Multi-Provider Delivery

Business goal: Keep quality high while controlling spend and avoiding vendor lock-in.

Typical stack:

Start with one prompt contract (RactoPrompt).
Switch provider kits without changing business logic.
Add ExactMatchCache, SemanticCache, and routing for cost-performance balance.
Use batch APIs for large offline workloads.

Result: one codebase, provider flexibility, and predictable cost envelopes as usage scales.

Documentation Paths

New to the library: start with Installation and Quick Start.
Building assistants and APIs: see Developer Kits, Prompt Engine, and Tools.
Building retrieval systems: see RAG, Embeddings, and Pipelines.
Running in production: see Cache, Routing, Redis, Celery, Kafka, and MCP.
Improving LLM discoverability: see LLM Discovery Guide and root files llms.txt, llms-full.txt, and robots.txt.

Getting Started

Installation
- Core
- LLM Providers
- RAG
Quick Start

User Guide

API Reference

API Reference