# RactoGateway

**One Python package for production-grade AI development.**

RactoGateway is a unified AI SDK that gives you one clean interface for OpenAI, Google Gemini, Anthropic Claude, Ollama (local), and HuggingFace. It combines prompt engineering, strict Pydantic validation, tool calling, streaming, embeddings, fine-tuning, RAG, and production infrastructure in one library.

## Why RactoGateway?

Every LLM provider has a different SDK, request format, response structure, and tool-calling schema. Production AI systems often turn into glue code and brittle parsers. RactoGateway solves this by providing:

- `RactoPrompt` (RACTO) for structured prompting with anti-hallucination guardrails.
- Five unified developer kits: OpenAI (`gpt`), Google (`gemini`), Anthropic (`claude`), Ollama (`local`), HuggingFace (`hf`).
- Strict typed models for input/output and robust response validation.
- Unified tool calling via `ToolRegistry`.
- Typed streaming chunks and async support across providers.
- End-to-end retrieval with `RactoRAG` plus vectorless `PageIndexRAG`.
- Turn-key workflows: `SQLAnalystPipeline`, `ListClassifierPipeline`, `VideoProcessorPipeline`, `AgentPipeline`.
- Production controls: exact cache, semantic cache, routing, truncation, batch.
- Ops modules for Redis, Celery, Kafka, MCP, and telemetry.

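
To make "glue code and brittle parsers" concrete, the sketch below shows the kind of hand-rolled parsing that tends to accumulate in projects: stripping a Markdown fence from a model reply and checking field types by hand. It uses only the standard library and is not RactoGateway code; `Ticket`, `parse_reply`, and the sample reply are hypothetical names chosen for illustration.

```python
import json
import re
from dataclasses import dataclass

# Matches an optional ```json ... ``` wrapper around a JSON payload.
_FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


@dataclass
class Ticket:
    route: str
    escalate: bool


def parse_reply(raw: str) -> Ticket:
    """Strip a Markdown fence if present, then validate field types by hand."""
    text = raw.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    route = data.get("route")
    escalate = data.get("escalate")
    if not isinstance(route, str):
        raise ValueError("'route' must be a string")
    if not isinstance(escalate, bool):
        raise ValueError("'escalate' must be a boolean")
    return Ticket(route=route, escalate=escalate)


# A reply wrapped in a Markdown fence, as chat models often produce.
fenced_reply = '```json\n{"route": "Billing", "escalate": false}\n```'
ticket = parse_reply(fenced_reply)
print(ticket)
```

Every new output shape needs another function like this one. The `response_model` option used in the examples later on this page is intended to take over that role.
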
### Use-Case Map

| Use case | Typical friction | How RactoGateway helps |
| --- | --- | --- |
| Build chat/API assistants | Provider SDK drift and response shape mismatch | One `ChatConfig` + one `LLMResponse` model across providers |
| Return strict JSON for automation | Markdown-fenced JSON and schema drift | `RactoPrompt(output_format=YourModel)` embeds the schema and enforces the shape |
| Add tools into workflows | Different function-calling formats per vendor | Register Python tools once with `ToolRegistry` |
| Build RAG assistants | Stitching readers/chunkers/embedders/stores manually | `RactoRAG` handles ingest -> retrieve -> generate |
| Keep costs predictable | Duplicate calls and oversized model usage | Cache + routing + truncation + batch controls |
| Operate on multiple servers | In-memory cache/memory does not scale | Redis modules for distributed cache, memory, and rate limits |
| Run long jobs safely | Request-thread failures and retries | `RactoCeleryWorker` for retries and background execution |

### Why It Stands Different

| Dimension | Typical approach | RactoGateway approach | Practical impact |
| --- | --- | --- | --- |
| Provider support | Rebuild when switching SDKs | Same mental model across providers | Faster migration and multi-provider strategy |
| Prompt reliability | Ad-hoc prompt strings | Structured RACTO prompts | More consistent outputs |
| Output safety | Manual `json.loads` parsing | Typed validation + normalized responses | Fewer runtime failures |
| Tool integration | Vendor-specific tool payloads | Single `ToolRegistry` abstraction | Less integration code |
| RAG delivery | Many separate libraries | One orchestrator with pluggable parts | Faster production rollout |
| Scale and operations | Infra bolted on later | Redis/Celery/Kafka/MCP first-class modules | Better reliability and throughput |

### Platform Architecture

RactoGateway is designed as one composable stack rather than disconnected helper utilities:

| Layer | Core modules | What you get |
| --- | --- | --- |
| Prompt and output control | `RactoPrompt`, `RactoFile` | Structured prompts (RACTO), anti-hallucination guardrails, deterministic output shape |
| Multi-provider chat | `openai_developer_kit`, `google_developer_kit`, `anthropic_developer_kit`, `ollama_developer_kit`, `huggingface_developer_kit` | One mental model across cloud and local LLM providers |
| Tool execution | `ToolRegistry`, `tool` decorator | Define Python tools once and execute them through a provider-agnostic interface |
| Structured response safety | `response_model` support + strict validation | Typed results instead of brittle raw JSON parsing |
| Retrieval pipeline | `RactoRAG`, `PageIndexRAG`, readers/chunkers/embedders/stores | Ingest -> retrieve -> generate for document-grounded answers |
| Turn-key workflows | `SQLAnalystPipeline`, `ListClassifierPipeline`, `VideoProcessorPipeline`, `AgentPipeline` | Complete domain workflows with sync and async variants |
| Cost and performance controls | exact cache, semantic cache, routing, truncation, batch | Lower spend, lower latency, and better throughput |
| Production operations | Redis, Celery, Kafka, MCP, telemetry | Distributed memory/cache/rate-limits, background jobs, streaming, and observability |

## End-to-End Pipeline in Practice

Use the library as a composable delivery pipeline instead of isolated API calls:

1. Define behavior with `RactoPrompt` (role, aim, constraints, tone, output).
2. Choose any provider kit (`gpt`, `gemini`, `claude`, `local`, or `hf`).
3. Call `chat()` / `stream()` / `embed()` with typed config models.
4. Optionally attach tools via `ToolRegistry` for function execution.
5. Optionally add retrieval with `RactoRAG` or `PageIndexRAG`.
6. Optionally move to prebuilt pipelines for SQL analytics, classification, video intelligence, or agentic loops.
7. Add production controls (cache, routing, truncation, batch, Redis, Celery).
8. Observe and operate with telemetry, Kafka integration, and MCP interoperability.

## Pipeline Catalog

| Pipeline | Input | Output | Typical use case |
| --- | --- | --- | --- |
| `SQLAnalystPipeline` | Natural language question + DB connection | SQL, result tables, narrative answer, optional chart | BI copilots, operations reporting, analytics assistants |
| `ListClassifierPipeline` | User text + controlled options list | Single/multi label, confidence, optional reasoning | Ticket routing, intent detection, workflow triage |
| `VideoProcessorPipeline` | Video path/URL/YouTube/bytes | Transcript, frame analysis, section summaries, optional RAG storage | Lecture indexing, training content QA, media intelligence |
| `AgentPipeline` | Goal + tools | Multi-step tool traces + final answer | ReAct-style automation, tool-driven agents, research workflows |

## Real-World Use Cases (Implementation Blueprints)

The examples below show how teams use RactoGateway as a full delivery stack, not just a chat wrapper.

| Scenario | Primary modules | What ships to production |
| --- | --- | --- |
| Customer support copilot | `ListClassifierPipeline`, `ToolRegistry`, `RactoRAG`, Redis modules | Auto-routing, grounded answers, strict response schema, low-latency cached replies |
| BI and data analyst assistant | `SQLAnalystPipeline`, `RactoPrompt`, typed models | Natural language to SQL, safe query execution, markdown answer, optional charts |
| Internal knowledge assistant | `RactoRAG` or `PageIndexRAG`, `RactoPrompt` | Policy and SOP answers grounded on private docs with source-aware retrieval |
| Video intelligence pipeline | `VideoProcessorPipeline`, optional RAG store | Transcript + frame analysis + summary, then searchable knowledge base from videos |
| Agentic back-office automation | `AgentPipeline`, `ToolRegistry`, `MCP`, Celery | Multi-step tool execution with bounded steps, retries, and background orchestration |

### 1) Customer Support Copilot (SaaS)

**Business goal:** Reduce first-response time while keeping answers accurate and auditable.

**Typical stack:**

1. Route incoming tickets with `ListClassifierPipeline`.
2. Use `ToolRegistry` for account lookup, billing state, and CRM actions.
3. Ground responses with `RactoRAG` over help-center and policy docs.
4. Enforce structured outputs with `response_model` (no free-form drift).
5. Run Redis-backed cache, memory, and rate limiting for multi-server deployments.

```python
from pydantic import BaseModel

from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import ListClassifierPipeline
from ractogateway.redis import RedisExactCache


class SupportReply(BaseModel):
    route: str
    reply: str
    escalate: bool


classifier = ListClassifierPipeline(
    kit=gpt.Chat(model="gpt-4o-mini"),
    options=["Billing", "Technical Support", "Account", "Sales"],
)

ticket = "My invoice is wrong and payment failed"
route = classifier.run(ticket).first or "Billing"
print("Predicted route:", route)

kit = gpt.Chat(
    model="gpt-4o",
    exact_cache=RedisExactCache(url="redis://localhost:6379/0", ttl_seconds=3600),
)

result = kit.chat(
    gpt.ChatConfig(
        user_message=(
            f"Customer ticket: {ticket}\n"
            f"Predicted team route: {route}\n"
            "Resolve this ticket with account-safe steps."
        ),
        response_model=SupportReply,
    )
)

parsed = result.parsed
print("Final route:", parsed.route)
print("Reply:", parsed.reply)
print("Escalate:", parsed.escalate)
```

```text
Predicted route: Billing
Final route: Billing
Reply: I can help with this billing issue. I will verify invoice line-items and retry payment safely.
Escalate: False
```

### 2) BI Analyst Copilot

**Business goal:** Let business teams ask plain-English data questions and get reliable answers.

**Typical stack:**

1. `SQLAnalystPipeline` for NL -> SQL -> execution -> analysis.
2. Read-only guardrails + `safe_mode=True` for operational safety.
3. Optional chart generation for dashboard-ready output.

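
As a rough illustration of what a read-only guardrail (step 2) can involve, the sketch below rejects anything that is not a single `SELECT` or `WITH` statement. It is a standard-library example and makes no claim about how `SQLAnalystPipeline` or `safe_mode` work internally; `is_read_only` is a hypothetical helper, and keyword screening like this is a coarse first filter, not a substitute for a read-only database role.

```python
import re

# Keywords that modify data or schema; matched as whole words, case-insensitively.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    first_word = statement.split(None, 1)[0].lower()
    if first_word not in {"select", "with"}:
        return False
    return _WRITE_KEYWORDS.search(statement) is None


print(is_read_only("SELECT name FROM products LIMIT 10;"))
print(is_read_only("DROP TABLE products"))
print(is_read_only("SELECT 1; DELETE FROM products"))
```

The check is deliberately conservative: it also rejects harmless queries that contain a semicolon or a write keyword inside a string literal. The pipeline example that follows relies on the library's own `safe_mode=True` flag instead.
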
```python
from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import SQLAnalystPipeline

pipeline = SQLAnalystPipeline(kit=gpt.Chat(model="gpt-4o"), safe_mode=True)

result = pipeline.run(
    user_query="Top 10 products by revenue growth this quarter",
    connection_string="postgresql://user:pass@localhost:5432/warehouse",
)

print("SQL:", result.sql_query)
print("Answer:", result.answer)
```

```text
SQL: SELECT product_name, growth_pct FROM quarterly_growth ORDER BY growth_pct DESC LIMIT 10;
Answer: The top growth products this quarter are Product A, Product B, and Product C, led by strong repeat purchases.
```

### 3) Internal Knowledge Assistant (Policies, SOPs, Engineering Docs)

**Business goal:** Replace document hunting with grounded Q&A over private content.

**Typical stack:**

1. Ingest docs with `RactoRAG` (`pdf`, `docx`, `xlsx`, html, text).
2. Pick embedder + vector store based on your environment.
3. Use a strict prompt and retrieval filters for domain-safe answers.

```python
from ractogateway import openai_developer_kit as gpt
from ractogateway.rag import RactoRAG
from ractogateway.rag.embedders import OpenAIEmbedder
from ractogateway.rag.stores import ChromaStore

rag = RactoRAG(
    vector_store=ChromaStore(collection="internal_docs", persist_directory="./db"),
    embedder=OpenAIEmbedder(model="text-embedding-3-large"),
    llm_kit=gpt.Chat(model="gpt-4o"),
)

rag.ingest_dir("./knowledge_base", pattern="**/*")
response = rag.query("What is our production incident escalation policy?", top_k=5)
print(response.answer)
```

```text
P1 incidents must be acknowledged within 5 minutes, incident commander assigned immediately, and stakeholder updates posted every 15 minutes until resolution.
```

### 4) Video Learning Intelligence

**Business goal:** Convert training recordings into searchable, reusable knowledge.

**Typical stack:**

1. `VideoProcessorPipeline` for frame dedup + transcription + visual analysis.
2. Generate sectioned summaries for quick comprehension.
3. Store outputs in RAG for downstream question answering.

```python
from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import VideoProcessorPipeline, TranscriberBackend

video = VideoProcessorPipeline(
    kit=gpt.Chat(model="gpt-4o"),
    transcriber=TranscriberBackend.FASTER_WHISPER,
    generate_summary=True,
    safe_mode=True,
)

report = video.run("onboarding_session.mp4")
print(report.summary)
print("Sections:", len(report.sections))
```

```text
This onboarding covers architecture basics, deployment flow, and incident response ownership.
Sections: 6
```

### 5) Agentic Operations Automation

**Business goal:** Automate multi-step tasks (fetch data, reason, call tools, return final action plan).

**Typical stack:**

1. `AgentPipeline` with an approved tool set (`SQL`, HTTP, RAG, custom Python tools).
2. `max_steps` and `safe_mode` for bounded execution.
3. Optional Celery for background and retry-safe execution.

```python
from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import AgentPipeline


def get_inventory(sku: str) -> str:
    return f"SKU {sku}: 42 units in stock"


agent = AgentPipeline(
    kit=gpt.Chat(model="gpt-4o-mini"),
    tools=[get_inventory],
    max_steps=6,
    safe_mode=True,
)

result = agent.run("Check stock for SKU-4481 and suggest reorder action.")
print(result.final_answer)
print("Stop reason:", result.stop_reason)
```

```text
SKU-4481 has 42 units in stock. Recommend reorder trigger at 20 units with a purchase order draft prepared now.
Stop reason: finish_tool
```

### 6) Cost-Controlled Multi-Provider Delivery

**Business goal:** Keep quality high while controlling spend and avoiding vendor lock-in.

**Typical stack:**

1. Start with one prompt contract (`RactoPrompt`).
2. Switch provider kits without changing business logic.
3. Add `ExactMatchCache`, `SemanticCache`, and routing for cost-performance balance.
4. Use batch APIs for large offline workloads.

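
To show the idea behind exact-match caching (step 3), here is a minimal in-memory sketch that keys responses on a hash of the model name and prompt. It is a standard-library illustration, not the implementation of `ExactMatchCache`; `ExactCacheSketch`, `cached_call`, `fake_llm`, and the TTL handling are hypothetical.

```python
import hashlib
import json
import time
from typing import Callable, Dict, List, Optional, Tuple


class ExactCacheSketch:
    """In-memory cache keyed on a hash of (model, prompt), with a TTL."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> Optional[str]:
        entry = self._store.get(self._key(model, prompt))
        if entry is None:
            return None
        stored_at, response = entry
        if time.monotonic() - stored_at > self.ttl_seconds:
            return None  # entry has expired
        return response

    def put(self, model: str, prompt: str, response: str) -> None:
        self._store[self._key(model, prompt)] = (time.monotonic(), response)


def cached_call(
    cache: ExactCacheSketch,
    llm: Callable[[str, str], str],
    model: str,
    prompt: str,
) -> str:
    """Return a cached response when the exact request was seen before."""
    hit = cache.get(model, prompt)
    if hit is not None:
        return hit
    response = llm(model, prompt)
    cache.put(model, prompt, response)
    return response


calls: List[str] = []


def fake_llm(model: str, prompt: str) -> str:
    # Stand-in for a paid provider call; records each invocation.
    calls.append(prompt)
    return f"[{model}] answer to: {prompt}"


cache = ExactCacheSketch(ttl_seconds=60)
first = cached_call(cache, fake_llm, "model-a", "What is our refund policy?")
second = cached_call(cache, fake_llm, "model-a", "What is our refund policy?")
print(first == second, len(calls))  # prints: True 1
```

The second identical request never reaches the provider. A semantic cache differs in that it matches on embedding similarity rather than identical text, so paraphrased prompts can also hit.
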

Result: one codebase, provider flexibility, and predictable cost envelopes as usage scales.

## Documentation Paths

- New to the library: start with [Installation](installation.md) and [Quick Start](quickstart.md).
- Building assistants and APIs: see [Developer Kits](guide/developer_kits.md), [Prompt Engine](guide/prompt_engine.md), and [Tools](guide/tools.md).
- Building retrieval systems: see [RAG](guide/rag.md), [Embeddings](guide/embeddings.md), and [Pipelines](guide/pipelines.md).
- Running in production: see [Cache](guide/cache.md), [Routing](guide/routing.md), [Redis](guide/redis.md), [Celery](guide/celery.md), [Kafka](guide/kafka.md), and [MCP](guide/mcp.md).
- Improving LLM discoverability: see [LLM Discovery Guide](guide/llm_discovery.md) and root files `llms.txt`, `llms-full.txt`, and `robots.txt`.

```{toctree}
:maxdepth: 2
:caption: Getting Started

installation
quickstart
```

```{toctree}
:maxdepth: 2
:caption: User Guide

guide/userguide
guide/llm_discovery
guide/prompt_engine
guide/developer_kits
guide/ollama
guide/huggingface
guide/streaming
guide/tools
guide/embeddings
guide/chain_of_thought
guide/native_thinking
guide/finetune
guide/rag
guide/pipelines
guide/batch
guide/cache
guide/routing
guide/truncation
guide/mcp
guide/redis
guide/celery
guide/kafka
```

```{toctree}
:maxdepth: 3
:caption: API Reference

api/index
```