RactoGateway
One Python package for production-grade AI development.
RactoGateway is a unified AI SDK that gives you one clean interface for OpenAI, Google Gemini, Anthropic Claude, Ollama (local), and HuggingFace. It combines prompt engineering, strict Pydantic validation, tool calling, streaming, embeddings, fine-tuning, RAG, and production infrastructure in one library.
Why RactoGateway?
Every LLM provider has a different SDK, request format, response structure, and tool-calling schema. Production AI systems often turn into glue code and brittle parsers.
RactoGateway solves this by providing:
- `RactoPrompt` (RACTO) for structured prompting with anti-hallucination guardrails.
- Five unified developer kits: OpenAI (`gpt`), Google (`gemini`), Anthropic (`claude`), Ollama (`local`), HuggingFace (`hf`).
- Strict typed models for input/output and robust response validation.
- Unified tool calling via `ToolRegistry`.
- Typed streaming chunks and async support across providers.
- End-to-end retrieval with `RactoRAG` plus vectorless `PageIndexRAG`.
- Turn-key workflows: `SQLAnalystPipeline`, `ListClassifierPipeline`, `VideoProcessorPipeline`, `AgentPipeline`.
- Production controls: exact cache, semantic cache, routing, truncation, batch.
- Ops modules for Redis, Celery, Kafka, MCP, and telemetry.
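To make the tool-calling mismatch concrete, here is a minimal standard-library sketch that derives one tool description from a Python function and renders it in the OpenAI Chat Completions and Anthropic Messages shapes. The helper names (`describe_tool`, `to_openai`, `to_anthropic`) are hypothetical and are not part of RactoGateway; the sketch only illustrates the kind of translation a registry such as `ToolRegistry` would have to perform.

```python
import inspect
from typing import Any, Callable

# Deliberately small annotation-to-JSON-Schema map; anything else falls back to "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn: Callable[..., Any]) -> dict[str, Any]:
    """Derive a provider-neutral tool description from a function signature (hypothetical helper)."""
    params = inspect.signature(fn).parameters
    properties = {
        name: {"type": _JSON_TYPES.get(param.annotation, "string")}
        for name, param in params.items()
    }
    # Parameters without a default value are treated as required.
    required = [
        name for name, param in params.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def to_openai(tool: dict[str, Any]) -> dict[str, Any]:
    """OpenAI Chat Completions nests the description under a "function" key."""
    return {"type": "function", "function": tool}


def to_anthropic(tool: dict[str, Any]) -> dict[str, Any]:
    """Anthropic Messages uses a flat object and names the schema "input_schema"."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


def get_inventory(sku: str, warehouse: str = "main") -> str:
    """Return the stock level for a SKU."""
    return f"SKU {sku}: 42 units in stock ({warehouse})"


tool = describe_tool(get_inventory)
openai_payload = to_openai(tool)
anthropic_payload = to_anthropic(tool)
```

The same function yields two differently shaped payloads, which is the glue code a unified registry is meant to absorb.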
Use-Case Map
| Use case | Typical friction | How RactoGateway helps |
|---|---|---|
| Build chat/API assistants | Provider SDK drift and response shape mismatch | One |
| Return strict JSON for automation | Markdown fenced JSON and schema drift | |
| Add tools into workflows | Different function-calling formats per vendor | Register Python tools once with |
| Build RAG assistants | Stitching readers/chunkers/embedders/stores manually | |
| Keep costs predictable | Duplicate calls and oversized model usage | Cache + routing + truncation + batch controls |
| Operate on multiple servers | In-memory cache/memory does not scale | Redis modules for distributed cache, memory, and rate limits |
| Run long jobs safely | Request-thread failures and retries | |
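The "Markdown fenced JSON and schema drift" friction in the table can be illustrated with a small sketch: strip an optional Markdown fence from a model reply, parse the JSON, and reject payloads whose keys or types drift from the expected shape. This is a standard-library stand-in written for illustration (the library itself relies on Pydantic models); `parse_reply` and the dataclass version of `SupportReply` are assumptions, not RactoGateway APIs.

```python
import json
import re
from dataclasses import dataclass, fields

FENCE = "`" * 3  # built at runtime so this example nests safely inside Markdown
_FENCED = re.compile(rf"^{FENCE}(?:json)?\s*\n(.*?)\n{FENCE}\s*$", re.DOTALL)


@dataclass
class SupportReply:
    route: str
    reply: str
    escalate: bool


def parse_reply(raw: str) -> SupportReply:
    """Strip an optional Markdown fence, then check keys and types (hypothetical helper)."""
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises ValueError (JSONDecodeError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(SupportReply)}
    if set(data) != set(expected):
        raise ValueError(f"unexpected keys: {sorted(set(data) ^ set(expected))}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"{name} must be {typ.__name__}")
    return SupportReply(**data)


fenced = FENCE + 'json\n{"route": "Billing", "reply": "ok", "escalate": false}\n' + FENCE
reply = parse_reply(fenced)
```

Failing loudly on missing or extra keys is the point: downstream automation gets a typed object or an exception, never a half-valid dictionary.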
What Sets It Apart
| Dimension | Typical approach | RactoGateway approach | Practical impact |
|---|---|---|---|
| Provider support | Rebuild when switching SDK | Same mental model across providers | Faster migration and multi-provider strategy |
| Prompt reliability | Ad-hoc prompt strings | Structured RACTO prompts | More consistent outputs |
| Output safety | Manual | Typed validation + normalized responses | Fewer runtime failures |
| Tool integration | Vendor-specific tool payloads | Single | Less integration code |
| RAG delivery | Many separate libraries | One orchestrator with pluggable parts | Faster production rollout |
| Scale and operations | Infra bolted on later | Redis/Celery/Kafka/MCP first-class modules | Better reliability and throughput |
Platform Architecture
RactoGateway is designed as one composable stack rather than disconnected helper utilities:
| Layer | Core modules | What you get |
|---|---|---|
| Prompt and output control | | Structured prompts (RACTO), anti-hallucination guardrails, deterministic output shape |
| Multi-provider chat | | One mental model across cloud and local LLM providers |
| Tool execution | | Define Python tools once and execute them through a provider-agnostic interface |
| Structured response safety | | Typed results instead of brittle raw JSON parsing |
| Retrieval pipeline | | Ingest -> retrieve -> generate for document-grounded answers |
| Turn-key workflows | | Complete domain workflows with sync and async variants |
| Cost and performance controls | exact cache, semantic cache, routing, truncation, batch | Lower spend, lower latency, and better throughput |
| Production operations | Redis, Celery, Kafka, MCP, telemetry | Distributed memory/cache/rate-limits, background jobs, streaming, and observability |
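As an illustration of the "exact cache" control listed above, the following sketch keys responses on a hash of the model name and user message and expires entries after a TTL. It is an in-memory approximation under stated assumptions: the `ExactCache` class and `cached_call` helper are hypothetical names, and the real cache modules may hash different fields or store entries elsewhere.

```python
import hashlib
import json
import time
from typing import Callable, Optional


class ExactCache:
    """In-memory exact-match cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(model: str, user_message: str) -> str:
        # Serialize deterministically so identical requests always hash the same way.
        payload = json.dumps({"model": model, "user_message": user_message}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, user_message: str) -> Optional[str]:
        k = self.key(model, user_message)
        entry = self._entries.get(k)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[k]  # drop stale entries lazily on read
            return None
        return value

    def put(self, model: str, user_message: str, value: str) -> None:
        self._entries[self.key(model, user_message)] = (self._clock() + self._ttl, value)


def cached_call(cache: ExactCache, model: str, user_message: str,
                call: Callable[[str], str]) -> str:
    """Return a cached answer when present; otherwise call the provider and store the result."""
    hit = cache.get(model, user_message)
    if hit is not None:
        return hit
    value = call(user_message)
    cache.put(model, user_message, value)
    return value
```

An exact cache only helps with byte-identical requests; paraphrased questions need a semantic cache, which compares embeddings instead of hashes.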
End-to-End Pipeline in Practice
Use the library as a composable delivery pipeline instead of isolated API calls:
1. Define behavior with `RactoPrompt` (role, aim, constraints, tone, output).
2. Choose any provider kit (`gpt`, `gemini`, `claude`, `local`, or `hf`).
3. Call `chat()` / `stream()` / `embed()` with typed config models.
4. Optionally attach tools via `ToolRegistry` for function execution.
5. Optionally add retrieval with `RactoRAG` or `PageIndexRAG`.
6. Optionally move to prebuilt pipelines for SQL analytics, classification, video intelligence, or agentic loops.
7. Add production controls (cache, routing, truncation, batch, Redis, Celery).
8. Observe and operate with telemetry, Kafka integration, and MCP interoperability.
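The first step, a prompt defined by role, aim, constraints, tone, and output, can be pictured as a small data structure that renders to a system prompt. The sketch below uses a hypothetical `PromptSpec` dataclass; the five field names come from the step above, while the bracketed layout and default values are assumptions and do not reflect how `RactoPrompt` actually compiles prompts.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical stand-in for a five-part structured prompt."""

    role: str
    aim: str
    constraints: list[str] = field(default_factory=list)
    tone: str = "neutral"
    output: str = "text"

    def render(self) -> str:
        # One labelled line per part keeps the prompt easy to diff and review.
        lines = [f"[ROLE] {self.role}", f"[AIM] {self.aim}"]
        lines += [f"[CONSTRAINT] {c}" for c in self.constraints]
        lines += [f"[TONE] {self.tone}", f"[OUTPUT] {self.output}"]
        return "\n".join(lines)


spec = PromptSpec(
    role="Support agent",
    aim="Resolve billing tickets",
    constraints=["Never invent account data"],
    tone="calm",
    output="JSON",
)
system_prompt = spec.render()
```

Keeping the prompt as data rather than a free-form string is what lets the same contract be reused across provider kits.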
Pipeline Catalog
| Pipeline | Input | Output | Typical use case |
|---|---|---|---|
| | Natural language question + DB connection | SQL, result tables, narrative answer, optional chart | BI copilots, operations reporting, analytics assistants |
| | User text + controlled options list | Single/multi label, confidence, optional reasoning | Ticket routing, intent detection, workflow triage |
| | Video path/URL/YouTube/bytes | Transcript, frame analysis, section summaries, optional RAG storage | Lecture indexing, training content QA, media intelligence |
| | Goal + tools | Multi-step tool traces + final answer | ReAct-style automation, tool-driven agents, research workflows |
Real-World Use Cases (Implementation Blueprints)
The examples below show how teams use RactoGateway as a full delivery stack, not just a chat wrapper.
| Scenario | Primary modules | What ships to production |
|---|---|---|
| Customer support copilot | | Auto-routing, grounded answers, strict response schema, low-latency cached replies |
| BI and data analyst assistant | | Natural language to SQL, safe query execution, markdown answer, optional charts |
| Internal knowledge assistant | | Policy and SOP answers grounded on private docs with source-aware retrieval |
| Video intelligence pipeline | | Transcript + frame analysis + summary, then searchable knowledge base from videos |
| Agentic back-office automation | | Multi-step tool execution with bounded steps, retries, and background orchestration |
1) Customer Support Copilot (SaaS)
Business goal: Reduce first-response time while keeping answers accurate and auditable.
Typical stack:
- Route incoming tickets with `ListClassifierPipeline`.
- Use `ToolRegistry` for account lookup, billing state, and CRM actions.
- Ground responses with `RactoRAG` over help-center and policy docs.
- Enforce structured outputs with `response_model` (no free-form drift).
- Run Redis cache + memory + rate limit for multi-server deployments.
```python
from pydantic import BaseModel

from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import ListClassifierPipeline
from ractogateway.redis import RedisExactCache


# Shape that the final reply must satisfy.
class SupportReply(BaseModel):
    route: str
    reply: str
    escalate: bool


# Step 1: route the ticket to one of a fixed set of teams.
classifier = ListClassifierPipeline(
    kit=gpt.Chat(model="gpt-4o-mini"),
    options=["Billing", "Technical Support", "Account", "Sales"],
)

ticket = "My invoice is wrong and payment failed"
route = classifier.run(ticket).first or "Billing"  # default to Billing when no label comes back
print("Predicted route:", route)

# Step 2: draft the reply through a kit backed by a Redis exact cache (one-hour TTL).
kit = gpt.Chat(
    model="gpt-4o",
    exact_cache=RedisExactCache(url="redis://localhost:6379/0", ttl_seconds=3600),
)

result = kit.chat(
    gpt.ChatConfig(
        user_message=(
            f"Customer ticket: {ticket}\n"
            f"Predicted team route: {route}\n"
            "Resolve this ticket with account-safe steps."
        ),
        response_model=SupportReply,
    )
)

# Read the structured fields from the parsed reply.
parsed = result.parsed
print("Final route:", parsed.route)
print("Reply:", parsed.reply)
print("Escalate:", parsed.escalate)
```
```text
Predicted route: Billing
Final route: Billing
Reply: I can help with this billing issue. I will verify invoice line-items and retry payment safely.
Escalate: False
```
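Routing against a controlled options list only works if whatever label the model returns is mapped back onto an allowed value. The sketch below shows one way to do that with the standard library's `difflib`; `snap_to_option` is a hypothetical helper, and the 0.6 similarity cutoff is an arbitrary assumption rather than anything `ListClassifierPipeline` is known to use.

```python
import difflib
from typing import Optional, Sequence

OPTIONS = ["Billing", "Technical Support", "Account", "Sales"]


def snap_to_option(label: str, options: Sequence[str], cutoff: float = 0.6) -> Optional[str]:
    """Map a free-form label onto an allowed option, or return None (hypothetical helper)."""
    by_lower = {option.lower(): option for option in options}
    cleaned = label.strip().lower()
    # Case-insensitive exact match first.
    if cleaned in by_lower:
        return by_lower[cleaned]
    # Otherwise accept the single closest option above the similarity cutoff.
    close = difflib.get_close_matches(cleaned, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None


route = snap_to_option("Tech Support", OPTIONS) or "Billing"
```

Returning `None` for unrecognized labels keeps the fallback decision explicit at the call site, in the same spirit as the `or "Billing"` default in the example above.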
2) BI Analyst Copilot
Business goal: Let business teams ask plain-English data questions and get reliable answers.
Typical stack:
- `SQLAnalystPipeline` for NL -> SQL -> execution -> analysis.
- Read-only guardrails + `safe_mode=True` for operational safety.
- Optional chart generation for dashboard-ready output.
```python
from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import SQLAnalystPipeline

# safe_mode=True enables the pipeline's safety guardrails.
pipeline = SQLAnalystPipeline(kit=gpt.Chat(model="gpt-4o"), safe_mode=True)

# Ask a plain-English question against the warehouse database.
result = pipeline.run(
    user_query="Top 10 products by revenue growth this quarter",
    connection_string="postgresql://user:pass@localhost:5432/warehouse",
)

print("SQL:", result.sql_query)
print("Answer:", result.answer)
```
```text
SQL: SELECT product_name, growth_pct FROM quarterly_growth ORDER BY growth_pct DESC LIMIT 10;
Answer: The top growth products this quarter are Product A, Product B, and Product C, led by strong repeat purchases.
```
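To give a feel for what a read-only guardrail can check before generated SQL is executed, here is a conservative keyword-based sketch. It is not the pipeline's actual guard: `is_read_only` is a hypothetical function, and the approach deliberately errs toward rejecting queries (for example, it refuses a semicolon or a write keyword even inside a string literal). A database role with read-only privileges remains the stronger control.

```python
import re

_STARTS_WITH_READ = re.compile(r"^(select|with)\b", re.IGNORECASE)
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|merge)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement
    if not _STARTS_WITH_READ.match(statement):
        return False
    return _WRITE_KEYWORDS.search(statement) is None


generated = (
    "SELECT product_name, growth_pct FROM quarterly_growth "
    "ORDER BY growth_pct DESC LIMIT 10;"
)
allowed = is_read_only(generated)
```

Word-boundary matching means column names such as `created_at` or `updated_at` do not trip the filter, while a data-modifying CTE does.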
3) Internal Knowledge Assistant (Policies, SOPs, Engineering Docs)
Business goal: Replace document hunting with grounded Q&A over private content.
Typical stack:
- Ingest docs with `RactoRAG` (`pdf`, `docx`, `xlsx`, html, text).
- Pick embedder + vector store based on your environment.
- Use a strict prompt and retrieval filters for domain-safe answers.
```python
from ractogateway import openai_developer_kit as gpt
from ractogateway.rag import RactoRAG
from ractogateway.rag.embedders import OpenAIEmbedder
from ractogateway.rag.stores import ChromaStore

# Wire a vector store, an embedder, and a chat kit into one RAG orchestrator.
rag = RactoRAG(
    vector_store=ChromaStore(collection="internal_docs", persist_directory="./db"),
    embedder=OpenAIEmbedder(model="text-embedding-3-large"),
    llm_kit=gpt.Chat(model="gpt-4o"),
)

# Ingest every file under the knowledge base directory, then ask a grounded question.
rag.ingest_dir("./knowledge_base", pattern="**/*")
response = rag.query("What is our production incident escalation policy?", top_k=5)
print(response.answer)
```
```text
P1 incidents must be acknowledged within 5 minutes, incident commander assigned immediately, and stakeholder updates posted every 15 minutes until resolution.
```
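The retrieve step behind a `top_k` query can be sketched without any external services. The example below ranks text chunks by cosine similarity over bag-of-words counts, which stands in for a real embedding model and vector store; `embed`, `cosine`, and `retrieve` are hypothetical helpers and the sample chunks are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Return the top_k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:top_k]


chunks = [
    "P1 incidents must be acknowledged within 5 minutes.",
    "Expense reports are due monthly.",
    "Incident commander is assigned for every P1 incident.",
]
top = retrieve("incident escalation policy P1", chunks, top_k=2)
```

The retrieved chunks would then be placed into the prompt as context, so the generated answer is grounded in the ingested documents rather than in the model's memory.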
4) Video Learning Intelligence
Business goal: Convert training recordings into searchable, reusable knowledge.
Typical stack:
- `VideoProcessorPipeline` for frame dedup + transcription + visual analysis.
- Generate sectioned summaries for quick comprehension.
- Store outputs in RAG for downstream question answering.
```python
from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import VideoProcessorPipeline, TranscriberBackend

# Configure transcription backend and request a summary of the processed video.
video = VideoProcessorPipeline(
    kit=gpt.Chat(model="gpt-4o"),
    transcriber=TranscriberBackend.FASTER_WHISPER,
    generate_summary=True,
    safe_mode=True,
)

report = video.run("onboarding_session.mp4")
print(report.summary)
print("Sections:", len(report.sections))
```
```text
This onboarding covers architecture basics, deployment flow, and incident response ownership.
Sections: 6
```
5) Agentic Operations Automation
Business goal: Automate multi-step tasks (fetch data, reason, call tools, return final action plan).
Typical stack:
- `AgentPipeline` with approved tool set (SQL, HTTP, RAG, custom Python tools).
- `max_steps` and `safe_mode` for bounded execution.
- Optional Celery for background and retry-safe execution.
```python
from ractogateway import openai_developer_kit as gpt
from ractogateway.pipelines import AgentPipeline


# A plain Python function exposed to the agent as a tool.
def get_inventory(sku: str) -> str:
    return f"SKU {sku}: 42 units in stock"


# max_steps caps how many steps the agent may take before it must stop.
agent = AgentPipeline(
    kit=gpt.Chat(model="gpt-4o-mini"),
    tools=[get_inventory],
    max_steps=6,
    safe_mode=True,
)

result = agent.run("Check stock for SKU-4481 and suggest reorder action.")
print(result.final_answer)
print("Stop reason:", result.stop_reason)
```
```text
SKU-4481 has 42 units in stock. Recommend reorder trigger at 20 units with a purchase order draft prepared now.
Stop reason: finish_tool
```
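Bounded execution is easier to reason about with the loop written out. The sketch below runs a plan-act cycle for at most `max_steps` iterations and records why it stopped. A scripted `planner` function stands in for the model, and everything here (`run_agent`, `Step`, `RunResult`, and the `max_steps` and `unknown_tool` stop labels) is hypothetical; only the `finish_tool` label is taken from the output above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    tool: str
    argument: str
    observation: str


@dataclass
class RunResult:
    final_answer: str
    stop_reason: str
    steps: list[Step] = field(default_factory=list)


# A planner returns (tool_name, argument), or ("finish", final_answer) to stop.
Planner = Callable[[str, list[Step]], tuple[str, str]]


def run_agent(goal: str, planner: Planner, tools: dict[str, Callable[[str], str]],
              max_steps: int = 6) -> RunResult:
    """Run a plan-act loop that can never exceed max_steps tool calls."""
    steps: list[Step] = []
    for _ in range(max_steps):
        action, argument = planner(goal, steps)
        if action == "finish":
            return RunResult(argument, "finish_tool", steps)
        if action not in tools:
            return RunResult("", "unknown_tool", steps)  # refuse tools outside the approved set
        steps.append(Step(action, argument, tools[action](argument)))
    return RunResult("", "max_steps", steps)


def get_inventory(sku: str) -> str:
    return f"SKU {sku}: 42 units in stock"


def scripted_planner(goal: str, steps: list[Step]) -> tuple[str, str]:
    # Call the tool once, then finish with its observation.
    if not steps:
        return ("get_inventory", "4481")
    return ("finish", steps[-1].observation)


result = run_agent("Check stock for SKU 4481", scripted_planner, {"get_inventory": get_inventory})
```

Because the loop owns the step budget and the tool allow-list, a misbehaving planner can waste at most `max_steps` calls and cannot reach tools that were never registered.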
6) Cost-Controlled Multi-Provider Delivery
Business goal: Keep quality high while controlling spend and avoiding vendor lock-in.
Typical stack:
- Start with one prompt contract (`RactoPrompt`).
- Switch provider kits without changing business logic.
- Add `ExactMatchCache`, `SemanticCache`, and routing for cost-performance balance.
- Use batch APIs for large offline workloads.
Result: one codebase, provider flexibility, and predictable cost envelopes as usage scales.
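One simple form of cost-aware routing is to send short, simple prompts to a cheaper model and reserve the stronger model for long or reasoning-heavy requests. The sketch below assumes a rough four-characters-per-token estimate and an arbitrary 200-token threshold; `estimate_tokens` and `pick_model` are hypothetical helpers, not the library's router, and the model names are the ones used in the examples above.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: about four characters per token for English text."""
    return max(1, len(text) // 4)


def pick_model(prompt: str, cheap: str = "gpt-4o-mini", strong: str = "gpt-4o",
               threshold_tokens: int = 200, needs_reasoning: bool = False) -> str:
    """Choose the cheaper model unless the prompt is long or flagged as reasoning-heavy."""
    if needs_reasoning or estimate_tokens(prompt) > threshold_tokens:
        return strong
    return cheap


short_choice = pick_model("Summarize this ticket in one line.")
long_choice = pick_model("x" * 1000)
```

A production router would typically use a real tokenizer and per-model pricing, but the decision structure stays the same: estimate cost, compare against a budget, pick a tier.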
Documentation Paths
- New to the library: start with Installation and Quick Start.
- Building assistants and APIs: see Developer Kits, Prompt Engine, and Tools.
- Building retrieval systems: see RAG, Embeddings, and Pipelines.
- Running in production: see Cache, Routing, Redis, Celery, Kafka, and MCP.
- Improving LLM discoverability: see LLM Discovery Guide and root files `llms.txt`, `llms-full.txt`, and `robots.txt`.