ractogateway.openai_developer_kit
OpenAI Developer Kit — from ractogateway import openai_developer_kit as gpt.
Short usage:
from ractogateway import openai_developer_kit as gpt
kit = gpt.Chat(model="gpt-4o") # short alias
kit = gpt.OpenAIDeveloperKit(model="gpt-4o") # full name (same class)
- class ractogateway.openai_developer_kit.BatchItem(**data)[source]
Bases: BaseModel
A single request within a batch job.
- Parameters:
custom_id (str) – User-supplied identifier used to correlate results. Must be unique within a batch.
user_message (str) – The end-user's query string (equivalent to ChatConfig.user_message).
temperature (float) – Sampling temperature. Defaults to 0.0.
max_tokens (int) – Maximum tokens for the completion. Defaults to 4096.
extra (dict[str, Any]) – Provider-specific pass-through kwargs.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- custom_id: str
- user_message: str
- temperature: float
- max_tokens: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ractogateway.openai_developer_kit.BatchJobInfo(**data)[source]
Bases: BaseModel
Metadata about a submitted batch job.
Returned by submit_batch() and poll_status().
- job_id: str
- provider: str
- status: BatchStatus
- created_at: float
- request_count: int
- raw: Any
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
- class ractogateway.openai_developer_kit.BatchResult(**data)[source]
Bases: BaseModel
The outcome of a single BatchItem.
A result is always present in the results list returned by get_results(); check error to detect failures.
- custom_id: str
- response: LLMResponse | None
- raw: Any
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
- property ok: bool
True when the request succeeded (no error, response present).
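As a rough illustration of how the ok flag can be used to separate successes from failures, the sketch below uses a hypothetical Result dataclass as a stand-in for BatchResult. The field types and the split_results helper are assumptions for illustration, not the library's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Result:
    """Hypothetical stand-in for BatchResult (types are assumed)."""
    custom_id: str
    response: Optional[str] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        # Mirrors the documented rule: no error and a response is present.
        return self.error is None and self.response is not None


def split_results(results: List[Result]) -> Tuple[List[Result], List[Result]]:
    """Partition results into (succeeded, failed) using the ok flag."""
    succeeded = [r for r in results if r.ok]
    failed = [r for r in results if not r.ok]
    return succeeded, failed
```

Because a result is present for every item, the two lists together should account for every custom_id that was submitted.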
- class ractogateway.openai_developer_kit.BatchStatus(*values)[source]
Processing state of a batch job.
Maps to the union of OpenAI and Anthropic batch status strings.
- PENDING = 'pending'
- IN_PROGRESS = 'in_progress'
- FINALIZING = 'finalizing'
- COMPLETED = 'completed'
- FAILED = 'failed'
- EXPIRED = 'expired'
- CANCELLING = 'cancelling'
- CANCELLED = 'cancelled'
- ractogateway.openai_developer_kit.Chat
Short alias: gpt.Chat(model="gpt-4o") is identical to gpt.OpenAIDeveloperKit(...).
- class ractogateway.openai_developer_kit.ChatConfig(**data)[source]
Bases: BaseModel
Validated input for every chat / achat / stream / astream call.
Pass a single ChatConfig to any developer-kit method. Every field has a safe default, so you only need to supply what you actually need.
Minimal example:
config = ChatConfig(user_message="Explain Python generators.")
response = kit.chat(config)
Vision / multimodal example:
from ractogateway.prompts.engine import RactoFile
config = ChatConfig(
    user_message="Describe this chart.",
    attachments=[RactoFile.from_path("sales_q4.png")],
)
Structured JSON output example:
from pydantic import BaseModel
class Sentiment(BaseModel):
    label: str
    score: float
config = ChatConfig(
    user_message="I love this library!",
    response_model=Sentiment,
)
- user_message: str
- prompt: RactoPrompt | None
- temperature: float
- max_tokens: int
- tools: ToolRegistry | None
- auto_execute_tools: bool
- max_tool_turns: int
- max_validation_retries: int
- history: list[Message]
- chain_of_thought: bool
- native_thinking: bool
- thinking_budget: int
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
- class ractogateway.openai_developer_kit.CostAwareRouter(tiers)[source]
Bases: object
Routes LLM requests to the appropriate model tier based on message complexity, without making any extra API calls.
- Parameters:
tiers (list[RoutingTier]) – Ordered list of RoutingTier objects, sorted ascending by max_score (cheapest first). The last tier's max_score should be 100 to act as fallback.
- Raises:
ValueError – If tiers is empty or not sorted ascending by max_score.
Example (3-tier OpenAI ladder):
from ractogateway.routing import CostAwareRouter, RoutingTier
router = CostAwareRouter([
    RoutingTier(model="gpt-4o-mini", max_score=30),
    RoutingTier(model="gpt-4o", max_score=70),
    RoutingTier(model="o3-mini", max_score=100),
])
model = router.route("What is 2+2?")  # → "gpt-4o-mini"
model = router.route(
    "Analyze the trade-offs between Redis Cluster and "
    "Cassandra for a write-heavy time-series workload …"
)  # → "o3-mini"
Example (binary routing, 2 tiers):
router = CostAwareRouter([
    RoutingTier(model="claude-haiku-4-5-20251001", max_score=40),
    RoutingTier(model="claude-opus-4-6", max_score=100),
])
- score(text)[source]
Compute a complexity score in [0, 100] for text.
A higher score means a more complex task.
- Return type:
Algorithm:
token_pts = min(len(text) // 4, SAT) * (MAX_TP / SAT)
kw_pts = min(matches * PPK, MAX_KP)
score = clamp(token_pts + kw_pts, 0, 100)
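The formula above can be sketched in a few lines of Python. The reference names SAT, MAX_TP, PPK and MAX_KP but does not give their values or the keyword list, so every constant below is a hypothetical placeholder chosen only to make the sketch runnable.

```python
# Hypothetical values: the reference does not state the real constants.
SAT = 500        # approximate token count at which the length component saturates
MAX_TP = 60.0    # maximum points contributed by length
PPK = 10.0       # points per matched keyword
MAX_KP = 40.0    # maximum points contributed by keywords
KEYWORDS = ("analyze", "trade-off", "compare", "prove", "optimize")  # assumed list


def complexity_score(text: str) -> float:
    """Length points plus keyword points, clamped to [0, 100]."""
    approx_tokens = len(text) // 4                       # ~4 characters per token
    token_pts = min(approx_tokens, SAT) * (MAX_TP / SAT)
    lowered = text.lower()
    matches = sum(1 for kw in KEYWORDS if kw in lowered)
    kw_pts = min(matches * PPK, MAX_KP)
    return max(0.0, min(token_pts + kw_pts, 100.0))
```

With these placeholder constants a short factual question scores close to zero, while a long prompt containing analysis keywords approaches the upper range.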
- route(text)[source]
Return the model identifier for text.
Walks tiers (cheapest first) and returns the first model whose max_score ≥ complexity_score. Always returns a model because the last tier has max_score == 100 (validated at construction).
Complexity: O(k) where k = number of tiers.
- Return type:
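The tier walk described for route() can be sketched as follows. Tier is a hypothetical stand-in for RoutingTier, and validate_tiers only mirrors the documented ValueError conditions; neither is the library's own code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    """Hypothetical stand-in for RoutingTier."""
    model: str
    max_score: float


def validate_tiers(tiers: List[Tier]) -> None:
    """Reject an empty ladder or one not sorted ascending by max_score."""
    if not tiers:
        raise ValueError("tiers must not be empty")
    scores = [t.max_score for t in tiers]
    if scores != sorted(scores):
        raise ValueError("tiers must be sorted ascending by max_score")


def pick_model(tiers: List[Tier], score: float) -> str:
    """Return the first (cheapest) tier whose inclusive bound covers the score."""
    for tier in tiers:
        if tier.max_score >= score:
            return tier.model
    return tiers[-1].model  # defensive fallback if the last bound is below 100
```

The linear scan is O(k) in the number of tiers, which is negligible for the two- or three-tier ladders shown in the examples.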
- property tiers: tuple[RoutingTier, ...]
Immutable view of the configured tiers.
- class ractogateway.openai_developer_kit.EmbeddingConfig(**data)[source]
Bases: BaseModel
Validated input for embed / aembed calls.
Example:
config = EmbeddingConfig(texts=["Hello world", "Goodbye world"])
response = kit.embed(config)
- model_config: ClassVar[ConfigDict] = {}
- class ractogateway.openai_developer_kit.EmbeddingResponse(**data)[source]
Bases: BaseModel
Unified response from an embedding call.
- vectors: list[EmbeddingVector]
- model: str
- raw: Any
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
- class ractogateway.openai_developer_kit.EmbeddingVector(**data)[source]
Bases: BaseModel
A single embedding result.
- index: int
- text: str
- model_config: ClassVar[ConfigDict] = {}
- class ractogateway.openai_developer_kit.ExactMatchCache(max_size=1024, ttl_seconds=None)[source]
Bases: object
Ultra-low-latency key-value cache for identical LLM requests.
- Parameters:
max_size (int) – LRU capacity. 0 = unlimited (no eviction).
ttl_seconds (float | None) – Entries older than ttl_seconds are treated as misses and transparently evicted. None disables expiry.
Example:
from ractogateway.cache import ExactMatchCache
cache = ExactMatchCache(max_size=512, ttl_seconds=3600)
# Wire into a kit:
kit = OpenAIDeveloperKit(model="gpt-4o", exact_cache=cache)
- get(user_message, system_prompt, model, temperature, max_tokens)[source]
Return a cached response or None on a miss.
O(1): dictionary lookup + optional move-to-end.
- Return type:
- put(user_message, system_prompt, model, temperature, max_tokens, response)[source]
Store a response. Evicts LRU entry when at capacity.
O(1) amortised — dictionary insert + optional popitem(last=False).
- Return type:
- invalidate(user_message, system_prompt, model, temperature, max_tokens)[source]
Remove a specific entry. Returns True if it was present.
- Return type:
bool
- property stats: CacheStats
Return a snapshot of hit/miss/size counters.
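The LRU and TTL behaviour described for get() and put() can be approximated with collections.OrderedDict. The TinyLRUCache class and make_key helper below are hypothetical stand-ins for illustration; how the library actually builds its keys is not stated in this reference.

```python
import time
from collections import OrderedDict
from typing import Any, Hashable, Optional


def make_key(user_message, system_prompt, model, temperature, max_tokens) -> tuple:
    """Assumed key shape: the five request fields as a hashable tuple."""
    return (user_message, system_prompt, model, temperature, max_tokens)


class TinyLRUCache:
    """Minimal LRU + TTL cache sketch (not the library's implementation)."""

    def __init__(self, max_size: int = 1024, ttl_seconds: Optional[float] = None) -> None:
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._data = OrderedDict()  # key -> (stored_at, value)

    def get(self, key: Hashable) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.ttl_seconds is not None and time.monotonic() - stored_at > self.ttl_seconds:
            del self._data[key]          # expired: evict and report a miss
            return None
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = (time.monotonic(), value)
        self._data.move_to_end(key)
        if self.max_size and len(self._data) > self.max_size:
            self._data.popitem(last=False)   # drop the least recently used entry
```

Both operations stay O(1) because OrderedDict supports constant-time move_to_end and popitem.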
- class ractogateway.openai_developer_kit.FinishReason(*values)[source]
Why the model stopped generating.
- STOP = 'stop'
- TOOL_CALL = 'tool_call'
- LENGTH = 'length'
- CONTENT_FILTER = 'content_filter'
- ERROR = 'error'
- class ractogateway.openai_developer_kit.LLMResponse(**data)[source]
Bases: BaseModel
Unified, provider-agnostic response envelope.
Every adapter's run() method returns one of these, regardless of whether the underlying provider is OpenAI, Gemini, or Anthropic.
- tool_calls: list[ToolCallResult]
- finish_reason: FinishReason
- raw: Any
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
- class ractogateway.openai_developer_kit.Message(**data)[source]
Bases: BaseModel
A single conversation turn.
Used inside ChatConfig.history to provide prior conversation context to the model for multi-turn conversations.
- role: MessageRole
- content: str
- model_config: ClassVar[ConfigDict] = {}
- class ractogateway.openai_developer_kit.MessageRole(*values)[source]
Role of a single message in a conversation.
- SYSTEM = 'system'
- USER = 'user'
- ASSISTANT = 'assistant'
- class ractogateway.openai_developer_kit.OpenAIBatchProcessor(model='gpt-4o-mini', *, api_key=None, base_url=None, default_prompt=None)[source]
Bases: object
Submit thousands of chat-completion requests to OpenAI's Batch API at ~50% of standard API cost.
- Parameters:
model (str) – Chat model to use for all items in a batch (e.g. "gpt-4o-mini").
api_key (str | None) – OpenAI API key. Falls back to the OPENAI_API_KEY env var.
base_url (str | None) – Custom base URL (Azure OpenAI / proxy).
default_prompt (RactoPrompt | None) – RACTO prompt used as the system message for every batch item.
- submit_batch / asubmit_batch:
Upload JSONL and create batch job → returns BatchJobInfo.
- poll_status / apoll_status:
Fetch current job state → returns updated BatchJobInfo.
- get_results / aget_results:
Download and parse completed job results → list[BatchResult].
- submit_and_wait / asubmit_and_wait:
Convenience: submit + poll until done + return results.
- provider: str = 'openai'
- submit_batch(items, *, prompt=None, completion_window='24h')[source]
Upload items as a JSONL file and create an OpenAI batch job.
Returns immediately with a BatchJobInfo (status = IN_PROGRESS).
- Return type:
BatchJobInfo
- poll_status(job_id)[source]
Fetch the current status of batch job job_id.
- Return type:
- get_results(job_id)[source]
Download and parse results for a completed batch job.
- Raises:
RuntimeError – If the job is not yet completed.
- Return type:
- submit_and_wait(items, *, prompt=None, completion_window='24h', poll_interval_s=60.0, max_wait_s=86400.0)[source]
Submit a batch and block until it completes, then return results.
- Parameters:
- Raises:
TimeoutError – If the batch does not complete within max_wait_s.
RuntimeError – If the batch job fails or is cancelled.
- Return type:
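The submit, poll, and wait behaviour can be pictured as a loop with a deadline. The sketch below is an assumption about the general shape, not the library's code: poll is any callable returning a status string, and treating "expired" as a failure alongside "failed" and "cancelled" is a choice made here for illustration.

```python
import time
from typing import Callable

# Assumed set of terminal failure states (status strings from BatchStatus).
TERMINAL_FAILURES = {"failed", "expired", "cancelled"}


def wait_for_batch(
    poll: Callable[[], str],
    poll_interval_s: float = 60.0,
    max_wait_s: float = 86400.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job completes, fails, or the deadline passes."""
    deadline = time.monotonic() + max_wait_s
    while True:
        status = poll()
        if status == "completed":
            return status
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"batch ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch not complete after {max_wait_s} s")
        sleep(poll_interval_s)
```

Passing sleep as a parameter keeps the loop testable without real delays; in production the default time.sleep applies.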
- async asubmit_batch(items, *, prompt=None, completion_window='24h')[source]
Async variant of submit_batch().
- Return type:
- async apoll_status(job_id)[source]
Async variant of poll_status().
- Return type:
- async aget_results(job_id)[source]
Async variant of get_results().
- Return type:
- async asubmit_and_wait(items, *, prompt=None, completion_window='24h', poll_interval_s=60.0, max_wait_s=86400.0)[source]
Async variant of submit_and_wait().
- Return type:
- class ractogateway.openai_developer_kit.OpenAIDeveloperKit(model='gpt-4o', *, api_key=None, base_url=None, embedding_model='text-embedding-3-small', default_prompt=None, exact_cache=None, semantic_cache=None, router=None, truncator=None, tracer=None, metrics=None)[source]
Bases: object
Complete OpenAI developer kit: chat, stream, embeddings, and optional performance/cost optimisation middleware.
- Parameters:
model (str) – Chat model (e.g. "gpt-4o", "gpt-4o-mini"). Use "auto" when a CostAwareRouter is provided; the router will select the model per request.
api_key (str | None) – OpenAI API key. Falls back to the OPENAI_API_KEY env var.
base_url (str | None) – Custom base URL (Azure OpenAI or proxy).
embedding_model (str) – Default embedding model. Defaults to "text-embedding-3-small".
default_prompt (RactoPrompt | None) – RACTO prompt used when ChatConfig.prompt is None.
exact_cache (ExactMatchCache | None) – Optional ExactMatchCache. Serves byte-identical requests from memory at zero cost.
semantic_cache (SemanticCache | None) – Optional SemanticCache. Returns cached answers for semantically similar queries (similarity ≥ threshold).
router (CostAwareRouter | None) – Optional CostAwareRouter. Selects the cheapest model that can handle each request's complexity. Required when model="auto".
truncator (TokenTruncator | None) – Optional TokenTruncator. Automatically trims conversation history to fit the model's context window before each API call.
tracer (RactoTracer | None) – Optional RactoTracer. Emits OpenTelemetry spans for every chat, stream, and embed call. Requires pip install ractogateway[telemetry].
metrics (GatewayMetricsMiddleware | None) – Optional GatewayMetricsMiddleware. Records Prometheus metrics (latency, tokens, cost, cache hit/miss). Requires pip install ractogateway[prometheus].
- provider: str = 'openai'
- chat(config)[source]
Synchronous chat completion with optional middleware pipeline.
Middleware order: truncate → exact cache → semantic cache → route model → API call → write caches → record telemetry.
- Return type:
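The middleware order listed for chat() can be pictured as a plain function that threads a message through each stage. Everything in this sketch is hypothetical: the stage callables, the event names, and the choice to record telemetry on a cache hit are assumptions, and the real kit operates on ChatConfig and LLMResponse objects rather than strings.

```python
from typing import Callable, List, Optional


def run_chat_pipeline(
    message: str,
    *,
    truncate: Callable[[str], str],
    cache_lookups: List[Callable[[str], Optional[str]]],
    route: Callable[[str], str],
    call_api: Callable[[str, str], str],
    cache_writes: List[Callable[[str, str], None]],
    record: Callable[[str], None],
) -> str:
    """truncate -> cache lookups -> route -> API call -> cache writes -> telemetry."""
    message = truncate(message)
    for lookup in cache_lookups:          # exact cache first, then semantic cache
        cached = lookup(message)
        if cached is not None:
            record("cache_hit")           # assumed: hits are recorded too
            return cached
    model = route(message)                # only reached on a full cache miss
    response = call_api(model, message)
    for write in cache_writes:
        write(message, response)
    record("api_call")
    return response
```

Placing the cache lookups before routing means a hit skips both the router and the API call, which is where the cost saving comes from.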
- async achat(config)[source]
Async chat completion with optional middleware pipeline.
- Return type:
- stream(config)[source]
Synchronous streaming; yields StreamChunk objects.
Example:
for chunk in kit.stream(config):
    print(chunk.delta.text, end="", flush=True)
    if chunk.is_final:
        print(f"\nTokens: {chunk.usage}")
- Return type:
Iterator[StreamChunk]
- async astream(config)[source]
Async streaming; yields StreamChunk objects.
- Return type:
AsyncIterator[StreamChunk]
- embed(config)[source]
Synchronous embedding.
- Return type:
EmbeddingResponse
- async aembed(config)[source]
Async embedding.
- Return type:
EmbeddingResponse
- class ractogateway.openai_developer_kit.RoutingTier(**data)[source]
Bases: BaseModel
One tier in the cost-aware routing ladder.
The router evaluates a complexity score (0-100) for each incoming message and selects the first tier whose max_score is >= that score. The last tier in the list always acts as the catch-all fallback.
- Parameters:
model (str) – The LLM model identifier to use for requests that fall in this tier (e.g. "gpt-4o-mini", "gemini-2.0-flash", "claude-haiku-4-5-20251001").
max_score (float) – Inclusive upper bound on the complexity score that routes to this model. Range: 0-100. Set to 100 for the last (most powerful) tier so it catches everything.
Examples
tiers = [
    RoutingTier(model="gpt-4o-mini", max_score=30),
    RoutingTier(model="gpt-4o", max_score=70),
    RoutingTier(model="o3-mini", max_score=100),
]
- model: str
- max_score: float
- model_config: ClassVar[ConfigDict] = {}
- class ractogateway.openai_developer_kit.SemanticCache(embed_fn, similarity_threshold=0.95, max_size=512, ttl_seconds=None)[source]
Bases: object
Vector-similarity cache: returns cached answers for semantically similar queries, costing $0 in API calls.
- Parameters:
embed_fn (Callable[[str], list[float]]) – Any callable (text: str) -> list[float]. Called once per new query (cache miss) and once at put() time.
similarity_threshold (float) – Minimum cosine similarity to declare a hit. Default 0.95 is intentionally strict to avoid incorrect responses.
max_size (int) – Maximum number of entries (LRU eviction). 0 = unlimited.
ttl_seconds (float | None) – Optional per-entry TTL. None disables expiry.
Examples
import openai
import ractogateway.openai_developer_kit as gpt
def embed(text: str) -> list[float]:
    r = openai.OpenAI().embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return r.data[0].embedding
cache = gpt.SemanticCache(embed_fn=embed, similarity_threshold=0.95)
kit = gpt.OpenAIDeveloperKit(model="gpt-4o", semantic_cache=cache)
- get(query)[source]
Embed query and return a cached response if cosine-sim ≥ threshold.
Returns None on a cache miss (caller should make the real API call and then invoke put()).
Complexity: O(n·d) where n = number of entries, d = embedding dim.
- Return type:
- put(query, response)[source]
Embed query and store response for future similar queries.
Evicts LRU entry when at capacity.
- Return type:
- property stats: CacheStats
Return a snapshot of hit/miss/size counters.
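The O(n·d) lookup described for get() amounts to a linear scan over stored embeddings with a cosine-similarity threshold. TinySemanticCache below is a hypothetical stand-in that omits LRU eviction and TTL; it is a sketch of the idea, not the library's implementation.

```python
import math
from typing import Callable, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinySemanticCache:
    """Linear-scan similarity cache sketch (no eviction, no TTL)."""

    def __init__(self, embed_fn: Callable[[str], List[float]],
                 similarity_threshold: float = 0.95) -> None:
        self.embed_fn = embed_fn
        self.similarity_threshold = similarity_threshold
        self._entries: List[Tuple[List[float], str]] = []

    def get(self, query: str) -> Optional[str]:
        q = self.embed_fn(query)
        best_score, best_response = -1.0, None
        for vec, response in self._entries:      # O(n) entries, O(d) per comparison
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.similarity_threshold else None

    def put(self, query: str, response: str) -> None:
        self._entries.append((self.embed_fn(query), response))
```

A strict threshold such as 0.95 trades hit rate for safety: near-duplicate phrasings match, while loosely related questions fall through to the API.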
- class ractogateway.openai_developer_kit.StreamChunk(**data)[source]
Bases: BaseModel
A single piece of a streaming response.
Consumers iterate over StreamChunk objects; they never touch raw provider events directly.
- delta
The incremental content for this chunk.
- accumulated_text
Running concatenation of all delta.text values so far.
- finish_reason
None for intermediate chunks; set on the final chunk.
- tool_calls
Empty until the final chunk (is_final=True).
- usage
Token counts, populated on the final chunk only.
- is_final
True only for the very last chunk in the stream.
- raw
The underlying provider event (escape hatch for advanced users).
- delta: StreamDelta
- accumulated_text: str
- accumulated_thinking: str
- is_thinking: bool
- finish_reason: FinishReason | None
- tool_calls: list[ToolCallResult]
- is_final: bool
- raw: Any
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
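The relationship between delta, accumulated_text and is_final can be illustrated with a small generator. Chunk is a hypothetical, flattened stand-in for StreamChunk (the real model nests the text under delta.text), and to_chunks is an assumed helper written only for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Chunk:
    """Hypothetical, simplified stand-in for StreamChunk."""
    delta_text: str
    accumulated_text: str
    is_final: bool


def to_chunks(pieces: Iterable[str]) -> Iterator[Chunk]:
    """Wrap raw text pieces so each chunk carries the running concatenation."""
    items = list(pieces)
    accumulated = ""
    for i, piece in enumerate(items):
        accumulated += piece
        yield Chunk(piece, accumulated, is_final=(i == len(items) - 1))
```

A consumer that only needs the full text can read accumulated_text from the final chunk instead of joining the deltas itself.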
- class ractogateway.openai_developer_kit.StreamDelta(**data)[source]
Bases: BaseModel
Incremental content produced by a single streaming event.
- text: str
- thinking: str
- model_config: ClassVar[ConfigDict] = {}
- class ractogateway.openai_developer_kit.ToolCallResult(**data)[source]
Bases: BaseModel
A single tool/function call returned by the model.
- id: str
- name: str
- model_config: ClassVar[ConfigDict] = {}
- class ractogateway.openai_developer_kit.TokenTruncator(config=None)[source]
Bases: object
Smart conversation-history trimmer.
- Parameters:
config (TruncationConfig | None) – TruncationConfig instance. If omitted, a default config is used (approximate counter, 8k limit).
Examples
import tiktoken
from ractogateway.openai_developer_kit import OpenAIDeveloperKit
from ractogateway.truncation import TokenTruncator, TruncationConfig
enc = tiktoken.encoding_for_model("gpt-4o")
truncator = TokenTruncator(
    TruncationConfig(
        token_counter=lambda t: len(enc.encode(t)),
        keep_first_n=2,
        keep_last_n=8,
    )
)
kit = OpenAIDeveloperKit(model="gpt-4o", truncator=truncator)
- truncate(chat_config, model)[source]
Return a copy of chat_config with trimmed history if necessary.
If the total estimated token count (system prompt + history + user_message) fits within the model's context limit, the original ChatConfig is returned unchanged.
- Parameters:
chat_config (ChatConfig) – The chat configuration to potentially truncate.
model (str) – The resolved model name used to look up the context-window limit.
- Return type:
ChatConfig
- Returns:
A new ChatConfig instance with (possibly shorter) history. The user_message and all other fields are preserved verbatim.
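One plausible way to trim history while preserving the first and last messages is sketched below. The reference states only that keep_first_n and keep_last_n messages are always preserved; dropping the oldest middle messages first is an assumption of this sketch, as are the trim_history and approx_tokens names.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    """Approximate counter, roughly 4 characters per token (len // 4)."""
    return len(text) // 4


def trim_history(
    history: List[str],
    budget: int,
    keep_first_n: int = 2,
    keep_last_n: int = 6,
    count: Callable[[str], int] = approx_tokens,
) -> List[str]:
    """Drop middle messages, oldest first, until the history fits the budget."""
    total = sum(count(m) for m in history)
    if total <= budget or len(history) <= keep_first_n + keep_last_n:
        return list(history)                      # nothing that may be dropped
    head = history[:keep_first_n]
    tail = history[len(history) - keep_last_n:]
    middle = history[keep_first_n:len(history) - keep_last_n]
    while middle and total > budget:
        total -= count(middle.pop(0))             # assumed policy: oldest goes first
    return head + middle + tail
```

If the protected head and tail alone exceed the budget, this sketch returns them anyway; how the library handles that case is not described in the reference.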
- class ractogateway.openai_developer_kit.TruncationConfig(**data)[source]
Bases: BaseModel
Configuration for TokenTruncator.
- Parameters:
max_context_tokens (int | None) – Hard cap on total prompt tokens before calling the API. When None, the truncator looks up the model in MODEL_CONTEXT_LIMITS (falling back to 8192).
keep_first_n (int) – Number of history messages to always preserve from the start of the conversation (anchors context). Defaults to 2.
keep_last_n (int) – Number of history messages to always preserve from the most recent end of the conversation. Defaults to 6.
token_counter (Callable[[str], int]) – Callable (text: str) -> int. Defaults to the built-in approximate counter (len // 4). Swap in tiktoken for exact OpenAI token counts:
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
config = TruncationConfig(token_counter=lambda t: len(enc.encode(t)))
safety_margin (int) – Extra token budget reserved beyond the system prompt and user message. Defaults to 512.
- keep_first_n: int
- keep_last_n: int
- safety_margin: int
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
- resolve_limit(model)[source]
Return the effective token limit for model.
Priority: max_context_tokens → MODEL_CONTEXT_LIMITS lookup → _DEFAULT_CONTEXT.
- Return type:
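The priority chain can be expressed as a short function. The names MODEL_CONTEXT_LIMITS and _DEFAULT_CONTEXT come from the reference, but the table contents below are illustrative placeholders, and the free-function signature is an assumption (the real method lives on TruncationConfig).

```python
from typing import Optional

# Illustrative placeholder table; the library's actual entries are not listed here.
MODEL_CONTEXT_LIMITS = {"gpt-4o": 128_000}
_DEFAULT_CONTEXT = 8_192  # fallback value stated in the reference


def resolve_limit(model: str, max_context_tokens: Optional[int] = None) -> int:
    """Explicit cap first, then the per-model table, then the default."""
    if max_context_tokens is not None:
        return max_context_tokens
    return MODEL_CONTEXT_LIMITS.get(model, _DEFAULT_CONTEXT)
```

An explicit max_context_tokens always wins, which is useful for deliberately keeping prompts well below a model's full window.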