ractogateway.openai_developer_kit.kit
OpenAI Developer Kit — production-grade OpenAI interface.
Usage:
from ractogateway import openai_developer_kit as opd
kit = opd.OpenAIDeveloperKit(model="gpt-4o", default_prompt=my_prompt)
response = kit.chat(opd.ChatConfig(user_message="Hello"))
for chunk in kit.stream(opd.ChatConfig(user_message="Hello")):
    print(chunk.delta.text, end="", flush=True)
- class ractogateway.openai_developer_kit.kit.OpenAIDeveloperKit(model='gpt-4o', *, api_key=None, base_url=None, embedding_model='text-embedding-3-small', default_prompt=None, exact_cache=None, semantic_cache=None, router=None, truncator=None, tracer=None, metrics=None)[source]
Bases: object
Complete OpenAI developer kit: chat, stream, embeddings, and optional performance/cost optimisation middleware.
- Parameters:
model (str) – Chat model (e.g. "gpt-4o", "gpt-4o-mini"). Use "auto" when a CostAwareRouter is provided; the router then selects the model per request.
api_key (str | None) – OpenAI API key. Falls back to the OPENAI_API_KEY env var.
base_url (str | None) – Custom base URL (Azure OpenAI or proxy).
embedding_model (str) – Default embedding model. Defaults to "text-embedding-3-small".
default_prompt (RactoPrompt | None) – RACTO prompt used when ChatConfig.prompt is None.
exact_cache (ExactMatchCache | None) – Optional ExactMatchCache. Serves byte-identical requests from memory at zero cost.
semantic_cache (SemanticCache | None) – Optional SemanticCache. Returns cached answers for semantically similar queries (similarity ≥ threshold).
router (CostAwareRouter | None) – Optional CostAwareRouter. Selects the cheapest model that can handle each request's complexity. Required when model="auto".
truncator (TokenTruncator | None) – Optional TokenTruncator. Automatically trims conversation history to fit the model's context window before each API call.
tracer (RactoTracer | None) – Optional RactoTracer. Emits OpenTelemetry spans for every chat, stream, and embed call. Requires pip install ractogateway[telemetry].
metrics (GatewayMetricsMiddleware | None) – Optional GatewayMetricsMiddleware. Records Prometheus metrics (latency, tokens, cost, cache hit/miss). Requires pip install ractogateway[prometheus].
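The semantic_cache entry above describes a "similarity ≥ threshold" rule. The following is a minimal sketch of how such a lookup could work, assuming cosine similarity over embedding vectors. ToySemanticCache, its put/get methods, and the 0.9 threshold are hypothetical illustrations and are not the library's SemanticCache API.

```python
import math
from typing import List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToySemanticCache:
    """Hypothetical stand-in: stores (embedding, answer) pairs in a list."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: List[float], answer: str) -> None:
        self._entries.append((embedding, answer))

    def get(self, embedding: List[float]) -> Optional[str]:
        # Find the most similar stored entry, then apply the threshold.
        best_score = -1.0
        best_answer: Optional[str] = None
        for stored, answer in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = ToySemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "cached answer")
near_hit = cache.get([0.99, 0.05])  # almost the same direction as the stored vector
miss = cache.get([0.0, 1.0])        # orthogonal to the stored vector
```

A higher threshold trades cache hit rate for a lower risk of returning an answer to a question that only looks similar.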
- provider: str = 'openai'
- chat(config)[source]
Synchronous chat completion with optional middleware pipeline.
Middleware order: truncate → exact cache → semantic cache → route model → API call → write caches → record telemetry.
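The middleware order above can be sketched as a plain function. This is an illustration of the ordering only, not the library's implementation: run_pipeline, fake_api, the message-count truncation, and the length-based routing rule are all hypothetical, and returning early on a cache hit (skipping the later steps) is an assumption the source does not confirm.

```python
from typing import Callable, Dict, List, Optional


def run_pipeline(
    messages: List[str],
    exact_cache: Dict[str, str],
    call_api: Callable[[str, List[str]], str],
    max_messages: int = 4,
    trace: Optional[List[str]] = None,
) -> str:
    trace = trace if trace is not None else []

    # 1. Truncate: keep only the most recent messages (toy stand-in for token trimming).
    trace.append("truncate")
    messages = messages[-max_messages:]
    key = "\n".join(messages)

    # 2. Exact cache: byte-identical request. Early return is an assumption.
    trace.append("exact cache")
    if key in exact_cache:
        return exact_cache[key]

    # 3. Semantic cache: always a miss in this toy version.
    trace.append("semantic cache")

    # 4. Route model: hypothetical rule based on prompt length.
    trace.append("route model")
    model = "small-model" if len(key) < 200 else "large-model"

    # 5. API call.
    trace.append("API call")
    answer = call_api(model, messages)

    # 6. Write caches so the next identical request is served from memory.
    trace.append("write caches")
    exact_cache[key] = answer

    # 7. Record telemetry (here, just the trace entry itself).
    trace.append("record telemetry")
    return answer


def fake_api(model: str, messages: List[str]) -> str:
    """Hypothetical provider call that echoes the last message."""
    return f"{model}: echo {messages[-1]}"


cache: Dict[str, str] = {}
first_trace: List[str] = []
second_trace: List[str] = []
first = run_pipeline(["Hello"], cache, fake_api, trace=first_trace)
second = run_pipeline(["Hello"], cache, fake_api, trace=second_trace)
```

Truncating before the cache lookups means the cache key reflects what would actually be sent to the model, which is one plausible reason for this ordering.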
- Return type:
- async achat(config)[source]
Async chat completion with optional middleware pipeline.
- Return type:
- stream(config)[source]
Synchronous streaming; yields StreamChunk objects.
Example:
for chunk in kit.stream(config):
    print(chunk.delta.text, end="", flush=True)
    if chunk.is_final:
        print(f"\nTokens: {chunk.usage}")
- Return type:
Iterator[StreamChunk]
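Callers often want the full text once streaming finishes, in addition to printing it incrementally. A minimal sketch of that accumulation follows. Delta, Chunk, and fake_stream are hypothetical stand-ins that mirror only the delta.text, is_final, and usage attributes used in the example above; the real StreamChunk may carry more fields, and the dict shape of usage is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass
class Delta:
    text: str


@dataclass
class Chunk:
    """Hypothetical stand-in for StreamChunk."""
    delta: Delta
    is_final: bool = False
    usage: Optional[Dict[str, int]] = None  # assumed shape, for illustration


def fake_stream(words: List[str]) -> Iterator[Chunk]:
    """Yield one chunk per word; only the last chunk carries usage."""
    for i, word in enumerate(words):
        last = i == len(words) - 1
        usage = {"total_tokens": len(words)} if last else None
        yield Chunk(Delta(word), is_final=last, usage=usage)


def collect(stream: Iterable[Chunk]) -> Tuple[str, Optional[Dict[str, int]]]:
    """Join all text deltas and keep the usage from the final chunk."""
    parts: List[str] = []
    usage: Optional[Dict[str, int]] = None
    for chunk in stream:
        parts.append(chunk.delta.text)
        if chunk.is_final:
            usage = chunk.usage
    return "".join(parts), usage


text, usage = collect(fake_stream(["Hel", "lo", "!"]))
```

With the real kit, the same collect function would presumably accept kit.stream(config) directly, since it relies only on iteration and the three attributes shown.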
- async astream(config)[source]
Async streaming; yields StreamChunk objects.
- Return type:
AsyncIterator[StreamChunk]
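Because astream returns an AsyncIterator, it is consumed with async for inside a coroutine. The sketch below shows that pattern with a hypothetical fake_astream generator in place of kit.astream(config); Delta and Chunk are stand-ins that mirror only the attributes used here.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Delta:
    text: str


@dataclass
class Chunk:
    """Hypothetical stand-in for StreamChunk."""
    delta: Delta
    is_final: bool = False


async def fake_astream(words: List[str]) -> AsyncIterator[Chunk]:
    """Async generator that yields one chunk per word."""
    for i, word in enumerate(words):
        await asyncio.sleep(0)  # hand control back to the event loop, as network I/O would
        yield Chunk(Delta(word), is_final=(i == len(words) - 1))


async def collect_async(stream: AsyncIterator[Chunk]) -> str:
    """Consume the stream with `async for` and join the text deltas."""
    parts: List[str] = []
    async for chunk in stream:
        parts.append(chunk.delta.text)
    return "".join(parts)


result = asyncio.run(collect_async(fake_astream(["as", "ync"])))
```

Inside an existing event loop (for example a web handler), the call would be awaited directly instead of wrapped in asyncio.run.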
- embed(config)[source]
Synchronous embedding.
- Return type:
EmbeddingResponse
- async aembed(config)[source]
Async embedding.
- Return type:
EmbeddingResponse