ractogateway.openai_developer_kit.kit

OpenAI Developer Kit — production-grade OpenAI interface.

Usage:

from ractogateway import openai_developer_kit as opd

kit = opd.OpenAIDeveloperKit(model="gpt-4o", default_prompt=my_prompt)
response = kit.chat(opd.ChatConfig(user_message="Hello"))

for chunk in kit.stream(opd.ChatConfig(user_message="Hello")):
    print(chunk.delta.text, end="", flush=True)

class ractogateway.openai_developer_kit.kit.OpenAIDeveloperKit(model='gpt-4o', *, api_key=None, base_url=None, embedding_model='text-embedding-3-small', default_prompt=None, exact_cache=None, semantic_cache=None, router=None, truncator=None, tracer=None, metrics=None)

Bases: object

Complete OpenAI developer kit — chat, stream, embeddings, and optional performance/cost optimisation middleware.

Parameters:

model (str) – Chat model (e.g. "gpt-4o", "gpt-4o-mini"). Use "auto" when a CostAwareRouter is provided — the router will select the model per-request.
api_key (str | None) – OpenAI API key. Falls back to OPENAI_API_KEY env var.
base_url (str | None) – Custom base URL (Azure OpenAI or proxy).
embedding_model (str) – Default embedding model. Defaults to "text-embedding-3-small".
default_prompt (RactoPrompt | None) – RACTO prompt used when ChatConfig.prompt is None.
exact_cache (ExactMatchCache | None) – Optional ExactMatchCache. Serves byte-identical requests from memory at zero API cost.
semantic_cache (SemanticCache | None) – Optional SemanticCache. Returns cached answers for semantically similar queries (similarity ≥ threshold).
router (CostAwareRouter | None) – Optional CostAwareRouter. Selects the cheapest model that can handle each request’s complexity. Required when model="auto".
truncator (TokenTruncator | None) – Optional TokenTruncator. Automatically trims conversation history to fit the model’s context window before each API call.
tracer (RactoTracer | None) – Optional RactoTracer. Emits OpenTelemetry spans for every chat, stream, and embed call. Requires pip install ractogateway[telemetry].
metrics (GatewayMetricsMiddleware | None) – Optional GatewayMetricsMiddleware. Records Prometheus metrics (latency, tokens, cost, cache hit/miss). Requires pip install ractogateway[prometheus].
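
The semantic_cache parameter above is described as returning a cached answer when similarity ≥ threshold. The sketch below illustrates that idea only. ToySemanticCache, its put/get methods, the default threshold of 0.9, and the use of cosine similarity are all assumptions made for illustration; they are not the library's SemanticCache API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToySemanticCache:
    """Hypothetical stand-in, NOT ractogateway's SemanticCache."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self._entries = []  # list of (embedding, answer) pairs

    def put(self, embedding, answer):
        self._entries.append((list(embedding), answer))

    def get(self, embedding):
        # Find the most similar stored query, then apply the threshold.
        best_score, best_answer = 0.0, None
        for stored, answer in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = ToySemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "Paris")
near_hit = cache.get([0.99, 0.05])  # nearly the same direction: served from cache
miss = cache.get([0.0, 1.0])        # orthogonal query: falls through to the API
```

A higher threshold trades cache hit rate for a lower risk of returning an answer to a question that only looks similar.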

provider: str = 'openai'

chat(config)

Synchronous chat completion with optional middleware pipeline.

Middleware order: truncate → exact cache → semantic cache → route model → API call → write caches → record telemetry.

Return type: LLMResponse
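
The middleware order listed for chat() can be pictured with a small, self-contained sketch. Everything here is hypothetical: run_pipeline, fake_api, the dict used as an exact cache, the "small-model"/"large-model" routing rule, and the choice to return early on a cache hit are illustrative assumptions, not the kit's implementation. The semantic cache step is reduced to a logged miss.

```python
import hashlib
import json


def run_pipeline(messages, *, exact_cache, call_api, max_messages=4):
    """Hypothetical sketch of the documented step order; returns (answer, log)."""
    log = []

    # 1. Truncate: keep only the most recent messages.
    messages = messages[-max_messages:]
    log.append("truncate")

    # 2. Exact cache: key on a hash of the byte-identical request.
    payload = json.dumps(messages, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(payload).hexdigest()
    log.append("exact_cache_lookup")
    if key in exact_cache:
        return exact_cache[key], log  # assumption: a hit skips the later steps

    # 3. Semantic cache: always a miss in this sketch.
    log.append("semantic_cache_lookup")

    # 4. Route: pick a model from a crude size heuristic (assumption).
    model = "small-model" if sum(len(m) for m in messages) < 200 else "large-model"
    log.append("route")

    # 5. API call.
    answer = call_api(model, messages)
    log.append("api_call")

    # 6. Write caches.
    exact_cache[key] = answer
    log.append("write_caches")

    # 7. Record telemetry (only logged here).
    log.append("record_telemetry")
    return answer, log


calls = []


def fake_api(model, messages):
    """Stand-in for the real API call; records which model was used."""
    calls.append(model)
    return f"{model}: echo {messages[-1]}"


cache = {}
first, first_log = run_pipeline(["Hello"], exact_cache=cache, call_api=fake_api)
second, second_log = run_pipeline(["Hello"], exact_cache=cache, call_api=fake_api)
```

The second call returns the same answer without reaching fake_api, which is the point of placing the cache lookups ahead of routing and the API call.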

async achat(config)

Async chat completion with optional middleware pipeline.

Return type: LLMResponse
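
One common reason to prefer an async method such as achat() is to issue several requests concurrently. The sketch below shows that pattern with asyncio.gather. fake_achat is a hypothetical stand-in for `await kit.achat(config)` so the example runs without an API key; the reply format is invented.

```python
import asyncio


async def fake_achat(user_message):
    """Hypothetical stand-in for `await kit.achat(config)`."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"reply to {user_message}"


async def ask_all(questions):
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_achat(q) for q in questions))


replies = asyncio.run(ask_all(["a", "b", "c"]))
```

With a real kit, each fake_achat call would be replaced by an achat call on its own ChatConfig; whether the kit's caches are safe under concurrent use is not stated in this reference.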

stream(config)

Synchronous streaming — yields StreamChunk objects.

Example:

for chunk in kit.stream(config):
    print(chunk.delta.text, end="", flush=True)
    if chunk.is_final:
        print(f"\nTokens: {chunk.usage}")

Return type: Iterator[StreamChunk]

async astream(config)

Async streaming — yields StreamChunk objects.

Return type: AsyncIterator[StreamChunk]
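
An AsyncIterator is consumed with `async for`. The sketch below mirrors the synchronous stream() example, assuming async chunks expose the same delta.text, is_final, and usage attributes. FakeDelta, FakeChunk, fake_astream, and the usage dict layout are hypothetical stand-ins so the code runs offline.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class FakeDelta:
    text: str


@dataclass
class FakeChunk:
    """Hypothetical stand-in for StreamChunk."""
    delta: FakeDelta
    is_final: bool = False
    usage: Optional[dict] = None


async def fake_astream(words) -> AsyncIterator[FakeChunk]:
    """Stand-in for `kit.astream(config)`: yields one chunk per word."""
    for i, word in enumerate(words):
        await asyncio.sleep(0)  # yield control, as a network read would
        last = i == len(words) - 1
        usage = {"total_tokens": len(words)} if last else None
        yield FakeChunk(FakeDelta(word), is_final=last, usage=usage)


async def collect(stream):
    """Accumulate streamed text and capture usage from the final chunk."""
    parts, usage = [], None
    async for chunk in stream:
        parts.append(chunk.delta.text)
        if chunk.is_final:
            usage = chunk.usage
    return "".join(parts), usage


text, usage = asyncio.run(collect(fake_astream(["Hel", "lo", "!"])))
```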

embed(config)

Synchronous embedding.

Return type: EmbeddingResponse

async aembed(config)

Async embedding.

Return type: EmbeddingResponse