ractogateway.google_developer_kit.kit

Google Gemini Developer Kit — production-grade Gemini interface.

Usage:

from ractogateway import google_developer_kit as god

# my_prompt is a RactoPrompt you have built; the API key is read from GEMINI_API_KEY
kit = god.GoogleDeveloperKit(model="gemini-2.0-flash", default_prompt=my_prompt)
response = kit.chat(god.ChatConfig(user_message="Hello"))

for chunk in kit.stream(god.ChatConfig(user_message="Hello")):
    print(chunk.delta.text, end="", flush=True)

class ractogateway.google_developer_kit.kit.GoogleDeveloperKit(model='gemini-2.0-flash', *, api_key=None, embedding_model='text-embedding-004', default_prompt=None, exact_cache=None, semantic_cache=None, router=None, truncator=None, tracer=None, metrics=None)

Bases: object

Complete Google Gemini developer kit — chat, stream, embeddings, and optional performance/cost optimisation middleware.

Parameters:
  • model (str) – Gemini model name (e.g. "gemini-2.0-flash", "gemini-2.5-pro"). Pass "auto" together with a CostAwareRouter to let the router select the model for each request.

  • api_key (str | None) – Gemini API key. When None, falls back to the GEMINI_API_KEY environment variable.

  • embedding_model (str) – Default embedding model. Defaults to "text-embedding-004".

  • default_prompt (RactoPrompt | None) – RACTO prompt used when ChatConfig.prompt is None.

  • exact_cache (ExactMatchCache | None) – Optional ExactMatchCache. Checked after truncation and before the semantic cache; written after a successful API call.

  • semantic_cache (SemanticCache | None) – Optional SemanticCache. Checked after the exact cache and before model routing; written after a successful API call.

  • router (CostAwareRouter | None) – Optional CostAwareRouter. Required when model="auto".

  • truncator (TokenTruncator | None) – Optional TokenTruncator, applied as the first step of the chat middleware pipeline.

  • tracer (RactoTracer | None) – Optional RactoTracer. Emits OpenTelemetry spans for every chat, stream, and embed call. Requires pip install ractogateway[telemetry].

  • metrics (GatewayMetricsMiddleware | None) – Optional GatewayMetricsMiddleware. Records Prometheus metrics (latency, tokens, cost, cache hit/miss). Requires pip install ractogateway[prometheus].
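The constructor rules above (model="auto" needs a router; api_key falls back to GEMINI_API_KEY) can be sketched as a small validation helper. resolve_kit_settings is hypothetical and not part of ractogateway; it assumes the router check raises ValueError at construction time, which this page does not confirm.

```python
import os
from typing import Optional, Tuple


def resolve_kit_settings(
    model: str = "gemini-2.0-flash",
    api_key: Optional[str] = None,
    router: Optional[object] = None,
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper mirroring the documented constructor rules.

    Not ractogateway code: it only shows how model/router and
    api_key/GEMINI_API_KEY relate to each other.
    """
    # model="auto" delegates model choice to a CostAwareRouter, so one is required.
    # Raising ValueError here is an assumption about how the real class reacts.
    if model == "auto" and router is None:
        raise ValueError('model="auto" requires a router (CostAwareRouter)')

    # An explicit api_key wins; otherwise fall back to the environment variable.
    key = api_key if api_key is not None else os.environ.get("GEMINI_API_KEY")
    return model, key
```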

provider: str = 'google'
chat(config)

Synchronous chat completion with optional middleware pipeline.

Middleware order: truncate → exact cache → semantic cache → route model → API call → write caches → record telemetry.

Return type:

LLMResponse
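A minimal sketch of the documented middleware order, using plain callables and a dict as hypothetical stand-ins for TokenTruncator, ExactMatchCache, SemanticCache, CostAwareRouter, and the telemetry hooks. run_chat_pipeline is not the library's implementation; in particular, recording an event on cache hits is an assumption based on the metrics middleware tracking cache hit/miss.

```python
from typing import Callable, Dict, Optional


def run_chat_pipeline(
    prompt: str,
    call_api: Callable[[str, str], str],
    truncate: Callable[[str], str] = lambda p: p,
    exact_cache: Optional[Dict[str, str]] = None,
    semantic_lookup: Callable[[str], Optional[str]] = lambda p: None,
    semantic_store: Callable[[str, str], None] = lambda p, a: None,
    route: Callable[[str], str] = lambda p: "gemini-2.0-flash",
    record: Callable[[str, str], None] = lambda event, model: None,
) -> str:
    """Hypothetical sketch of the documented chat() middleware order.

    Not ractogateway code: each argument stands in for one middleware.
    """
    # 1. Truncate first, so cache keys match what would be sent to the API.
    prompt = truncate(prompt)

    # 2. Exact-match cache: an identical (truncated) prompt returns the stored answer.
    if exact_cache is not None and prompt in exact_cache:
        record("exact_hit", "")
        return exact_cache[prompt]

    # 3. Semantic cache: a sufficiently similar earlier prompt may also short-circuit.
    cached = semantic_lookup(prompt)
    if cached is not None:
        record("semantic_hit", "")
        return cached

    # 4. Route: pick a model for this request (what model="auto" delegates).
    model = route(prompt)

    # 5. API call with the chosen model.
    answer = call_api(model, prompt)

    # 6. Write both caches so later requests can hit them.
    if exact_cache is not None:
        exact_cache[prompt] = answer
    semantic_store(prompt, answer)

    # 7. Record telemetry for the completed call.
    record("api_call", model)
    return answer
```

Keeping the caches ahead of routing means a cache hit never pays for a model call, which is the main point of ordering the pipeline this way.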

async achat(config)

Async chat completion with optional middleware pipeline.

Return type:

LLMResponse
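Because achat is a coroutine, several requests can be awaited concurrently. The sketch below uses a hypothetical FakeKit so it runs without network access; with the real kit you would await kit.achat(god.ChatConfig(user_message=m)) instead. The field holding the reply text on LLMResponse is not documented here, so FakeResponse.content is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class FakeResponse:
    # Hypothetical stand-in for LLMResponse; the real field names may differ.
    content: str


class FakeKit:
    """Hypothetical stand-in exposing an achat-like coroutine (no network calls)."""

    async def achat(self, message: str) -> FakeResponse:
        await asyncio.sleep(0.01)  # simulate network latency
        return FakeResponse(content=f"echo: {message}")


async def ask_all(kit: FakeKit, messages: List[str]) -> List[str]:
    # asyncio.gather runs the coroutines concurrently and returns results in input order.
    responses = await asyncio.gather(*(kit.achat(m) for m in messages))
    return [r.content for r in responses]
```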

stream(config)

Synchronous streaming via models.generate_content_stream.

Example:

for chunk in kit.stream(config):
    print(chunk.delta.text, end="", flush=True)

Return type:

Iterator[StreamChunk]
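A sketch of accumulating streamed deltas into the full reply while optionally echoing them as they arrive. FakeDelta, FakeStreamChunk, and fake_stream are hypothetical stand-ins that only mimic the chunk.delta.text access used in the example above; skipping empty or None text is a defensive assumption, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class FakeDelta:
    text: Optional[str]  # hypothetical: mirrors chunk.delta.text


@dataclass
class FakeStreamChunk:
    delta: FakeDelta


def fake_stream(parts: Iterable[Optional[str]]) -> Iterator[FakeStreamChunk]:
    # Stand-in for kit.stream(config): yields one chunk per text fragment.
    for part in parts:
        yield FakeStreamChunk(delta=FakeDelta(text=part))


def collect_text(chunks: Iterable[FakeStreamChunk], echo: bool = False) -> str:
    pieces: List[str] = []
    for chunk in chunks:
        text = chunk.delta.text
        if not text:  # defensive: skip empty or None deltas (assumption)
            continue
        if echo:
            print(text, end="", flush=True)  # live output, as in the example above
        pieces.append(text)
    return "".join(pieces)
```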

async astream(config)

Async streaming via aio.models.generate_content_stream.

Return type:

AsyncIterator[StreamChunk]
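The async stream is consumed with async for. In this sketch FakeDelta, FakeStreamChunk, and fake_astream are hypothetical stand-ins mimicking the chunk.delta.text shape shown in the stream() example; with the real kit you would write async for chunk in kit.astream(config).

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional


@dataclass
class FakeDelta:
    text: Optional[str]  # hypothetical: mirrors chunk.delta.text


@dataclass
class FakeStreamChunk:
    delta: FakeDelta


async def fake_astream(parts: List[Optional[str]]) -> AsyncIterator[FakeStreamChunk]:
    # Stand-in for kit.astream(config): an async generator of chunks.
    for part in parts:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield FakeStreamChunk(delta=FakeDelta(text=part))


async def collect_text_async(chunks: AsyncIterator[FakeStreamChunk]) -> str:
    pieces: List[str] = []
    async for chunk in chunks:
        if chunk.delta.text:  # defensive: skip empty or None deltas (assumption)
            pieces.append(chunk.delta.text)
    return "".join(pieces)
```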

embed(config)

Synchronous embedding via models.embed_content.

Return type:

EmbeddingResponse
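Embedding vectors are commonly compared with cosine similarity. This page does not document EmbeddingResponse's fields or which metric SemanticCache uses, so the sketch assumes each embedding has already been extracted as a plain sequence of floats; cosine_similarity is a hypothetical helper, not part of ractogateway.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 same direction, -1.0 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```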

async aembed(config)

Async embedding via aio.models.embed_content.

Return type:

EmbeddingResponse