ractogateway.telemetry.metrics

GatewayMetricsMiddleware — Prometheus metrics for RactoGateway.

Pass a GatewayMetricsMiddleware instance as metrics= to any developer kit to collect per-request Prometheus metrics.

Requires: pip install ractogateway[prometheus]

Metrics exposed

  • ractogateway_requests_total{provider,model,operation,status} — Counter

  • ractogateway_request_duration_seconds{provider,model,operation} — Histogram

  • ractogateway_tokens_total{provider,model,token_type} — Counter

  • ractogateway_cost_usd_total{provider,model} — Counter

  • ractogateway_cache_hits_total{cache_type} — Counter

  • ractogateway_cache_misses_total{cache_type} — Counter

  • ractogateway_tool_calls_total{tool_name} — Counter

Example:

from ractogateway import openai_developer_kit as opd
from ractogateway.telemetry import GatewayMetricsMiddleware, PrometheusExporter

metrics = GatewayMetricsMiddleware()
exporter = PrometheusExporter(port=8000)
exporter.start()

kit = opd.OpenAIDeveloperKit(
    model="gpt-4o",
    default_prompt=my_prompt,  # a prompt object you have defined elsewhere
    metrics=metrics,
)
response = kit.chat(opd.ChatConfig(user_message="Hello"))
# Scrape http://localhost:8000/metrics in Prometheus.

class ractogateway.telemetry.metrics.GatewayMetricsMiddleware(*, price_table=None, registry=None)

Bases: object

Prometheus metrics middleware — pass as metrics= to any developer kit.

A single instance can be shared across multiple kits (different providers) to aggregate metrics in one registry.

Parameters:
  • price_table (dict[str, ModelPricing] | None) – Override or extend the built-in pricing table used for the ractogateway_cost_usd_total counter.

  • registry (Any | None) – Custom prometheus_client.CollectorRegistry. Defaults to the global prometheus_client.REGISTRY, which also carries the client library's default process, platform, and garbage-collector metrics. Pass prometheus_client.CollectorRegistry() to get an isolated registry, which is useful in tests.

Requires: pip install ractogateway[prometheus]

record_request(*, provider, model, operation, status, latency_s, input_tokens=0, output_tokens=0, tool_calls=None)

Record metrics for a completed LLM request.

Parameters:
  • provider (str) – Provider string ("openai", "google", "anthropic").

  • model (str) – Model identifier (e.g. "gpt-4o").

  • operation (str) – "chat", "stream", or "embed".

  • status (str) – "ok" or "error".

  • latency_s (float) – Request wall-clock latency in seconds.

  • input_tokens (int) – Prompt tokens consumed (0 for cache hits or errors).

  • output_tokens (int) – Completion tokens produced (0 for cache hits or errors).

  • tool_calls (list[Any] | None) – List of ToolCallResult objects from the response. Used to update ractogateway_tool_calls_total.

Return type:

None
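
For a request made outside a developer kit, the call can be wrapped by hand. The following is a minimal sketch, not part of RactoGateway: timed_request and RecordingStub are hypothetical names introduced here, and the stub only mimics the documented record_request keyword arguments so the example runs without the library installed.

```python
import time
from typing import Any, Callable


def timed_request(
    metrics: Any,
    call: Callable[[], Any],
    *,
    provider: str,
    model: str,
    operation: str = "chat",
) -> Any:
    """Run call(), then report its outcome via metrics.record_request()."""
    start = time.perf_counter()
    status = "error"  # assume failure until call() returns normally
    try:
        result = call()
        status = "ok"
        return result
    finally:
        # Runs on success and on failure. Token counts are left at their
        # documented defaults of 0 in this sketch.
        metrics.record_request(
            provider=provider,
            model=model,
            operation=operation,
            status=status,
            latency_s=time.perf_counter() - start,
        )


class RecordingStub:
    """Hypothetical stand-in for GatewayMetricsMiddleware (illustration only)."""

    def __init__(self) -> None:
        self.calls = []

    def record_request(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


stub = RecordingStub()
answer = timed_request(stub, lambda: "hello", provider="openai", model="gpt-4o")
```

In real use a GatewayMetricsMiddleware instance would take the place of the stub. Recording in a finally block means a raised exception is still counted with status "error" before it propagates.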

record_cache_hit(cache_type)

Increment the cache-hits counter.

Parameters:

cache_type (str) – "exact" or "semantic".

Return type:

None

record_cache_miss(cache_type)

Increment the cache-misses counter.

Parameters:

cache_type (str) – "exact" or "semantic".

Return type:

None
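
The two cache methods are meant to be called in pairs around a lookup: one on a hit, the other on a miss. The sketch below shows that pattern with a hypothetical in-memory exact-match cache. CountingExactCache and CacheCounterStub are names invented for this example, not RactoGateway classes, and the stub only mirrors the two documented method signatures.

```python
from typing import Any, Callable


class CountingExactCache:
    """Hypothetical exact-match cache that reports hits and misses."""

    def __init__(self, metrics: Any) -> None:
        self._metrics = metrics
        self._store = {}

    def get_or_compute(self, prompt: str, compute: Callable[[str], str]) -> str:
        if prompt in self._store:
            self._metrics.record_cache_hit("exact")
            return self._store[prompt]
        # Not cached yet: count the miss, compute, and remember the result.
        self._metrics.record_cache_miss("exact")
        value = compute(prompt)
        self._store[prompt] = value
        return value


class CacheCounterStub:
    """Hypothetical stand-in for GatewayMetricsMiddleware (illustration only)."""

    def __init__(self) -> None:
        self.hits = 0
        self.misses = 0

    def record_cache_hit(self, cache_type: str) -> None:
        self.hits += 1

    def record_cache_miss(self, cache_type: str) -> None:
        self.misses += 1


counter = CacheCounterStub()
cache = CountingExactCache(counter)
first = cache.get_or_compute("hi", str.upper)   # miss: computed and stored
second = cache.get_or_compute("hi", str.upper)  # hit: served from the store
```

With both counters populated, a hit ratio can be derived at query time as hits divided by the sum of hits and misses for a given cache_type.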

generate_latest()

Return current metrics in Prometheus text exposition format.

Useful for testing without starting an HTTP server:

text = middleware.generate_latest()
assert "ractogateway_requests_total" in text

Return type:

str

Returns:

str – UTF-8 decoded Prometheus text format string.
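
Beyond a substring check, a test may want the numeric value of one series. The helper below is a minimal sketch using only the standard library. sample_value is a hypothetical name, SAMPLE is hand-written text in the exposition format rather than captured output, and the matching assumes each sample line ends with its value (no trailing timestamp) and that the label string is given exactly as it appears in the text.

```python
from typing import Optional


def sample_value(exposition_text: str, series: str) -> Optional[float]:
    """Return the value of the sample whose name-and-labels part equals `series`."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Split at the last space: everything before it is name plus labels.
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels == series:
            return float(value)
    return None


# Hand-written sample in Prometheus text format (not real middleware output).
SAMPLE = """\
# HELP ractogateway_cache_hits_total Illustrative help text
# TYPE ractogateway_cache_hits_total counter
ractogateway_cache_hits_total{cache_type="exact"} 3.0
ractogateway_cache_hits_total{cache_type="semantic"} 1.0
"""

hits = sample_value(SAMPLE, 'ractogateway_cache_hits_total{cache_type="exact"}')
```

In a real test, the string returned by generate_latest() would replace SAMPLE. For anything beyond simple lookups, prometheus_client.parser.text_string_to_metric_families is a more robust way to parse the same text.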