ractogateway.telemetry.metrics
GatewayMetricsMiddleware — Prometheus metrics for RactoGateway.
Pass a GatewayMetricsMiddleware instance as metrics= to any developer
kit to collect per-request Prometheus metrics.
Requires: pip install ractogateway[prometheus]
Metrics exposed
- ractogateway_requests_total{provider,model,operation,status} – Counter
- ractogateway_request_duration_seconds{provider,model,operation} – Histogram
- ractogateway_tokens_total{provider,model,token_type} – Counter
- ractogateway_cost_usd_total{provider,model} – Counter
- ractogateway_cache_hits_total{cache_type} – Counter
- ractogateway_cache_misses_total{cache_type} – Counter
- ractogateway_tool_calls_total{tool_name} – Counter
Example:
from ractogateway import openai_developer_kit as opd
from ractogateway.telemetry import GatewayMetricsMiddleware, PrometheusExporter
metrics = GatewayMetricsMiddleware()
exporter = PrometheusExporter(port=8000)
exporter.start()
kit = opd.OpenAIDeveloperKit(
    model="gpt-4o",
    default_prompt=my_prompt,
    metrics=metrics,
)
response = kit.chat(opd.ChatConfig(user_message="Hello"))
# Scrape http://localhost:8000/metrics in Prometheus.
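To check that the exporter is serving the gateway series before wiring up Prometheus, the endpoint can be read directly. The following is a minimal sketch using only the standard library. The helper names parse_gateway_samples and fetch_gateway_samples are hypothetical and not part of RactoGateway. The parser assumes the plain Prometheus text format with no trailing timestamps, so the value is the last space-separated field on each sample line.

```python
from urllib.request import urlopen


def parse_gateway_samples(exposition_text: str, prefix: str = "ractogateway_") -> dict[str, float]:
    """Return {series: value} for samples whose metric name starts with prefix.

    Hypothetical helper: assumes no timestamps, so the value is the last field.
    """
    samples: dict[str, float] = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        if series.startswith(prefix):
            samples[series] = float(value)
    return samples


def fetch_gateway_samples(url: str = "http://localhost:8000/metrics") -> dict[str, float]:
    """Fetch the metrics page (URL taken from the example above) and parse it."""
    with urlopen(url, timeout=5) as resp:
        return parse_gateway_samples(resp.read().decode("utf-8"))
```

With the exporter from the example running, fetch_gateway_samples() should return one entry per label combination recorded so far. It returns an empty dict if no request has been made yet.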
- class ractogateway.telemetry.metrics.GatewayMetricsMiddleware(*, price_table=None, registry=None)[source]
Bases: object
Prometheus metrics middleware: pass as metrics= to any developer kit.
A single instance can be shared across multiple kits (different providers) to aggregate metrics in one registry.
- Parameters:
price_table (dict[str, ModelPricing] | None) – Override or extend the built-in pricing table used for the ractogateway_cost_usd_total counter.
registry (Any | None) – Custom prometheus_client.CollectorRegistry. Defaults to the global REGISTRY (which also includes default Python metrics). Pass prometheus_client.CollectorRegistry() to get an isolated registry, which is useful in tests.
Requires: pip install ractogateway[prometheus]
- record_request(*, provider, model, operation, status, latency_s, input_tokens=0, output_tokens=0, tool_calls=None)[source]
Record metrics for a completed LLM request.
- Parameters:
provider (str) – Provider string ("openai", "google", "anthropic").
model (str) – Model identifier (e.g. "gpt-4o").
operation (str) – "chat", "stream", or "embed".
status (str) – "ok" or "error".
latency_s (float) – Request wall-clock latency in seconds.
input_tokens (int) – Prompt tokens consumed (0 for cache hits or errors).
output_tokens (int) – Completion tokens produced (0 for cache hits or errors).
tool_calls (list[Any] | None) – List of ToolCallResult objects from the response. Used to update ractogateway_tool_calls_total.
- Return type:
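Kits that receive metrics= record requests themselves, so record_request normally does not need to be called by hand. For a code path that bypasses a kit, the keyword arguments documented above could be filled in roughly as in this sketch. The wrapper call_and_record is hypothetical, and the convention that the wrapped callable returns a (result, input_tokens, output_tokens) tuple is an assumption made for illustration only.

```python
import time
from typing import Any, Callable, Tuple


def call_and_record(
    metrics: Any,
    *,
    provider: str,
    model: str,
    operation: str,
    call: Callable[[], Tuple[Any, int, int]],
) -> Any:
    """Run call(), time it, and report the outcome through record_request.

    Hypothetical helper: call() is assumed to return
    (result, input_tokens, output_tokens).
    """
    start = time.perf_counter()
    try:
        result, input_tokens, output_tokens = call()
    except Exception:
        # On failure, token counts are left at their documented default of 0.
        metrics.record_request(
            provider=provider,
            model=model,
            operation=operation,
            status="error",
            latency_s=time.perf_counter() - start,
        )
        raise
    metrics.record_request(
        provider=provider,
        model=model,
        operation=operation,
        status="ok",
        latency_s=time.perf_counter() - start,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
    )
    return result
```

Recording the same request both through a kit and through a wrapper like this would count it twice, so the sketch only applies where no kit is involved.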
- record_cache_hit(cache_type)[source]
Increment the cache-hits counter.
- record_cache_miss(cache_type)[source]
Increment the cache-misses counter.
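The two cache methods pair naturally around a lookup. A minimal sketch, assuming a plain dict as the cache: the helper get_or_compute and the cache_type value "exact" are hypothetical and chosen only for illustration, since the accepted label values are not listed here.

```python
from typing import Any, Callable, Dict, Hashable


def get_or_compute(
    cache: Dict[Hashable, Any],
    key: Hashable,
    compute: Callable[[], Any],
    metrics: Any,
    cache_type: str = "exact",  # hypothetical label value
) -> Any:
    """Return the cached value for key, recording a hit or a miss."""
    if key in cache:
        metrics.record_cache_hit(cache_type)
        return cache[key]
    # Miss: record it, compute the value once, and store it for next time.
    metrics.record_cache_miss(cache_type)
    value = compute()
    cache[key] = value
    return value
```

Each lookup increments exactly one of the two counters, so hits divided by hits plus misses gives the hit ratio per cache_type.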