# Telemetry & Observability

RactoGateway ships a production-grade observability layer with **zero code changes** to your
existing kit calls. Attach a `RactoTracer` and/or `GatewayMetricsMiddleware` to any developer
kit and every LLM call is automatically instrumented.

## Installation

```bash
# OpenTelemetry tracing only
pip install "ractogateway[telemetry]"

# Prometheus metrics only
pip install "ractogateway[prometheus]"

# Both (recommended for production)
pip install "ractogateway[observability]"
```

## Quick start

```python
from ractogateway import openai_developer_kit as opd
from ractogateway.telemetry import RactoTracer, GatewayMetricsMiddleware, PrometheusExporter
from ractogateway.prompts.engine import RactoPrompt

# --- Tracing ---
tracer = RactoTracer(
    otlp_endpoint="http://localhost:4317",  # Jaeger / Grafana Tempo gRPC
    console=True,                           # also print to stdout (dev)
)

# --- Prometheus metrics ---
metrics = GatewayMetricsMiddleware()
PrometheusExporter(port=8000).start()       # scrape http://localhost:8000/metrics

prompt = RactoPrompt(
    context="You are a helpful assistant.",
    instructions="Answer the user's question.",
    output_format="Return a concise plain-text answer.",
)

kit = opd.OpenAIDeveloperKit(
    model="gpt-4o",
    default_prompt=prompt,
    tracer=tracer,    # attach tracer
    metrics=metrics,  # attach metrics
)

response = kit.chat(opd.ChatConfig(user_message="What is 2 + 2?"))
# One OTEL span is now in your backend, one Prometheus data-point is recorded.
```

The same `tracer=` / `metrics=` parameters work on **all three provider kits**.

---

## RactoTracer — OpenTelemetry spans

### Constructor options

| Parameter | Type | Default | Description |
|---|---|---|---|
| `service_name` | `str` | `"ractogateway"` | OTEL `service.name` resource attribute |
| `otlp_endpoint` | `str \| None` | `None` | OTLP **gRPC** endpoint (e.g. Jaeger, Tempo) |
| `otlp_http_endpoint` | `str \| None` | `None` | OTLP **HTTP** endpoint (e.g. Zipkin) |
| `console` | `bool` | `False` | Print spans to stdout |
| `in_memory` | `bool` | `False` | Capture spans in memory (for tests) |
| `custom_exporter` | `SpanExporter \| None` | `None` | Any OTEL `SpanExporter` |
| `price_table` | `dict[str, ModelPricing] \| None` | `None` | Override / extend built-in pricing |

### Span attributes

Every span carries these OTEL attributes:

| Attribute | Type | Description |
|---|---|---|
| `llm.provider` | `string` | `"openai"` / `"google"` / `"anthropic"` |
| `llm.model` | `string` | Model identifier (e.g. `"gpt-4o"`) |
| `llm.operation` | `string` | `"chat"` / `"stream"` / `"embed"` |
| `llm.latency_ms` | `float` | Wall-clock time in milliseconds |
| `llm.input_tokens` | `int` | Prompt tokens consumed |
| `llm.output_tokens` | `int` | Completion tokens produced |
| `llm.cost_usd` | `float` | Estimated USD cost (8 decimal places) |
| `llm.cache_hit` | `string` | `"exact"` / `"semantic"` / `"miss"` |
| `llm.tool_calls` | `int` | Number of tool calls in the response |
| `llm.error_type` | `string` | Exception class name on error (omitted on success) |

### Exporting to Jaeger / Grafana Tempo

```python
# gRPC (default OTLP port 4317)
tracer = RactoTracer(otlp_endpoint="http://jaeger:4317")

# HTTP (default OTLP port 4318)
tracer = RactoTracer(otlp_http_endpoint="http://tempo:4318")
```

### Using in unit tests

Set `in_memory=True` and inspect `.spans` after each call — no external backend needed.

```python
from ractogateway.telemetry import RactoTracer

tracer = RactoTracer(in_memory=True)
kit = opd.OpenAIDeveloperKit(model="gpt-4o", default_prompt=prompt, tracer=tracer)

# ... make a (mocked) call ...

assert len(tracer.spans) == 1
span = tracer.spans[0]
assert span.provider == "openai"
assert span.input_tokens > 0
assert span.cost_usd > 0

tracer.clear_spans()  # reset between test cases
```

---

## GatewayMetricsMiddleware — Prometheus metrics

### Metrics exposed

| Metric | Type | Labels | Description |
|---|---|---|---|
| `ractogateway_requests_total` | Counter | `provider`, `model`, `operation`, `status` | Total LLM requests |
| `ractogateway_request_duration_seconds` | Histogram | `provider`, `model`, `operation` | Wall-clock latency |
| `ractogateway_tokens_total` | Counter | `provider`, `model`, `token_type` | Token consumption |
| `ractogateway_cost_usd_total` | Counter | `provider`, `model` | Estimated USD cost |
| `ractogateway_cache_hits_total` | Counter | `cache_type` | Cache hits by type |
| `ractogateway_cache_misses_total` | Counter | `cache_type` | Cache misses by type |
| `ractogateway_tool_calls_total` | Counter | `tool_name` | Tool calls per function |

### Sharing one instance across multiple kits

```python
metrics = GatewayMetricsMiddleware()
PrometheusExporter(port=8000).start()

openai_kit = opd.OpenAIDeveloperKit(model="gpt-4o", ..., metrics=metrics)
google_kit = god.GoogleDeveloperKit(model="gemini-2.0-flash", ..., metrics=metrics)
anth_kit   = anth.AnthropicDeveloperKit(model="claude-haiku-4-5-20251001", ..., metrics=metrics)
# All three kits write to the same registry → aggregate dashboards out of the box.
```

### Custom Prometheus registry (for tests)

```python
import prometheus_client
from ractogateway.telemetry import GatewayMetricsMiddleware

registry = prometheus_client.CollectorRegistry()  # isolated
metrics = GatewayMetricsMiddleware(registry=registry)
```

---

## Cost estimation

The built-in pricing table covers 40+ models across all three providers.
You can override or extend it on either `RactoTracer` or `GatewayMetricsMiddleware`:

```python
from ractogateway.telemetry import ModelPricing, RactoTracer

custom_prices = {
    "my-fine-tuned-gpt4": ModelPricing(input_per_million=5.00, output_per_million=15.00),
}
tracer = RactoTracer(in_memory=True, price_table=custom_prices)
```

The default table is available as `ractogateway.telemetry.DEFAULT_COST_TABLE` and you can
compute one-off costs with `compute_cost(model, input_tokens, output_tokens)`.

---

## Google and Anthropic kits

Both kits accept identical `tracer=` / `metrics=` parameters:

```python
from ractogateway import google_developer_kit as god
from ractogateway import anthropic_developer_kit as anth

google_kit = god.GoogleDeveloperKit(
    model="gemini-2.0-flash",
    default_prompt=prompt,
    tracer=tracer,
    metrics=metrics,
)

anth_kit = anth.AnthropicDeveloperKit(
    model="claude-opus-4-6",
    default_prompt=prompt,
    tracer=tracer,
    metrics=metrics,
)
```

> **Note:** Anthropic does not have a native embedding API, so `record_embed_span` is never
> called by `AnthropicDeveloperKit`.

---

## Combining with caching and routing

Telemetry is fully compatible with all other middleware. Cache hits are recorded as
`cache_hit="exact"` or `cache_hit="semantic"` — the LLM API is not called and no token costs
are incurred.

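
To make the "no token costs" point concrete, the sketch below works through per-million-token cost arithmetic in plain Python. It is an illustration only, not RactoGateway's implementation: the `Pricing` dataclass and `estimate_cost` function are hypothetical stand-ins, and the formula (tokens multiplied by a per-million rate, rounded to 8 decimal places) is an assumption inferred from the `ModelPricing` field names and the `llm.cost_usd` description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical stand-in for ModelPricing: USD per one million tokens."""

    input_per_million: float
    output_per_million: float


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Assumed formula: tokens times the per-million rate, rounded to 8 decimals."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000
    return round(total, 8)


pricing = Pricing(input_per_million=5.00, output_per_million=15.00)

# A live call that used 1,000 prompt tokens and 500 completion tokens.
live_cost = estimate_cost(pricing, input_tokens=1_000, output_tokens=500)

# A cache hit records zero tokens, so the estimate is zero as well.
cached_cost = estimate_cost(pricing, input_tokens=0, output_tokens=0)

print(live_cost, cached_cost)
```

Under these assumptions, a span whose token counts are zero always carries a zero cost estimate, which is consistent with cache hits adding nothing to `ractogateway_cost_usd_total`.
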
```python
from ractogateway.cache import ExactMatchCache
from ractogateway.telemetry import RactoTracer, GatewayMetricsMiddleware

tracer = RactoTracer(in_memory=True)
metrics = GatewayMetricsMiddleware()
cache = ExactMatchCache()

kit = opd.OpenAIDeveloperKit(
    model="gpt-4o",
    default_prompt=prompt,
    exact_cache=cache,
    tracer=tracer,
    metrics=metrics,
)
```

After an exact cache hit:

- `tracer.spans[-1].cache_hit == "exact"` — zero tokens recorded
- `metrics` counter `ractogateway_cache_hits_total{cache_type="exact"}` is incremented

---

## PrometheusExporter

```python
from ractogateway.telemetry import PrometheusExporter

exp = PrometheusExporter(port=8000)
exp.start()            # starts a background HTTP daemon thread
print(exp.is_running)  # True

# Prometheus scrapes http://host:8000/metrics automatically.

exp.stop()             # clean shutdown
```

The exporter accepts a custom `registry` parameter if you want to serve only specific metrics:

```python
exp = PrometheusExporter(port=8001, registry=my_registry)
```

---

## See also

- [API reference — telemetry](../api/telemetry.md)
- [Grafana dashboard template](../../dashboards/grafana_dashboard.json)
- [Cache guide](cache.md)
- [Routing guide](routing.md)