Telemetry & Observability
RactoGateway ships a production-grade observability layer that requires no changes to your
existing kit calls. Attach a RactoTracer, a GatewayMetricsMiddleware, or both to any developer
kit and every LLM call is instrumented automatically.
Installation
# OpenTelemetry tracing only
pip install "ractogateway[telemetry]"
# Prometheus metrics only
pip install "ractogateway[prometheus]"
# Both (recommended for production)
pip install "ractogateway[observability]"
Quick start
from ractogateway import openai_developer_kit as opd
from ractogateway.telemetry import RactoTracer, GatewayMetricsMiddleware, PrometheusExporter
from ractogateway.prompts.engine import RactoPrompt
# --- Tracing ---
tracer = RactoTracer(
    otlp_endpoint="http://localhost:4317",  # Jaeger / Grafana Tempo gRPC
    console=True,  # also print to stdout (dev)
)
# --- Prometheus metrics ---
metrics = GatewayMetricsMiddleware()
PrometheusExporter(port=8000).start() # scrape http://localhost:8000/metrics
prompt = RactoPrompt(
    context="You are a helpful assistant.",
    instructions="Answer the user's question.",
    output_format="Return a concise plain-text answer.",
)
kit = opd.OpenAIDeveloperKit(
    model="gpt-4o",
    default_prompt=prompt,
    tracer=tracer,  # attach tracer
    metrics=metrics,  # attach metrics
)
response = kit.chat(opd.ChatConfig(user_message="What is 2 + 2?"))
# One OTEL span is now in your backend and one Prometheus data point is recorded.
The same tracer= / metrics= parameters work on all three provider kits.
RactoTracer — OpenTelemetry spans
Constructor options
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | | OTEL |
| otlp_endpoint | | | OTLP gRPC endpoint (e.g. Jaeger, Tempo) |
| otlp_http_endpoint | | | OTLP HTTP endpoint (e.g. Zipkin) |
| console | | | Print spans to stdout |
| in_memory | | | Capture spans in memory (for tests) |
| | | | Any OTEL |
| price_table | | | Override / extend built-in pricing |
Span attributes
Every span carries these OTEL attributes:
| Attribute | Type | Description |
|---|---|---|
| | | |
| | | Model identifier (e.g. gpt-4o) |
| | | |
| | | Wall-clock time in milliseconds |
| | | Prompt tokens consumed |
| | | Completion tokens produced |
| | | Estimated USD cost (8 decimal places) |
| | | |
| | | Number of tool calls in the response |
| | | Exception class name on error (omitted on success) |
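To illustrate how the latency and error attributes could be gathered around a call, here is a minimal standard-library sketch. It is not RactoGateway's implementation: the timed_call helper and the dictionary keys latency_ms and error_type are hypothetical names chosen for this example only.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_call(fn: Callable[[], Any]) -> Tuple[Any, Dict[str, Any]]:
    """Run fn and return (result, attributes). Hypothetical helper."""
    attrs: Dict[str, Any] = {}
    result = None
    start = time.perf_counter()
    try:
        result = fn()
    except Exception as exc:
        # Record only the exception class name; the key is omitted on success.
        # The exception is swallowed here for brevity so the attributes can be
        # inspected; real instrumentation would normally re-raise it.
        attrs["error_type"] = type(exc).__name__
    finally:
        # Wall-clock duration in milliseconds, recorded on success and failure.
        attrs["latency_ms"] = (time.perf_counter() - start) * 1000.0
    return result, attrs


def failing() -> None:
    raise ValueError("boom")


ok_result, ok_attrs = timed_call(lambda: 4)
_, err_attrs = timed_call(failing)
```

After running this, ok_attrs holds only a latency value, while err_attrs additionally carries the class name of the raised exception.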
Exporting to Jaeger / Grafana Tempo
# gRPC (default OTLP port 4317)
tracer = RactoTracer(otlp_endpoint="http://jaeger:4317")
# HTTP (default OTLP port 4318)
tracer = RactoTracer(otlp_http_endpoint="http://tempo:4318")
Using in unit tests
Set in_memory=True and inspect .spans after each call — no external backend needed.
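from ractogateway import openai_developer_kit as opd
from ractogateway.telemetry import RactoTracer
tracer = RactoTracer(in_memory=True)
# `prompt` is the RactoPrompt defined in the quick start
kit = opd.OpenAIDeveloperKit(model="gpt-4o", default_prompt=prompt, tracer=tracer)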
# ... make a (mocked) call ...
assert len(tracer.spans) == 1
span = tracer.spans[0]
assert span.provider == "openai"
assert span.input_tokens > 0
assert span.cost_usd > 0
tracer.clear_spans() # reset between test cases
GatewayMetricsMiddleware — Prometheus metrics
Metrics exposed
| Metric | Type | Labels | Description |
|---|---|---|---|
| | Counter | | Total LLM requests |
| | Histogram | | Wall-clock latency |
| | Counter | | Token consumption |
| | Counter | | Estimated USD cost |
| ractogateway_cache_hits_total | Counter | | Cache hits by type |
| | Counter | | Cache misses by type |
| | Counter | | Tool calls per function |
Custom Prometheus registry (for tests)
import prometheus_client
from ractogateway.telemetry import GatewayMetricsMiddleware
registry = prometheus_client.CollectorRegistry() # isolated
metrics = GatewayMetricsMiddleware(registry=registry)
Cost estimation
The built-in pricing table covers 40+ models across all three providers. You can override or
extend it on either RactoTracer or GatewayMetricsMiddleware:
from ractogateway.telemetry import ModelPricing, RactoTracer
custom_prices = {
    "my-fine-tuned-gpt4": ModelPricing(input_per_million=5.00, output_per_million=15.00),
}
tracer = RactoTracer(in_memory=True, price_table=custom_prices)
The default table is available as ractogateway.telemetry.DEFAULT_COST_TABLE and you can
compute one-off costs with compute_cost(model, input_tokens, output_tokens).
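The arithmetic behind per-million pricing can be sketched as follows. This assumes cost is simply tokens multiplied by the per-million rate and rounded to 8 decimal places; the Pricing class and estimate_cost function are hypothetical stand-ins for illustration, not the library's ModelPricing or compute_cost, which may behave differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical stand-in for ModelPricing: USD per one million tokens."""

    input_per_million: float
    output_per_million: float


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call, rounded to 8 decimal places."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000
    return round(total, 8)


# 1,000 prompt tokens at $5/M plus 500 completion tokens at $15/M.
example = estimate_cost(Pricing(5.00, 15.00), input_tokens=1_000, output_tokens=500)
print(example)  # 0.0125
```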
Google and Anthropic kits
Both kits accept identical tracer= / metrics= parameters:
from ractogateway import google_developer_kit as god
from ractogateway import anthropic_developer_kit as anth
google_kit = god.GoogleDeveloperKit(
    model="gemini-2.0-flash",
    default_prompt=prompt,
    tracer=tracer,
    metrics=metrics,
)
anth_kit = anth.AnthropicDeveloperKit(
    model="claude-opus-4-6",
    default_prompt=prompt,
    tracer=tracer,
    metrics=metrics,
)
Note: Anthropic does not have a native embedding API, so
record_embed_span is never called by AnthropicDeveloperKit.
Combining with caching and routing
Telemetry is fully compatible with all other middleware. Cache hits are recorded as
cache_hit="exact" or cache_hit="semantic" — the LLM API is not called and no token costs
are incurred.
from ractogateway import openai_developer_kit as opd
from ractogateway.cache import ExactMatchCache
from ractogateway.telemetry import RactoTracer, GatewayMetricsMiddleware
tracer = RactoTracer(in_memory=True)
metrics = GatewayMetricsMiddleware()
cache = ExactMatchCache()
# `prompt` is the RactoPrompt defined in the quick start
kit = opd.OpenAIDeveloperKit(
    model="gpt-4o",
    default_prompt=prompt,
    exact_cache=cache,
    tracer=tracer,
    metrics=metrics,
)
After an exact cache hit:
- tracer.spans[-1].cache_hit == "exact", with zero tokens recorded
- metrics counter ractogateway_cache_hits_total{cache_type="exact"} is incremented
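The short-circuit behaviour described above can be modelled with a small standard-library sketch. Everything here is hypothetical and simplified for illustration: SpanRecord, cached_chat, and fake_llm are not RactoGateway APIs, and the token counts on a miss are a crude word count rather than real tokenizer output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SpanRecord:
    """Hypothetical, simplified span; a real span carries more fields."""

    cache_hit: Optional[str]
    input_tokens: int
    output_tokens: int


def cached_chat(
    message: str,
    cache: Dict[str, str],
    call_llm: Callable[[str], str],
    spans: List[SpanRecord],
) -> str:
    """Answer from the cache when possible, recording one span either way."""
    if message in cache:
        # Exact hit: the provider is never called, so no tokens are recorded.
        spans.append(SpanRecord(cache_hit="exact", input_tokens=0, output_tokens=0))
        return cache[message]
    answer = call_llm(message)
    cache[message] = answer
    # Miss: word counts stand in for token counts, purely for illustration.
    spans.append(SpanRecord(None, len(message.split()), len(answer.split())))
    return answer


calls: List[str] = []


def fake_llm(message: str) -> str:
    """Stand-in for a provider call; remembers every message it receives."""
    calls.append(message)
    return "4"


spans: List[SpanRecord] = []
cache: Dict[str, str] = {}
cached_chat("What is 2 + 2?", cache, fake_llm, spans)  # miss: calls fake_llm
cached_chat("What is 2 + 2?", cache, fake_llm, spans)  # exact hit: no call
```

The second call returns the stored answer without invoking fake_llm, so calls contains a single entry and the last span reports an exact hit with zero tokens.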
PrometheusExporter
from ractogateway.telemetry import PrometheusExporter
exp = PrometheusExporter(port=8000)
exp.start() # starts a background HTTP daemon thread
print(exp.is_running) # True
# Prometheus scrapes http://host:8000/metrics automatically.
exp.stop() # clean shutdown
The exporter accepts a custom registry parameter if you want to serve only specific metrics:
exp = PrometheusExporter(port=8001, registry=my_registry)
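For readers curious what "a background HTTP daemon thread" involves, the following standard-library sketch serves a fixed text payload at /metrics from a daemon thread. It is an illustration of the general pattern only, not how PrometheusExporter is implemented; MiniMetricsServer and the demo_requests_total payload are hypothetical names.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


class MiniMetricsServer:
    """Hypothetical stand-in that serves a fixed payload at /metrics."""

    def __init__(self, port: int, payload: str) -> None:
        body = payload.encode("utf-8")

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self) -> None:
                if self.path != "/metrics":
                    self.send_error(404)
                    return
                self.send_response(200)
                self.send_header("Content-Type", "text/plain; charset=utf-8")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, format: str, *args: object) -> None:
                pass  # keep the example quiet

        self._server = HTTPServer(("127.0.0.1", port), Handler)
        self._thread: Optional[threading.Thread] = None

    @property
    def port(self) -> int:
        # The actual bound port (useful when port=0 was requested).
        return self._server.server_address[1]

    @property
    def is_running(self) -> bool:
        return self._thread is not None and self._thread.is_alive()

    def start(self) -> None:
        # daemon=True means the thread will not block interpreter exit.
        self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        if self._thread is None:
            return
        self._server.shutdown()  # ask serve_forever() to return
        self._server.server_close()  # release the listening socket
        self._thread.join()
        self._thread = None


server = MiniMetricsServer(port=0, payload="demo_requests_total 1\n")
server.start()
# Bypass any configured HTTP proxy so the request stays on localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://127.0.0.1:{server.port}/metrics", timeout=5) as resp:
    scraped = resp.read().decode("utf-8")
was_running = server.is_running
server.stop()
```

Binding to port 0 lets the operating system pick a free port, which is convenient in tests where a fixed port such as 8000 may already be in use.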