Telemetry & Observability

RactoGateway ships a production-grade observability layer that requires no changes to your existing kit calls. Attach a RactoTracer, a GatewayMetricsMiddleware, or both to any developer kit, and every LLM call is instrumented automatically.

Installation

# OpenTelemetry tracing only
pip install "ractogateway[telemetry]"

# Prometheus metrics only
pip install "ractogateway[prometheus]"

# Both (recommended for production)
pip install "ractogateway[observability]"

Quick start

from ractogateway import openai_developer_kit as opd
from ractogateway.telemetry import RactoTracer, GatewayMetricsMiddleware, PrometheusExporter
from ractogateway.prompts.engine import RactoPrompt

# --- Tracing ---
tracer = RactoTracer(
    otlp_endpoint="http://localhost:4317",   # Jaeger / Grafana Tempo gRPC
    console=True,                            # also print to stdout (dev)
)

# --- Prometheus metrics ---
metrics = GatewayMetricsMiddleware()
PrometheusExporter(port=8000).start()        # scrape http://localhost:8000/metrics

prompt = RactoPrompt(
    context="You are a helpful assistant.",
    instructions="Answer the user's question.",
    output_format="Return a concise plain-text answer.",
)

kit = opd.OpenAIDeveloperKit(
    model="gpt-4o",
    default_prompt=prompt,
    tracer=tracer,     # attach tracer
    metrics=metrics,   # attach metrics
)

response = kit.chat(opd.ChatConfig(user_message="What is 2 + 2?"))
# One OTEL span is now in your backend, one Prometheus data-point is recorded.

The same tracer= / metrics= parameters work on all three provider kits.

RactoTracer — OpenTelemetry spans

Constructor options

Parameter	Type	Default	Description
`service_name`	`str`	`"ractogateway"`	OTEL `service.name` resource attribute
`otlp_endpoint`	`str | None`	`None`	OTLP gRPC endpoint (e.g. Jaeger, Tempo)
`otlp_http_endpoint`	`str | None`	`None`	OTLP HTTP endpoint (e.g. Grafana Tempo, OpenTelemetry Collector)
`console`	`bool`	`False`	Print spans to stdout
`in_memory`	`bool`	`False`	Capture spans in memory (for tests)
`custom_exporter`	`SpanExporter | None`	`None`	Any OTEL `SpanExporter`
`price_table`	`dict[str, ModelPricing] | None`	`None`	Override / extend built-in pricing

Span attributes

Every span carries these OTEL attributes:

Attribute	Type	Description
`llm.provider`	`string`	`"openai"` / `"google"` / `"anthropic"`
`llm.model`	`string`	Model identifier (e.g. `"gpt-4o"`)
`llm.operation`	`string`	`"chat"` / `"stream"` / `"embed"`
`llm.latency_ms`	`float`	Wall-clock time in milliseconds
`llm.input_tokens`	`int`	Prompt tokens consumed
`llm.output_tokens`	`int`	Completion tokens produced
`llm.cost_usd`	`float`	Estimated USD cost (8 decimal places)
`llm.cache_hit`	`string`	`"exact"` / `"semantic"` / `"miss"`
`llm.tool_calls`	`int`	Number of tool calls in the response
`llm.error_type`	`string`	Exception class name on error (omitted on success)
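
To make the table concrete, the sketch below assembles an attribute dictionary of this shape from plain values. The helper name `build_span_attributes` and its parameters are hypothetical and not part of RactoGateway; only the attribute keys, the 8-decimal cost rounding, and the rule that `llm.error_type` is omitted on success come from the table above.

```python
from typing import Optional


def build_span_attributes(
    provider: str,
    model: str,
    operation: str,
    latency_ms: float,
    input_tokens: int,
    output_tokens: int,
    cost_usd: float,
    cache_hit: str = "miss",
    tool_calls: int = 0,
    error: Optional[BaseException] = None,
) -> dict:
    """Hypothetical helper: mirror the attribute table as a flat dict."""
    attrs = {
        "llm.provider": provider,
        "llm.model": model,
        "llm.operation": operation,
        "llm.latency_ms": float(latency_ms),
        "llm.input_tokens": input_tokens,
        "llm.output_tokens": output_tokens,
        "llm.cost_usd": round(cost_usd, 8),  # table states 8 decimal places
        "llm.cache_hit": cache_hit,
        "llm.tool_calls": tool_calls,
    }
    # The error attribute is only present when the call failed.
    if error is not None:
        attrs["llm.error_type"] = type(error).__name__
    return attrs


ok = build_span_attributes("openai", "gpt-4o", "chat", 812.4, 120, 45, 0.00075)
failed = build_span_attributes(
    "openai", "gpt-4o", "chat", 30.0, 0, 0, 0.0,
    error=TimeoutError("upstream timed out"),
)
```

A dictionary like `ok` is what you would expect to see on a successful span in a trace viewer; `failed` additionally carries `llm.error_type`.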

Exporting to Jaeger / Grafana Tempo

# gRPC (default OTLP port 4317)
tracer = RactoTracer(otlp_endpoint="http://jaeger:4317")

# HTTP (default OTLP port 4318)
tracer = RactoTracer(otlp_http_endpoint="http://tempo:4318")
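
In deployments the endpoint usually comes from the environment rather than from source code. The sketch below maps the standard OpenTelemetry variables `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_PROTOCOL` onto the two constructor parameters shown above. The helper `tracer_kwargs_from_env` is hypothetical, and whether RactoTracer reads these variables on its own is not stated here, so treat this as one possible wiring. Falling back to gRPC when no protocol is set is this sketch's choice.

```python
import os
from typing import Mapping, Optional


def tracer_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: pick RactoTracer keyword arguments from env vars."""
    env = os.environ if env is None else env
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        # No backend configured: fall back to console output for local work.
        return {"console": True}
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc")
    if protocol.startswith("http"):  # "http/protobuf" or "http/json"
        return {"otlp_http_endpoint": endpoint}
    return {"otlp_endpoint": endpoint}


# Usage (assumes ractogateway is installed):
# tracer = RactoTracer(**tracer_kwargs_from_env())
```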

Using in unit tests

Set in_memory=True and inspect .spans after each call — no external backend needed.

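from ractogateway import openai_developer_kit as opd
from ractogateway.telemetry import RactoTracer

tracer = RactoTracer(in_memory=True)
# `prompt` is the RactoPrompt built in the quick start
kit = opd.OpenAIDeveloperKit(model="gpt-4o", default_prompt=prompt, tracer=tracer)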

# ... make a (mocked) call ...

assert len(tracer.spans) == 1
span = tracer.spans[0]
assert span.provider == "openai"
assert span.input_tokens > 0
assert span.cost_usd > 0
tracer.clear_spans()   # reset between test cases
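
When a test makes several calls, it can be easier to assert on totals than on individual spans. The sketch below groups span records by provider and sums tokens and cost. `summarize_spans` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for captured spans; it assumes each record exposes `.provider`, `.input_tokens`, and `.cost_usd` as in the assertions above, plus an `.output_tokens` field, which is an assumption.

```python
from collections import defaultdict
from types import SimpleNamespace


def summarize_spans(spans):
    """Group span records by provider and total their tokens and cost."""
    totals = defaultdict(
        lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
    )
    for span in spans:
        bucket = totals[span.provider]
        bucket["calls"] += 1
        bucket["input_tokens"] += span.input_tokens
        bucket["output_tokens"] += span.output_tokens  # assumed field name
        bucket["cost_usd"] += span.cost_usd
    return dict(totals)


# Stand-in records; in a real test you would pass tracer.spans instead.
fake_spans = [
    SimpleNamespace(provider="openai", input_tokens=100, output_tokens=20, cost_usd=0.001),
    SimpleNamespace(provider="openai", input_tokens=50, output_tokens=10, cost_usd=0.0005),
    SimpleNamespace(provider="google", input_tokens=80, output_tokens=40, cost_usd=0.0002),
]
summary = summarize_spans(fake_spans)
```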

GatewayMetricsMiddleware — Prometheus metrics

Metrics exposed

Metric	Type	Labels	Description
`ractogateway_requests_total`	Counter	`provider`, `model`, `operation`, `status`	Total LLM requests
`ractogateway_request_duration_seconds`	Histogram	`provider`, `model`, `operation`	Wall-clock latency
`ractogateway_tokens_total`	Counter	`provider`, `model`, `token_type`	Token consumption
`ractogateway_cost_usd_total`	Counter	`provider`, `model`	Estimated USD cost
`ractogateway_cache_hits_total`	Counter	`cache_type`	Cache hits by type
`ractogateway_cache_misses_total`	Counter	`cache_type`	Cache misses by type
`ractogateway_tool_calls_total`	Counter	`tool_name`	Tool calls per function

Custom Prometheus registry (for tests)

import prometheus_client
from ractogateway.telemetry import GatewayMetricsMiddleware

registry = prometheus_client.CollectorRegistry()  # isolated
metrics = GatewayMetricsMiddleware(registry=registry)

Cost estimation

The built-in pricing table covers 40+ models across all three providers. You can override or extend it on either RactoTracer or GatewayMetricsMiddleware:

from ractogateway.telemetry import ModelPricing, RactoTracer

custom_prices = {
    "my-fine-tuned-gpt4": ModelPricing(input_per_million=5.00, output_per_million=15.00),
}

tracer = RactoTracer(in_memory=True, price_table=custom_prices)

The default table is available as ractogateway.telemetry.DEFAULT_COST_TABLE and you can compute one-off costs with compute_cost(model, input_tokens, output_tokens).
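
The field names `input_per_million` and `output_per_million` suggest a linear per-token price. The sketch below works through that arithmetic with a stand-in `Pricing` dataclass and a hypothetical `estimate_cost` function. It is not the library's `compute_cost`, which may handle details this sketch ignores, such as unknown models or cached-token discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Stand-in for a per-model price entry, in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Linear estimate: tokens times the per-million rate, divided by 1e6."""
    cost = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000
    return round(cost, 8)  # match the 8-decimal precision used on spans


price = Pricing(input_per_million=5.00, output_per_million=15.00)
# 1,000 input tokens -> 0.005 USD; 500 output tokens -> 0.0075 USD
cost = estimate_cost(price, input_tokens=1_000, output_tokens=500)
```

With the prices from the example above, a call using 1,000 input tokens and 500 output tokens comes to 0.0125 USD.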

Google and Anthropic kits

Both kits accept identical tracer= / metrics= parameters:

from ractogateway import google_developer_kit as god
from ractogateway import anthropic_developer_kit as anth

google_kit = god.GoogleDeveloperKit(
    model="gemini-2.0-flash",
    default_prompt=prompt,
    tracer=tracer,
    metrics=metrics,
)

anth_kit = anth.AnthropicDeveloperKit(
    model="claude-opus-4-6",
    default_prompt=prompt,
    tracer=tracer,
    metrics=metrics,
)

Note: Anthropic does not have a native embedding API, so record_embed_span is never called by AnthropicDeveloperKit.

Combining with caching and routing

Telemetry is fully compatible with all other middleware. A cache hit is recorded with cache_hit="exact" or cache_hit="semantic"; because the LLM API is not called, no token costs are incurred.

from ractogateway import openai_developer_kit as opd
from ractogateway.cache import ExactMatchCache
from ractogateway.telemetry import RactoTracer, GatewayMetricsMiddleware

tracer  = RactoTracer(in_memory=True)
metrics = GatewayMetricsMiddleware()
cache   = ExactMatchCache()

kit = opd.OpenAIDeveloperKit(
    model="gpt-4o",
    default_prompt=prompt,
    exact_cache=cache,
    tracer=tracer,
    metrics=metrics,
)

After an exact cache hit:

tracer.spans[-1].cache_hit == "exact", and the span records zero tokens.
The counter ractogateway_cache_hits_total{cache_type="exact"} is incremented.
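
The `cache_hit` values on captured spans can be turned into a quick hit-ratio check. The sketch below counts the three documented values and reports the share of calls served from cache. `cache_breakdown` is a hypothetical helper that works on any iterable of strings.

```python
from collections import Counter


def cache_breakdown(cache_hit_values):
    """Count cache outcomes and compute the fraction that were not misses."""
    counts = Counter(cache_hit_values)
    total = sum(counts.values())
    hits = total - counts.get("miss", 0)
    return {
        "counts": dict(counts),
        "hit_ratio": hits / total if total else 0.0,
    }


# Stand-in values; with a real tracer you could pass
# (span.cache_hit for span in tracer.spans) instead.
observed = ["miss", "exact", "exact", "semantic"]
report = cache_breakdown(observed)
```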

PrometheusExporter

from ractogateway.telemetry import PrometheusExporter

exp = PrometheusExporter(port=8000)
exp.start()          # starts a background HTTP daemon thread
print(exp.is_running)  # True

# Point a Prometheus scrape job at http://host:8000/metrics.

exp.stop()           # clean shutdown

The exporter accepts a custom registry parameter if you want to serve only specific metrics:

exp = PrometheusExporter(port=8001, registry=my_registry)
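
For a smoke test of the exporter, it can help to parse the scraped body and assert on individual samples. The sketch below is a deliberately small parser for the Prometheus text exposition format that handles plain `name{labels} value` lines and skips comments. `parse_metrics` is a hypothetical helper, and the sample text, including the `status="success"` label value, is illustrative rather than captured from RactoGateway.

```python
import re

# name, optional {labels}, then the value; a trailing timestamp is ignored.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return {(metric_name, sorted_label_pairs): value} for each sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(_LABEL.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)
    return samples


# Illustrative body; a real one could be fetched with
# urllib.request.urlopen("http://localhost:8000/metrics").read().decode()
body = """\
# HELP ractogateway_cache_hits_total Cache hits by type
# TYPE ractogateway_cache_hits_total counter
ractogateway_cache_hits_total{cache_type="exact"} 3.0
ractogateway_requests_total{provider="openai",model="gpt-4o",operation="chat",status="success"} 7.0
"""
samples = parse_metrics(body)
exact_hits = samples[("ractogateway_cache_hits_total", (("cache_type", "exact"),))]
```

Sorting the label pairs makes the lookup key independent of the order in which the server emits labels.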