API Reference — Telemetry

Module: ractogateway.telemetry

Install extras: pip install "ractogateway[observability]"

RactoTracer

class RactoTracer

OpenTelemetry tracer. Pass as tracer= to any developer kit.

Constructor

RactoTracer(
    *,
    service_name: str = "ractogateway",
    otlp_endpoint: str | None = None,
    otlp_http_endpoint: str | None = None,
    console: bool = False,
    in_memory: bool = False,
    custom_exporter: SpanExporter | None = None,
    price_table: dict[str, ModelPricing] | None = None,
)

Parameters

service_name (str) — OTEL service.name resource attribute. Defaults to "ractogateway".
otlp_endpoint (str | None) — OTLP gRPC endpoint (e.g. "http://localhost:4317"). Requires pip install "ractogateway[telemetry]".
otlp_http_endpoint (str | None) — OTLP HTTP endpoint (e.g. "http://localhost:4318"). Requires pip install "ractogateway[telemetry]".
console (bool) — Also print spans to stdout. Defaults to False.
in_memory (bool) — Capture spans in a thread-safe list. Access via .spans. Useful for unit tests. Defaults to False.
custom_exporter (SpanExporter | None) — Any opentelemetry.sdk.trace.export.SpanExporter instance. Defaults to None.
price_table (dict[str, ModelPricing] | None) — Override or extend the built-in pricing table. Keys are model identifiers; values are ModelPricing objects.

Methods

record_chat_span

def record_chat_span(
    self,
    *,
    provider: str,
    model: str,
    latency_ms: float,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cache_hit: str = "miss",
    tool_calls: int = 0,
    status: str = "ok",
    error_type: str | None = None,
) -> None

Record a completed chat or stream span.

Parameters

provider — "openai", "google", or "anthropic".
model — Model identifier (e.g. "gpt-4o").
latency_ms — Total wall-clock latency of the LLM call in milliseconds.
input_tokens — Prompt tokens consumed. 0 for cache hits.
output_tokens — Completion tokens produced. 0 for cache hits.
cache_hit — "exact", "semantic", or "miss".
tool_calls — Number of tool calls in the response.
status — "ok" or "error".
error_type — Exception class name when status == "error", else None.
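
The keyword arguments above can be filled in around a provider call roughly as follows. This is a minimal sketch: traced_chat and call are hypothetical names that are not part of ractogateway, and the helper assumes only that tracer exposes record_chat_span with the signature documented above.

```python
import time
from typing import Any, Callable


def traced_chat(tracer: Any, provider: str, model: str, call: Callable[[], Any]) -> Any:
    """Run `call` and report one chat span to `tracer` (hypothetical helper)."""
    start = time.perf_counter()
    status, error_type = "ok", None
    try:
        return call()
    except Exception as exc:
        # error_type carries the exception class name, as documented above.
        status, error_type = "error", type(exc).__name__
        raise
    finally:
        # Runs on both success and failure, so every call yields one span.
        tracer.record_chat_span(
            provider=provider,
            model=model,
            latency_ms=(time.perf_counter() - start) * 1000.0,
            status=status,
            error_type=error_type,
        )
```

Token counts and tool-call counts are left at their defaults here; a real caller would pass them from the provider response.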

record_embed_span

def record_embed_span(
    self,
    *,
    provider: str,
    model: str,
    latency_ms: float,
    input_tokens: int = 0,
    status: str = "ok",
    error_type: str | None = None,
) -> None

Record a completed embedding span. The parameters have the same meaning as in record_chat_span.

spans (property)

@property
def spans(self) -> list[SpanRecord]

Return all captured in-memory spans. Only populated when in_memory=True. Thread-safe.

clear_spans

def clear_spans(self) -> None

Clear all in-memory spans. Has an effect only when in_memory=True.
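
This reference does not show how the library implements in-memory capture. The sketch below is one way the documented behavior (a thread-safe list, read through spans and reset through clear_spans) could be modeled, for example as a test double. InMemorySpanStore and its append method are hypothetical names.

```python
import threading


class InMemorySpanStore:
    """Hypothetical stand-in for the in_memory=True behavior described above."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._spans: list = []

    def append(self, span: dict) -> None:
        with self._lock:
            self._spans.append(span)

    @property
    def spans(self) -> list:
        # Return a copy so callers can iterate while other threads append.
        with self._lock:
            return list(self._spans)

    def clear_spans(self) -> None:
        with self._lock:
            self._spans.clear()
```

Returning a copy from spans is a design choice of this sketch: a test can take a snapshot, assert on it, and call clear_spans between cases without racing against writers.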

GatewayMetricsMiddleware

class GatewayMetricsMiddleware

Prometheus metrics middleware. Pass as metrics= to any developer kit.

Constructor

GatewayMetricsMiddleware(
    *,
    price_table: dict[str, ModelPricing] | None = None,
    registry: CollectorRegistry | None = None,
)

Parameters

price_table — Override or extend the built-in pricing table.
registry — Custom prometheus_client.CollectorRegistry. Pass an isolated registry in tests to prevent metric name collisions.

Metrics

| Metric name | Type | Labels |
| --- | --- | --- |
| `ractogateway_requests_total` | Counter | `provider`, `model`, `operation`, `status` |
| `ractogateway_request_duration_seconds` | Histogram | `provider`, `model`, `operation` |
| `ractogateway_tokens_total` | Counter | `provider`, `model`, `token_type` |
| `ractogateway_cost_usd_total` | Counter | `provider`, `model` |
| `ractogateway_cache_hits_total` | Counter | `cache_type` |
| `ractogateway_cache_misses_total` | Counter | `cache_type` |
| `ractogateway_tool_calls_total` | Counter | `tool_name` |

Methods

record_request

def record_request(
    self,
    *,
    provider: str,
    model: str,
    operation: str,
    status: str,
    latency_s: float,
    input_tokens: int = 0,
    output_tokens: int = 0,
    tool_calls: list[ToolCallResult] | None = None,
) -> None

Record metrics for a completed LLM request.

Parameters

provider — "openai", "google", or "anthropic".
model — Model identifier.
operation — "chat", "stream", or "embed".
status — "ok" or "error".
latency_s — Request wall-clock latency in seconds.
input_tokens — Prompt tokens consumed.
output_tokens — Completion tokens produced.
tool_calls — List of ToolCallResult objects from the response.
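
Note that record_request takes latency in seconds, whereas RactoTracer.record_chat_span takes milliseconds. A minimal sketch of a timing wrapper follows; timed_request is a hypothetical helper, not part of ractogateway, and it assumes only that metrics exposes record_request with the signature documented above.

```python
import time
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def timed_request(metrics: Any, provider: str, model: str, operation: str) -> Iterator[dict]:
    """Time a block and report it via metrics.record_request (hypothetical helper)."""
    usage = {"input_tokens": 0, "output_tokens": 0}  # caller fills these in after the call
    start = time.perf_counter()
    status = "ok"
    try:
        yield usage
    except Exception:
        status = "error"
        raise
    finally:
        metrics.record_request(
            provider=provider,
            model=model,
            operation=operation,
            status=status,
            latency_s=time.perf_counter() - start,  # seconds, not milliseconds
            **usage,
        )
```

Inside the with block the caller performs the LLM request and writes the token counts from the response into the yielded dict.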

record_cache_hit

def record_cache_hit(self, cache_type: str) -> None

Increment ractogateway_cache_hits_total. cache_type is "exact" or "semantic".

record_cache_miss

def record_cache_miss(self, cache_type: str) -> None

Increment ractogateway_cache_misses_total. cache_type is "exact" or "semantic".

generate_latest

def generate_latest(self) -> str

Return current metrics in Prometheus text exposition format (UTF-8 string). Useful for testing without starting an HTTP server.
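
In a test, the returned text can be searched for a specific sample. The helper below is a sketch that relies only on the Prometheus text exposition format (one sample per line, as name{label="value",...} value); sample_value is a hypothetical name and not part of ractogateway.

```python
import re
from typing import Optional

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sample_value(text: str, name: str, labels: Optional[dict] = None) -> Optional[float]:
    """Return the first sample matching `name` and all given labels, or None."""
    wanted = labels or {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE.match(line)
        if match is None or match.group("name") != name:
            continue
        found = dict(_LABEL.findall(match.group("labels") or ""))
        if all(found.get(key) == value for key, value in wanted.items()):
            return float(match.group("value"))
    return None
```

A test could then call sample_value(metrics.generate_latest(), "ractogateway_requests_total", {"provider": "openai", "status": "ok"}) and assert on the result.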

PrometheusExporter

class PrometheusExporter

HTTP server that exposes metrics at /metrics for Prometheus scraping.

Constructor

PrometheusExporter(
    port: int = 8000,
    registry: CollectorRegistry | None = None,
)

Parameters

port — TCP port to listen on. Defaults to 8000.
registry — Custom registry. If None, uses the global prometheus_client.REGISTRY.

Methods

start

def start(self) -> None

Start the HTTP server in a background daemon thread. Idempotent.

stop

def stop(self) -> None

Stop the HTTP server. Safe to call even if not started.

is_running (property)

@property
def is_running(self) -> bool

True if the HTTP server thread is running.
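
The lifecycle described above (idempotent start, stop that is safe when nothing is running, a background daemon thread) can be illustrated with the standard library alone. This sketch is not the library's implementation; MiniExporter, its render callback, and its port property are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


class MiniExporter:
    """Hypothetical stand-in that serves render() as text at /metrics."""

    def __init__(self, port: int = 8000, render: Callable[[], str] = lambda: "") -> None:
        self._port = port
        self._render = render
        self._server = None
        self._thread = None
        self._lock = threading.Lock()

    def start(self) -> None:
        with self._lock:
            if self._server is not None:
                return  # already running: start() is idempotent
            render = self._render

            class Handler(BaseHTTPRequestHandler):
                def do_GET(self):
                    if self.path != "/metrics":
                        self.send_error(404)
                        return
                    body = render().encode("utf-8")
                    self.send_response(200)
                    self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
                    self.send_header("Content-Length", str(len(body)))
                    self.end_headers()
                    self.wfile.write(body)

                def log_message(self, *args):
                    pass  # keep the sketch quiet

            self._server = HTTPServer(("127.0.0.1", self._port), Handler)
            self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)
            self._thread.start()

    def stop(self) -> None:
        with self._lock:
            if self._server is None:
                return  # never started or already stopped
            self._server.shutdown()
            self._server.server_close()
            self._thread.join()
            self._server = None
            self._thread = None

    @property
    def is_running(self) -> bool:
        thread = self._thread
        return thread is not None and thread.is_alive()

    @property
    def port(self) -> int:
        # Reports the bound port, which matters when constructed with port=0.
        return self._server.server_address[1] if self._server else self._port
```

Passing port=0 lets the operating system pick a free port, which avoids collisions when several tests run in parallel.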

ModelPricing

class ModelPricing(BaseModel)

USD cost per 1 million tokens for a specific model.

Fields

input_per_million (float) — Price in USD for 1M input (prompt) tokens.
output_per_million (float) — Price in USD for 1M output (completion) tokens.

SpanRecord

class SpanRecord(BaseModel)

In-memory span record. Captured when RactoTracer is constructed with in_memory=True.

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | — | OTEL span name (`"llm.chat"` or `"llm.embed"`) |
| `provider` | `str` | — | `"openai"` / `"google"` / `"anthropic"` |
| `model` | `str` | — | Model identifier |
| `operation` | `str` | — | `"chat"` / `"stream"` / `"embed"` |
| `latency_ms` | `float` | — | Wall-clock latency in milliseconds (≥ 0) |
| `input_tokens` | `int` | `0` | Prompt tokens |
| `output_tokens` | `int` | `0` | Completion tokens |
| `cost_usd` | `float` | `0.0` | Estimated USD cost |
| `cache_hit` | `str` | `"miss"` | `"exact"` / `"semantic"` / `"miss"` |
| `tool_calls` | `int` | `0` | Number of tool calls |
| `status` | `str` | `"ok"` | `"ok"` or `"error"` |
| `error_type` | `str \| None` | `None` | Exception class name |
| `timestamp` | `float` | `time.time()` | Unix timestamp of recording |
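
Because each record carries cost, cache, and status fields, a list of captured spans can be reduced to a few summary figures in a test or a debugging session. The sketch below uses only the field names from the table above; summarize is a hypothetical helper, and it accepts any objects exposing those attributes.

```python
from typing import Any, Iterable


def summarize(spans: Iterable[Any]) -> dict:
    """Aggregate a few SpanRecord fields (hypothetical helper)."""
    records = list(spans)
    total = len(records)
    # Both "exact" and "semantic" count as cache hits.
    hits = sum(1 for span in records if span.cache_hit != "miss")
    errors = sum(1 for span in records if span.status == "error")
    return {
        "count": total,
        "cost_usd": sum(span.cost_usd for span in records),
        "cache_hit_rate": hits / total if total else 0.0,
        "error_rate": errors / total if total else 0.0,
    }
```

With a tracer constructed with in_memory=True, summarize(tracer.spans) would give the totals for everything recorded since the last clear_spans call.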

DEFAULT_COST_TABLE

DEFAULT_COST_TABLE: dict[str, ModelPricing]

Built-in pricing table with 40+ models. Covers OpenAI (GPT-4o, GPT-4o-mini, o1, o3-mini, …), Anthropic (Claude Opus/Sonnet/Haiku across generations), and Google (Gemini 2.0 Flash, Gemini 2.5 Pro, Gemini 1.5, …).

compute_cost

def compute_cost(model: str, input_tokens: int, output_tokens: int) -> float

Compute estimated USD cost from token counts using DEFAULT_COST_TABLE. Returns 0.0 for unknown models.

Parameters

model — Model identifier string.
input_tokens — Number of input tokens.
output_tokens — Number of output tokens.

Returns — Estimated cost in USD.
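
Given per-million pricing, the estimate presumably reduces to the arithmetic below. This is a sketch under stated assumptions: Pricing is a stand-in for ModelPricing, estimate_cost is a hypothetical name, and the prices in EXAMPLE_TABLE are illustrative values that are not taken from DEFAULT_COST_TABLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Stand-in for ModelPricing with the two documented fields."""
    input_per_million: float
    output_per_million: float


# Illustrative prices only; not taken from DEFAULT_COST_TABLE.
EXAMPLE_TABLE = {"example-model": Pricing(input_per_million=2.0, output_per_million=8.0)}


def estimate_cost(model: str, input_tokens: int, output_tokens: int, table=EXAMPLE_TABLE) -> float:
    pricing = table.get(model)
    if pricing is None:
        return 0.0  # unknown model, matching the documented fallback
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000
```

For example, 1,000,000 input tokens and 500,000 output tokens at these prices come to 2.00 + 4.00 = 6.00 USD.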