API Reference — Telemetry
Module: ractogateway.telemetry
Install extras: pip install "ractogateway[observability]"
RactoTracer
class RactoTracer
OpenTelemetry tracer. Pass as tracer= to any developer kit.
Constructor
RactoTracer(
    *,
    service_name: str = "ractogateway",
    otlp_endpoint: str | None = None,
    otlp_http_endpoint: str | None = None,
    console: bool = False,
    in_memory: bool = False,
    custom_exporter: SpanExporter | None = None,
    price_table: dict[str, ModelPricing] | None = None,
)
Parameters
service_name (str) — OTEL service.name resource attribute. Defaults to "ractogateway".
otlp_endpoint (str | None) — OTLP gRPC endpoint (e.g. "http://localhost:4317"). Requires pip install ractogateway[telemetry].
otlp_http_endpoint (str | None) — OTLP HTTP endpoint (e.g. "http://localhost:4318"). Requires pip install ractogateway[telemetry].
console (bool) — Also print spans to stdout. Defaults to False.
in_memory (bool) — Capture spans in a thread-safe list. Access via .spans. Useful for unit tests. Defaults to False.
custom_exporter — Any opentelemetry.sdk.trace.export.SpanExporter.
price_table (dict[str, ModelPricing] | None) — Override or extend the built-in pricing table. Keys are model identifiers; values are ModelPricing objects.
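The constructor arguments above are often driven by deployment settings. The following is a minimal sketch of one way to derive them from environment variables. OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT are standard OpenTelemetry variable names, but whether RactoTracer reads them on its own is not stated here, so the helper passes them explicitly. The helper name and the RACTO_TRACE_CONSOLE variable are hypothetical.

```python
import os
from typing import Mapping


def tracer_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for RactoTracer from environment variables."""
    return {
        # Standard OpenTelemetry variable for the service.name attribute.
        "service_name": env.get("OTEL_SERVICE_NAME", "ractogateway"),
        # A missing or empty endpoint maps to None (no OTLP gRPC export).
        "otlp_endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or None,
        # RACTO_TRACE_CONSOLE is a made-up variable used only in this sketch.
        "console": env.get("RACTO_TRACE_CONSOLE", "").lower() in {"1", "true", "yes"},
    }


# Usage, assuming ractogateway is installed with its telemetry extras:
#   from ractogateway.telemetry import RactoTracer
#   tracer = RactoTracer(**tracer_kwargs_from_env())
```

Keeping the mapping in a plain function makes it easy to unit test without importing the OpenTelemetry SDK.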
Methods
record_chat_span
def record_chat_span(
    self,
    *,
    provider: str,
    model: str,
    latency_ms: float,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cache_hit: str = "miss",
    tool_calls: int = 0,
    status: str = "ok",
    error_type: str | None = None,
) -> None
Record a completed chat or stream span.
Parameters
provider — "openai", "google", or "anthropic".
model — Model identifier (e.g. "gpt-4o").
latency_ms — Total wall-clock latency of the LLM call in milliseconds.
input_tokens — Prompt tokens consumed. 0 for cache hits.
output_tokens — Completion tokens produced. 0 for cache hits.
cache_hit — "exact", "semantic", or "miss".
tool_calls — Number of tool calls in the response.
status — "ok" or "error".
error_type — Exception class name when status == "error", else None.
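For code paths that call a provider directly, the parameters above can be filled in by a small wrapper. This is a sketch under assumptions: call_and_record is a hypothetical helper, and tracer is any object exposing record_chat_span with the signature documented here. Token counts are left at their defaults because the shape of the provider response is not described in this reference.

```python
import time
from typing import Any, Callable


def call_and_record(tracer: Any, provider: str, model: str,
                    call: Callable[[], Any]) -> Any:
    """Run `call`, then report its outcome through tracer.record_chat_span."""
    start = time.perf_counter()
    status, error_type = "ok", None
    try:
        return call()
    except Exception as exc:
        # error_type carries the exception class name, as documented above.
        status, error_type = "error", type(exc).__name__
        raise
    finally:
        # Runs on success and on failure, so every call yields one span.
        tracer.record_chat_span(
            provider=provider,
            model=model,
            latency_ms=(time.perf_counter() - start) * 1000.0,
            status=status,
            error_type=error_type,
        )
```

The exception is re-raised after recording, so callers still see the original failure.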
record_embed_span
def record_embed_span(
    self,
    *,
    provider: str,
    model: str,
    latency_ms: float,
    input_tokens: int = 0,
    status: str = "ok",
    error_type: str | None = None,
) -> None
Record a completed embedding span.
spans (property)
@property
def spans(self) -> list[SpanRecord]
Return all captured in-memory spans. Only populated when in_memory=True. Thread-safe.
clear_spans
def clear_spans(self) -> None
Clear all in-memory spans. Only has effect when in_memory=True.
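To make the thread-safety guarantee concrete, here is one way a lock-guarded span buffer could be built. InMemorySpanStore is a stand-in written for this illustration; it is not the library's actual implementation, and the real spans property may behave differently (for example, it may not return a copy).

```python
import threading
from typing import Any, List


class InMemorySpanStore:
    """Stand-in for an in-memory span buffer guarded by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._spans: List[Any] = []

    def add(self, span: Any) -> None:
        with self._lock:
            self._spans.append(span)

    @property
    def spans(self) -> List[Any]:
        # Return a copy so callers can iterate while other threads record.
        with self._lock:
            return list(self._spans)

    def clear(self) -> None:
        with self._lock:
            self._spans.clear()
```

In a test suite, calling the clear method between test cases keeps assertions on span counts independent of each other.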
GatewayMetricsMiddleware
class GatewayMetricsMiddleware
Prometheus metrics middleware. Pass as metrics= to any developer kit.
Constructor
GatewayMetricsMiddleware(
    *,
    price_table: dict[str, ModelPricing] | None = None,
    registry: CollectorRegistry | None = None,
)
Parameters
price_table — Override or extend the built-in pricing table.
registry — Custom prometheus_client.CollectorRegistry. Pass an isolated registry in tests to prevent metric name collisions.
Metrics
| Metric name | Type | Labels |
|---|---|---|
| | Counter | |
| | Histogram | |
| | Counter | |
| | Counter | |
| | Counter | |
| | Counter | |
| | Counter | |
Methods
record_request
def record_request(
    self,
    *,
    provider: str,
    model: str,
    operation: str,
    status: str,
    latency_s: float,
    input_tokens: int = 0,
    output_tokens: int = 0,
    tool_calls: list[ToolCallResult] | None = None,
) -> None
Record metrics for a completed LLM request.
Parameters
provider — "openai", "google", or "anthropic".
model — Model identifier.
operation — "chat", "stream", or "embed".
status — "ok" or "error".
latency_s — Request wall-clock latency in seconds.
input_tokens — Prompt tokens consumed.
output_tokens — Completion tokens produced.
tool_calls — List of ToolCallResult objects from the response.
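Note the unit difference: record_request takes latency_s in seconds, while RactoTracer.record_chat_span takes latency_ms in milliseconds. The sketch below reports one measurement to both, converting once. record_chat_everywhere is a hypothetical helper, and tracer and metrics are any objects with the documented methods. The developer kits presumably do this themselves when given tracer= and metrics=, so a helper like this would only matter for custom code paths.

```python
from typing import Any


def record_chat_everywhere(tracer: Any, metrics: Any, *, provider: str,
                           model: str, latency_s: float,
                           input_tokens: int, output_tokens: int,
                           status: str = "ok") -> None:
    """Report one chat call to both sinks, converting latency units once."""
    # Prometheus side: latency in seconds.
    metrics.record_request(
        provider=provider, model=model, operation="chat", status=status,
        latency_s=latency_s,
        input_tokens=input_tokens, output_tokens=output_tokens,
    )
    # Tracing side: latency in milliseconds.
    tracer.record_chat_span(
        provider=provider, model=model,
        latency_ms=latency_s * 1000.0,
        input_tokens=input_tokens, output_tokens=output_tokens,
        status=status,
    )
```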
record_cache_hit
def record_cache_hit(self, cache_type: str) -> None
Increment ractogateway_cache_hits_total. cache_type is "exact" or "semantic".
record_cache_miss
def record_cache_miss(self, cache_type: str) -> None
Increment ractogateway_cache_misses_total. cache_type is "exact" or "semantic".
generate_latest
def generate_latest(self) -> str
Return current metrics in Prometheus text exposition format (UTF-8 string). Useful for testing without starting an HTTP server.
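In tests, the returned text can be checked by pulling a single sample out of it. The following is a rough sketch of such a lookup, not a full parser: it ignores optional timestamps and does not handle label values that contain commas or spaces. sample_value is a hypothetical helper, and the cache_type label name in the usage comment is an assumption based on the method parameter name.

```python
from typing import Dict, Optional


def sample_value(exposition: str, name: str,
                 labels: Optional[Dict[str, str]] = None) -> Optional[float]:
    """Return the value of the first sample matching name and labels."""
    wanted = labels or {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        head, _, value = line.rpartition(" ")
        metric, _, label_part = head.partition("{")
        if metric != name:
            continue
        found = {}
        for pair in label_part.rstrip("}").split(","):
            if "=" in pair:
                key, _, raw = pair.partition("=")
                found[key.strip()] = raw.strip().strip('"')
        if all(found.get(k) == v for k, v in wanted.items()):
            return float(value)
    return None


# Usage, assuming `metrics` is a GatewayMetricsMiddleware instance:
#   text = metrics.generate_latest()
#   hits = sample_value(text, "ractogateway_cache_hits_total",
#                       {"cache_type": "exact"})
```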
PrometheusExporter
class PrometheusExporter
HTTP server that exposes metrics at /metrics for Prometheus scraping.
Constructor
PrometheusExporter(
    port: int = 8000,
    registry: CollectorRegistry | None = None,
)
Parameters
port — TCP port to listen on. Defaults to 8000.
registry — Custom registry. If None, uses the global prometheus_client.REGISTRY.
Methods
start
def start(self) -> None
Start the HTTP server in a background daemon thread. Idempotent.
stop
def stop(self) -> None
Stop the HTTP server. Safe to call even if not started.
is_running (property)
@property
def is_running(self) -> bool
True if the HTTP server thread is running.
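Because start and stop are safe to call repeatedly, they pair naturally with a context manager. The sketch below wraps any object that exposes start() and stop(). The running helper is hypothetical and not part of the documented API, and the port in the usage comment is arbitrary.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def running(exporter: Any) -> Iterator[Any]:
    """Start the exporter on entry and always stop it on exit."""
    exporter.start()
    try:
        yield exporter
    finally:
        # Runs even if the body raises, so the port is released.
        exporter.stop()


# Usage, assuming ractogateway is installed:
#   from ractogateway.telemetry import PrometheusExporter
#   with running(PrometheusExporter(port=9100)) as exporter:
#       assert exporter.is_running
```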
ModelPricing
class ModelPricing(BaseModel)
USD cost per 1 million tokens for a specific model.
Fields
input_per_million (float) — Price in USD for 1M input (prompt) tokens.
output_per_million (float) — Price in USD for 1M output (completion) tokens.
SpanRecord
class SpanRecord(BaseModel)
In-memory span record. Captured when RactoTracer(in_memory=True).
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| | | — | OTEL span name |
| | | — | |
| | | — | Model identifier |
| | | — | |
| | | — | Wall-clock latency in milliseconds (≥ 0) |
| | | | Prompt tokens |
| | | | Completion tokens |
| | | | Estimated USD cost |
| | | | |
| | | | Number of tool calls |
| | | | |
| | | | Exception class name |
| | | | Unix timestamp of recording |
DEFAULT_COST_TABLE
DEFAULT_COST_TABLE: dict[str, ModelPricing]
Built-in pricing table with 40+ models. Covers OpenAI (GPT-4o, GPT-4o-mini, o1, o3-mini, …), Anthropic (Claude Opus/Sonnet/Haiku across generations), and Google (Gemini 2.0 Flash, Gemini 2.5 Pro, Gemini 1.5, …).
compute_cost
def compute_cost(model: str, input_tokens: int, output_tokens: int) -> float
Compute estimated USD cost from token counts using DEFAULT_COST_TABLE.
Returns 0.0 for unknown models.
Parameters
model — Model identifier string.
input_tokens — Number of input tokens.
output_tokens — Number of output tokens.
Returns — Estimated cost in USD.
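The arithmetic implied by per-million pricing can be sketched as follows. This is a stand-in, not the library's code: Pricing mirrors the two ModelPricing fields as a plain dataclass, EXAMPLE_TABLE holds made-up prices for a fictional model, and the overrides argument illustrates how a price_table might be layered over the defaults (the real merge behavior is an assumption).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Pricing:
    """Stand-in for ModelPricing: USD per 1 million tokens."""
    input_per_million: float
    output_per_million: float


# Made-up prices for illustration only; not the library's real table.
EXAMPLE_TABLE = {"example-model": Pricing(input_per_million=2.0,
                                          output_per_million=8.0)}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  overrides: Optional[Mapping[str, Pricing]] = None) -> float:
    """Estimate USD cost; unknown models cost 0.0, as documented above."""
    merged = {**EXAMPLE_TABLE, **(overrides or {})}  # overrides win on clashes
    pricing = merged.get(model)
    if pricing is None:
        return 0.0
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000
```

With these example prices, 500,000 input tokens and 250,000 output tokens come to (500,000 × 2.0 + 250,000 × 8.0) / 1,000,000 = 3.0 USD.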