API Reference — Telemetry

Module: ractogateway.telemetry

Install extras: pip install "ractogateway[observability]"


RactoTracer

class RactoTracer

OpenTelemetry tracer. Pass as tracer= to any developer kit.

Constructor

RactoTracer(
    *,
    service_name: str = "ractogateway",
    otlp_endpoint: str | None = None,
    otlp_http_endpoint: str | None = None,
    console: bool = False,
    in_memory: bool = False,
    custom_exporter: SpanExporter | None = None,
    price_table: dict[str, ModelPricing] | None = None,
)

Parameters

  • service_name (str) — OTEL service.name resource attribute. Defaults to "ractogateway".

  • otlp_endpoint (str | None) — OTLP gRPC endpoint (e.g. "http://localhost:4317"). Requires pip install ractogateway[telemetry].

  • otlp_http_endpoint (str | None) — OTLP HTTP endpoint (e.g. "http://localhost:4318"). Requires pip install ractogateway[telemetry].

  • console (bool) — Also print spans to stdout. Defaults to False.

  • in_memory (bool) — Capture spans in a thread-safe list. Access via .spans. Useful for unit tests. Defaults to False.

  • custom_exporter (SpanExporter | None) — Any opentelemetry.sdk.trace.export.SpanExporter. Defaults to None.

  • price_table (dict[str, ModelPricing] | None) — Override or extend the built-in pricing table. Keys are model identifiers; values are ModelPricing objects.

Methods

record_chat_span

def record_chat_span(
    *,
    provider: str,
    model: str,
    latency_ms: float,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cache_hit: str = "miss",
    tool_calls: int = 0,
    status: str = "ok",
    error_type: str | None = None,
) -> None

Record a completed chat or stream span.

Parameters

  • provider — "openai", "google", or "anthropic".

  • model — Model identifier (e.g. "gpt-4o").

  • latency_ms — Total wall-clock latency of the LLM call in milliseconds.

  • input_tokens — Prompt tokens consumed. 0 for cache hits.

  • output_tokens — Completion tokens produced. 0 for cache hits.

  • cache_hit — "exact", "semantic", or "miss".

  • tool_calls — Number of tool calls in the response.

  • status — "ok" or "error".

  • error_type — Exception class name when status == "error", else None.
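
The latency_ms, status, and error_type arguments can be derived by timing the call yourself. The following is a minimal sketch using only the standard library; timed_call is a hypothetical helper, not part of ractogateway, and it assumes nothing about the tracer beyond the parameter names listed above.

```python
import time
from typing import Any, Callable


def timed_call(fn: Callable[[], Any]) -> tuple[Any, dict[str, Any]]:
    """Run fn() and return (result, span_fields).

    span_fields holds latency_ms, status, and error_type, named after the
    record_chat_span parameters documented above. Hypothetical helper.
    """
    start = time.perf_counter()
    try:
        result = fn()
    except Exception as exc:
        # The sketch swallows the exception so the fields can be inspected;
        # production code would usually record the span and then re-raise.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return None, {
            "latency_ms": elapsed_ms,
            "status": "error",
            "error_type": type(exc).__name__,
        }
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, {"latency_ms": elapsed_ms, "status": "ok", "error_type": None}


# Stand-in for an LLM call; any zero-argument callable works.
result, fields = timed_call(lambda: sum(range(1000)))
print(result, fields["status"])
```

The returned dict could then be forwarded as keyword arguments, for example tracer.record_chat_span(provider="openai", model="gpt-4o", **fields), assuming tracer is a RactoTracer instance.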

record_embed_span

def record_embed_span(
    *,
    provider: str,
    model: str,
    latency_ms: float,
    input_tokens: int = 0,
    status: str = "ok",
    error_type: str | None = None,
) -> None

Record a completed embedding span.

spans (property)

@property
def spans(self) -> list[SpanRecord]

Return all captured in-memory spans. Only populated when in_memory=True. Thread-safe.

clear_spans

def clear_spans(self) -> None

Clear all in-memory spans. Has no effect unless in_memory=True.
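
To illustrate what thread-safe in-memory capture involves, here is a small stand-in written with the standard library. InMemorySpanStore is hypothetical and is not how RactoTracer is implemented; in particular, returning a snapshot copy from spans is an assumption of this sketch.

```python
import threading
from typing import Any


class InMemorySpanStore:
    """Hypothetical stand-in for thread-safe in-memory span capture."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._spans: list[dict[str, Any]] = []

    def record(self, **fields: Any) -> None:
        # Appends are serialized through the lock.
        with self._lock:
            self._spans.append(dict(fields))

    @property
    def spans(self) -> list[dict[str, Any]]:
        # Return a snapshot so callers can iterate while other threads record.
        with self._lock:
            return list(self._spans)

    def clear_spans(self) -> None:
        with self._lock:
            self._spans.clear()


store = InMemorySpanStore()
workers = [
    threading.Thread(
        target=store.record,
        kwargs={"model": "gpt-4o", "latency_ms": 12.5},
    )
    for _ in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

captured = store.spans  # snapshot taken before clearing
store.clear_spans()
```

Clearing between test cases keeps assertions from seeing spans recorded by an earlier test.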


GatewayMetricsMiddleware

class GatewayMetricsMiddleware

Prometheus metrics middleware. Pass as metrics= to any developer kit.

Constructor

GatewayMetricsMiddleware(
    *,
    price_table: dict[str, ModelPricing] | None = None,
    registry: CollectorRegistry | None = None,
)

Parameters

  • price_table — Override or extend the built-in pricing table.

  • registry — Custom prometheus_client.CollectorRegistry. Pass an isolated registry in tests to prevent metric name collisions.

Metrics

| Metric name | Type | Labels |
|---|---|---|
| ractogateway_requests_total | Counter | provider, model, operation, status |
| ractogateway_request_duration_seconds | Histogram | provider, model, operation |
| ractogateway_tokens_total | Counter | provider, model, token_type |
| ractogateway_cost_usd_total | Counter | provider, model |
| ractogateway_cache_hits_total | Counter | cache_type |
| ractogateway_cache_misses_total | Counter | cache_type |
| ractogateway_tool_calls_total | Counter | tool_name |

Methods

record_request

def record_request(
    *,
    provider: str,
    model: str,
    operation: str,
    status: str,
    latency_s: float,
    input_tokens: int = 0,
    output_tokens: int = 0,
    tool_calls: list[ToolCallResult] | None = None,
) -> None

Record metrics for a completed LLM request.

Parameters

  • provider — "openai", "google", or "anthropic".

  • model — Model identifier.

  • operation — "chat", "stream", or "embed".

  • status — "ok" or "error".

  • latency_s — Request wall-clock latency in seconds.

  • input_tokens — Prompt tokens consumed.

  • output_tokens — Completion tokens produced.

  • tool_calls — List of ToolCallResult objects from the response.

record_cache_hit

def record_cache_hit(cache_type: str) -> None

Increment ractogateway_cache_hits_total. cache_type is "exact" or "semantic".

record_cache_miss

def record_cache_miss(cache_type: str) -> None

Increment ractogateway_cache_misses_total. cache_type is "exact" or "semantic".

generate_latest

def generate_latest(self) -> str

Return current metrics in Prometheus text exposition format (UTF-8 string). Useful for testing without starting an HTTP server.
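
When testing against the text returned by generate_latest, it helps to parse the exposition format instead of matching raw substrings. The sketch below is a simplified standard-library parser. parse_samples and sample_value are hypothetical helpers, and SAMPLE is hand-written illustrative text (including the HELP wording), not captured output from ractogateway.

```python
import re

# One sample line: name, optional {labels}, value, optional timestamp.
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels_dict, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def sample_value(text, name, **labels):
    """Return the first sample matching name and the given labels, else None."""
    for sample_name, sample_labels, value in parse_samples(text):
        if sample_name == name and all(
            sample_labels.get(key) == expected for key, expected in labels.items()
        ):
            return value
    return None


# Hand-written example text in Prometheus exposition format (illustrative only).
SAMPLE = """\
# HELP ractogateway_requests_total Example help text
# TYPE ractogateway_requests_total counter
ractogateway_requests_total{provider="openai",model="gpt-4o",operation="chat",status="ok"} 3.0
ractogateway_cache_hits_total{cache_type="exact"} 1.0
"""

print(sample_value(SAMPLE, "ractogateway_requests_total", provider="openai"))
```

In a test, SAMPLE would be replaced by the string returned from generate_latest on a middleware built with an isolated registry.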


PrometheusExporter

class PrometheusExporter

HTTP server that exposes metrics at /metrics for Prometheus scraping.

Constructor

PrometheusExporter(
    port: int = 8000,
    registry: CollectorRegistry | None = None,
)

Parameters

  • port — TCP port to listen on. Defaults to 8000.

  • registry — Custom registry. If None, uses the global prometheus_client.REGISTRY.

Methods

start

def start(self) -> None

Start the HTTP server in a background daemon thread. Idempotent.

stop

def stop(self) -> None

Stop the HTTP server. Safe to call even if not started.

is_running (property)

@property
def is_running(self) -> bool

True if the HTTP server thread is running.
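
The lifecycle described above (idempotent start, stop that is safe before start, a background daemon thread) can be sketched with the standard library alone. StandInExporter, PAYLOAD, and scrape are hypothetical names for illustration; this is not the PrometheusExporter implementation, and the payload is a fixed placeholder where a real exporter would serve the registry contents.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAYLOAD = b"# placeholder payload\nexample_requests_total 1.0\n"


class _MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


class StandInExporter:
    """Hypothetical stand-in showing an idempotent start/stop lifecycle."""

    def __init__(self, port=0):  # port 0 lets the OS pick a free port
        self._port = port
        self._server = None
        self._thread = None

    @property
    def is_running(self):
        return self._thread is not None and self._thread.is_alive()

    @property
    def port(self):
        return self._server.server_address[1] if self._server else self._port

    def start(self):
        if self.is_running:
            return  # idempotent: a second call does nothing
        self._server = ThreadingHTTPServer(("127.0.0.1", self._port), _MetricsHandler)
        self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)
        self._thread.start()

    def stop(self):
        if self._server is None:
            return  # safe to call when never started
        self._server.shutdown()
        self._server.server_close()
        self._thread.join()
        self._server = None
        self._thread = None


def scrape(port):
    """Fetch /metrics from localhost, bypassing any configured HTTP proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/metrics", timeout=5) as response:
        return response.read().decode("utf-8")
```

The scrape helper should work unchanged against any server that exposes /metrics on localhost, so it can also serve as a quick smoke test once an exporter is listening on a known port.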


ModelPricing

class ModelPricing(BaseModel)

USD cost per 1 million tokens for a specific model.

Fields

  • input_per_million (float) — Price in USD for 1M input (prompt) tokens.

  • output_per_million (float) — Price in USD for 1M output (completion) tokens.
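
As a worked example of per-million pricing, the sketch below assumes cost is linear in token count. Pricing is a hypothetical dataclass standing in for ModelPricing, and the prices are illustrative values, not taken from DEFAULT_COST_TABLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical stand-in mirroring the two ModelPricing fields."""

    input_per_million: float
    output_per_million: float


def cost_usd(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    # Scale each token count to millions, then multiply by the per-million price.
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


example = Pricing(input_per_million=2.0, output_per_million=8.0)  # illustrative prices
total = cost_usd(example, input_tokens=500_000, output_tokens=250_000)
print(total)  # 0.5 * 2.0 + 0.25 * 8.0 = 3.0
```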


SpanRecord

class SpanRecord(BaseModel)

In-memory span record. Captured when RactoTracer(in_memory=True).

Fields

| Field | Type | Default | Description |
|---|---|---|---|
| name | str | | OTEL span name ("llm.chat" or "llm.embed") |
| provider | str | | "openai" / "google" / "anthropic" |
| model | str | | Model identifier |
| operation | str | | "chat" / "stream" / "embed" |
| latency_ms | float | | Wall-clock latency in milliseconds (≥ 0) |
| input_tokens | int | 0 | Prompt tokens |
| output_tokens | int | 0 | Completion tokens |
| cost_usd | float | 0.0 | Estimated USD cost |
| cache_hit | str | "miss" | "exact" / "semantic" / "miss" |
| tool_calls | int | 0 | Number of tool calls |
| status | str | "ok" | "ok" or "error" |
| error_type | str \| None | None | Exception class name |
| timestamp | float | time.time() | Unix timestamp of recording |
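
Captured records lend themselves to simple aggregate assertions in tests. The sketch below uses a hypothetical Span dataclass with a subset of the SpanRecord fields so that it runs without ractogateway installed; summarize is likewise an illustrative helper, and the values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical stand-in with a subset of the SpanRecord fields."""

    operation: str
    latency_ms: float
    cost_usd: float = 0.0
    cache_hit: str = "miss"
    status: str = "ok"
    error_type: Optional[str] = None


def summarize(spans):
    """Aggregate a list of span-like objects into a few test-friendly totals."""
    total = len(spans)
    hits = sum(1 for span in spans if span.cache_hit != "miss")
    return {
        "count": total,
        "errors": sum(1 for span in spans if span.status == "error"),
        "total_cost_usd": sum(span.cost_usd for span in spans),
        "cache_hit_rate": hits / total if total else 0.0,
        "max_latency_ms": max((span.latency_ms for span in spans), default=0.0),
    }


spans = [
    Span("chat", 420.0, cost_usd=0.5),
    Span("chat", 3.0, cache_hit="exact"),
    Span("embed", 80.0, cost_usd=0.25),
    Span("chat", 900.0, status="error", error_type="TimeoutError"),
]
summary = summarize(spans)
print(summary)
```

Because summarize only reads attributes, the same function should work on the objects returned by RactoTracer.spans, assuming they expose the fields in the table above.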


DEFAULT_COST_TABLE

DEFAULT_COST_TABLE: dict[str, ModelPricing]

Built-in pricing table with 40+ models. Covers OpenAI (GPT-4o, GPT-4o-mini, o1, o3-mini, …), Anthropic (Claude Opus/Sonnet/Haiku across generations), and Google (Gemini 2.0 Flash, Gemini 2.5 Pro, Gemini 1.5, …).


compute_cost

def compute_cost(model: str, input_tokens: int, output_tokens: int) -> float

Compute estimated USD cost from token counts using DEFAULT_COST_TABLE. Returns 0.0 for unknown models.

Parameters

  • model — Model identifier string.

  • input_tokens — Number of input tokens.

  • output_tokens — Number of output tokens.

Returns — Estimated cost in USD.
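
The lookup-with-fallback behavior, together with the "override or extend" semantics of price_table, can be sketched as follows. Everything here is hypothetical: DEFAULTS holds made-up (input, output) prices per 1M tokens for invented model names, build_table and estimate_cost are illustrative helpers, and the table argument does not exist on the documented compute_cost signature.

```python
# Hypothetical defaults: model -> (input USD per 1M tokens, output USD per 1M tokens).
DEFAULTS = {"model-a": (1.0, 4.0)}


def build_table(overrides=None):
    """Merge overrides on top of the defaults; override keys win."""
    table = dict(DEFAULTS)
    table.update(overrides or {})
    return table


def estimate_cost(model, input_tokens, output_tokens, table=None):
    """Return the estimated USD cost, or 0.0 when the model is not in the table."""
    prices = (table if table is not None else DEFAULTS).get(model)
    if prices is None:
        return 0.0  # unknown model: report zero cost instead of raising
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


known = estimate_cost("model-a", 1_000_000, 500_000)
unknown = estimate_cost("not-in-table", 1_000, 1_000)
extended = build_table({"model-b": (2.0, 2.0), "model-a": (10.0, 0.0)})
print(known, unknown, estimate_cost("model-b", 1_000_000, 0, table=extended))
```

Returning 0.0 for unknown models keeps recording from failing, at the price of silently under-reporting cost. Supplying entries for custom or fine-tuned models through price_table avoids that gap.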