ractogateway.telemetry
RactoGateway Telemetry — OpenTelemetry tracing + Prometheus metrics.
Provides production-grade observability for every LLM call made through any RactoGateway developer kit.
Quick start::

    from ractogateway import openai_developer_kit as opd
    from ractogateway.telemetry import RactoTracer, GatewayMetricsMiddleware, PrometheusExporter

    # --- OTEL tracing ---
    tracer = RactoTracer(otlp_endpoint="http://localhost:4317")

    # --- Prometheus metrics ---
    metrics = GatewayMetricsMiddleware()
    PrometheusExporter(port=8000).start()

    # my_prompt is a prompt object you have already defined elsewhere
    kit = opd.OpenAIDeveloperKit(
        model="gpt-4o",
        default_prompt=my_prompt,
        tracer=tracer,
        metrics=metrics,
    )
    response = kit.chat(opd.ChatConfig(user_message="Hello"))
Install extras:
pip install ractogateway[telemetry] # OpenTelemetry tracing
pip install ractogateway[prometheus] # Prometheus metrics
pip install ractogateway[observability] # both
Public API
- RactoTracer: OpenTelemetry tracer. Pass as tracer= to any kit.
- GatewayMetricsMiddleware: Prometheus metrics collector. Pass as metrics= to any kit.
- PrometheusExporter: HTTP server for Prometheus scraping.
- ModelPricing: Per-model input/output pricing (USD per 1M tokens).
- SpanRecord: In-memory span record for test assertions.
- DEFAULT_COST_TABLE: Built-in pricing table.
- compute_cost(): Compute estimated USD cost from token counts.
- class ractogateway.telemetry.GatewayMetricsMiddleware(*, price_table=None, registry=None)
Bases: object

Prometheus metrics middleware. Pass as metrics= to any developer kit. A single instance can be shared across multiple kits (different providers) to aggregate metrics in one registry.
- Parameters:
  price_table (dict[str, ModelPricing] | None) – Override or extend the built-in pricing table used for the ractogateway_cost_usd_total counter.
  registry (Any | None) – Custom prometheus_client.CollectorRegistry. Defaults to the global REGISTRY (which also includes default Python metrics). Pass prometheus_client.CollectorRegistry() to get an isolated registry, which is useful in tests.
Requires: pip install ractogateway[prometheus]
- record_request(*, provider, model, operation, status, latency_s, input_tokens=0, output_tokens=0, tool_calls=None)
Record metrics for a completed LLM request.
- Parameters:
  provider (str) – Provider string ("openai", "google", "anthropic").
  model (str) – Model identifier (e.g. "gpt-4o").
  operation (str) – "chat", "stream", or "embed".
  status (str) – "ok" or "error".
  latency_s (float) – Request wall-clock latency in seconds.
  input_tokens (int) – Prompt tokens consumed (0 for cache hits or errors).
  output_tokens (int) – Completion tokens produced (0 for cache hits or errors).
  tool_calls (list[Any] | None) – List of ToolCallResult objects from the response. Used to update ractogateway_tool_calls_total.
- Return type: None
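To picture how one shared GatewayMetricsMiddleware aggregates calls from several kits, the stdlib-only sketch below keys each counter by a (provider, model, operation, status) label tuple. This is roughly how labelled Prometheus counters behave. ToyMetrics and its field names are hypothetical stand-ins, not the library's implementation, and "example-model" is a placeholder model name.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyMetrics:
    """Hypothetical stand-in for a shared metrics collector (not the library class)."""

    # Every counter is keyed by the label tuple (provider, model, operation, status).
    requests_total: defaultdict = field(default_factory=lambda: defaultdict(int))
    latency_seconds_sum: defaultdict = field(default_factory=lambda: defaultdict(float))
    input_tokens_total: defaultdict = field(default_factory=lambda: defaultdict(int))
    output_tokens_total: defaultdict = field(default_factory=lambda: defaultdict(int))

    def record_request(self, *, provider, model, operation, status, latency_s,
                       input_tokens=0, output_tokens=0):
        key = (provider, model, operation, status)
        self.requests_total[key] += 1
        self.latency_seconds_sum[key] += latency_s  # divide by requests_total for a mean
        self.input_tokens_total[key] += input_tokens
        self.output_tokens_total[key] += output_tokens


# One collector shared by two hypothetical kits for different providers.
shared = ToyMetrics()
shared.record_request(provider="openai", model="gpt-4o", operation="chat",
                      status="ok", latency_s=0.8, input_tokens=120, output_tokens=40)
shared.record_request(provider="openai", model="gpt-4o", operation="chat",
                      status="ok", latency_s=1.2, input_tokens=80, output_tokens=60)
shared.record_request(provider="anthropic", model="example-model", operation="chat",
                      status="error", latency_s=0.1)  # errors record 0 tokens

ok_key = ("openai", "gpt-4o", "chat", "ok")
```

Calls with identical labels fold into one series. A different provider or status opens a new series in the same collector.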
- record_cache_hit(cache_type)
Increment the cache-hits counter.
- record_cache_miss(cache_type)
Increment the cache-misses counter.
- class ractogateway.telemetry.ModelPricing(**data)
Bases: BaseModel

USD cost per 1 million tokens for a specific model.
- Parameters:
  input_per_million (float) – USD cost per 1 million input (prompt) tokens.
  output_per_million (float) – USD cost per 1 million output (completion) tokens.

Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- input_per_million: float
- output_per_million: float
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
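Because ModelPricing quotes prices per million tokens, the estimated cost of one call can be written as below. Here n_in and n_out are the token counts, and p_in and p_out are the input_per_million and output_per_million rates. This sketch follows the arithmetic implied by the field names. The prices in the worked line (2.50 and 10.00 USD per 1M tokens) are hypothetical, not values from DEFAULT_COST_TABLE.

```latex
\text{cost}_{\text{USD}} = \frac{n_{\text{in}}\, p_{\text{in}} + n_{\text{out}}\, p_{\text{out}}}{10^{6}},
\qquad
\frac{1200 \times 2.50 + 300 \times 10.00}{10^{6}} = \frac{6000}{10^{6}} = 0.006
```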
- class ractogateway.telemetry.PrometheusExporter(port=8000, *, registry=None)
Bases: object

Start an HTTP server that exposes all registered Prometheus metrics.

The server listens on http://0.0.0.0:<port>/metrics and responds with the standard Prometheus text exposition format. It is designed to be used alongside GatewayMetricsMiddleware but works with any metrics registered in the global registry.
- Parameters:
  port (int) – HTTP port to listen on. Defaults to 8000.
  registry (Any | None) – Custom prometheus_client.CollectorRegistry. When None, the global REGISTRY is used.

Example::

    from ractogateway.telemetry import GatewayMetricsMiddleware, PrometheusExporter

    metrics = GatewayMetricsMiddleware()
    exporter = PrometheusExporter(port=9090)
    exporter.start()
    # Prometheus can now scrape http://localhost:9090/metrics
    exporter.stop()

Requires: pip install ractogateway[prometheus]
- start()
Start the metrics HTTP server in a background daemon thread.
Calling start() a second time on an already-running exporter is a no-op.
- Return type: None
- stop()
Shut down the metrics HTTP server.
Safe to call even if the server was never started.
- Return type: None
- property port: int
The configured HTTP port.
- property is_running: bool
True when the HTTP server is active.
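The start()/stop() contract above can be sketched with the standard library's http.server and a daemon thread: a second start() is a no-op, and stop() is safe before start(). ToyExporter is hypothetical and only mirrors the documented behaviour. It serves a fixed placeholder body on any path and binds to 127.0.0.1 rather than 0.0.0.0, so the example does not open a port on every interface.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class _PlaceholderHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed text body (stand-in for real metrics)."""

    def do_GET(self):
        body = b"# placeholder metrics\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


class ToyExporter:
    """Hypothetical sketch of the documented start()/stop() semantics."""

    def __init__(self, port=8000):
        self._port = port
        self._server = None
        self._thread = None

    @property
    def port(self):
        return self._port

    @property
    def is_running(self):
        return self._server is not None

    def start(self):
        if self._server is not None:
            return  # already running: a second start() is a no-op
        self._server = ThreadingHTTPServer(("127.0.0.1", self._port), _PlaceholderHandler)
        # daemon=True so the serving thread never blocks interpreter exit
        self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)
        self._thread.start()

    def stop(self):
        if self._server is None:
            return  # never started (or already stopped): nothing to do
        self._server.shutdown()      # ask serve_forever() to return
        self._server.server_close()  # release the listening socket
        self._thread.join()
        self._server = None
        self._thread = None


exporter = ToyExporter(port=0)  # port 0: let the OS choose a free port
exporter.stop()                 # safe even though it was never started
exporter.start()
first_server = exporter._server
exporter.start()                # no-op: the same server keeps running
same_server = exporter._server is first_server
was_running = exporter.is_running
exporter.stop()
```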
- generate_latest()
Return current metrics in Prometheus text format.
No HTTP server is required, which is useful for testing or for embedding the metrics in a custom endpoint::

    text = exporter.generate_latest()
    # process_* metrics come from the global REGISTRY's process collector (Linux only)
    assert "process_resident_memory_bytes" in text

- Return type: str
- Returns: str – UTF-8 decoded Prometheus text exposition string.
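Tests that need more than a substring check can parse the text returned by generate_latest() into individual samples. The helper below is a hypothetical, deliberately minimal sketch. It skips # HELP / # TYPE lines and ignores escaping inside label values. The sample text and its model label are invented. For full parsing, prometheus_client ships prometheus_client.parser.text_string_to_metric_families.

```python
def parse_exposition(text: str) -> dict[str, float]:
    """Map each sample key (name plus any {labels}) to its float value."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE metadata, not a sample
        if "{" in line:
            # Labelled sample: split after the closing brace of the label set.
            head, _, tail = line.rpartition("}")
            key = head + "}"
        else:
            key, _, tail = line.partition(" ")
        # The value is the first field after the key; an optional timestamp may follow.
        samples[key] = float(tail.split()[0])
    return samples


# Invented example of exposition text; the label name is hypothetical.
sample_text = """\
# HELP ractogateway_cost_usd_total Estimated spend.
# TYPE ractogateway_cost_usd_total counter
ractogateway_cost_usd_total{model="gpt-4o"} 0.006
process_resident_memory_bytes 1.2e+07
"""

parsed = parse_exposition(sample_text)
```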
- class ractogateway.telemetry.RactoTracer(*, service_name='ractogateway', otlp_endpoint=None, otlp_http_endpoint=None, console=False, in_memory=False, custom_exporter=None, price_table=None)
Bases: object

OpenTelemetry tracer. Pass as tracer= to any developer kit. Records one span per LLM call with attributes for latency, token usage, estimated cost, cache-hit type, and tool-call count.

Supports OTLP gRPC (Jaeger / Grafana Tempo), OTLP HTTP, console stdout, in-memory capture (for tests), and any custom opentelemetry.sdk.trace.export.SpanExporter.
- Parameters:
  service_name (str) – OTEL service.name resource attribute. Defaults to "ractogateway".
  otlp_endpoint (str | None) – OTLP gRPC endpoint (e.g. "http://localhost:4317"). Requires pip install ractogateway[telemetry].
  otlp_http_endpoint (str | None) – OTLP HTTP endpoint (e.g. "http://localhost:4318"). Requires pip install ractogateway[telemetry].
  console (bool) – Also print spans to stdout; convenient during local development.
  in_memory (bool) – Capture spans internally in a thread-safe list. Access recorded spans via the spans property. Useful for unit tests; no external backend required.
  custom_exporter (Any | None) – Any opentelemetry.sdk.trace.export.SpanExporter instance.
  price_table (dict[str, ModelPricing] | None) – Override or extend the built-in DEFAULT_COST_TABLE. Keys are model identifiers; values are ModelPricing objects.

All spans carry the following OTEL attributes:
  * llm.provider: "openai" / "google" / "anthropic"
  * llm.model: e.g. "gpt-4o"
  * llm.operation: "chat" / "stream" / "embed"
  * llm.latency_ms: wall-clock time in milliseconds
  * llm.input_tokens: prompt tokens consumed
  * llm.output_tokens: completion tokens produced
  * llm.cost_usd: estimated USD cost (8 decimal places)
  * llm.cache_hit: "exact" / "semantic" / "miss"
  * llm.tool_calls: number of tool calls in the response
  * llm.error_type: exception class name on error (omitted on success)
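The attribute list above can be read as a plain mapping. The sketch below builds that mapping for one call. It rounds llm.cost_usd to 8 decimal places and leaves out llm.error_type on success. build_span_attributes is a hypothetical helper, not part of RactoTracer. With the OpenTelemetry SDK, a dict like this could be passed as span attributes.

```python
def build_span_attributes(*, provider, model, operation, latency_ms,
                          input_tokens=0, output_tokens=0, cost_usd=0.0,
                          cache_hit="miss", tool_calls=0, error_type=None):
    """Hypothetical helper: one call's data as the documented llm.* attributes."""
    attrs = {
        "llm.provider": provider,
        "llm.model": model,
        "llm.operation": operation,
        "llm.latency_ms": latency_ms,
        "llm.input_tokens": input_tokens,
        "llm.output_tokens": output_tokens,
        "llm.cost_usd": round(cost_usd, 8),  # 8 decimal places
        "llm.cache_hit": cache_hit,
        "llm.tool_calls": tool_calls,
    }
    if error_type is not None:
        attrs["llm.error_type"] = error_type  # omitted on success
    return attrs


ok_attrs = build_span_attributes(
    provider="openai", model="gpt-4o", operation="chat", latency_ms=812.5,
    input_tokens=1200, output_tokens=300, cost_usd=0.006123456789,
)
err_attrs = build_span_attributes(
    provider="openai", model="gpt-4o", operation="chat", latency_ms=30.0,
    error_type="TimeoutError",
)
```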
- record_chat_span(*, provider, model, latency_ms, input_tokens=0, output_tokens=0, cache_hit='miss', tool_calls=0, status='ok', error_type=None)
Record a completed chat or stream span.
- Parameters:
  provider (str) – Provider string ("openai", "google", "anthropic").
  model (str) – Model identifier (e.g. "gpt-4o").
  latency_ms (float) – Total wall-clock latency of the LLM call in milliseconds.
  input_tokens (int) – Number of prompt tokens consumed (0 for cache hits).
  output_tokens (int) – Number of completion tokens produced (0 for cache hits).
  cache_hit (str) – "exact", "semantic", or "miss".
  tool_calls (int) – Number of tool calls in the response.
  status (str) – "ok" or "error".
  error_type (str | None) – Exception class name when status == "error", else None.
- Return type: None
- record_embed_span(*, provider, model, latency_ms, input_tokens=0, status='ok', error_type=None)
Record a completed embedding span.
- Parameters:
  provider (str) – Provider string ("openai" or "google").
  model (str) – Embedding model identifier.
  latency_ms (float) – Total wall-clock latency in milliseconds.
  input_tokens (int) – Number of tokens embedded.
  status (str) – "ok" or "error".
  error_type (str | None) – Exception class name when status == "error", else None.
- Return type: None
- property spans: list[SpanRecord]
Return all captured in-memory spans.
Only populated when in_memory=True. Thread-safe.
- Returns: list[SpanRecord] – Snapshot of all recorded spans (newest last).
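A thread-safe store that hands out snapshots, as the spans property describes, usually takes a lock on every append and returns a copy of the list. The sketch below shows that pattern with hypothetical ToySpan and InMemorySpanStore classes. It is not the library's internal code.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySpan:
    """Hypothetical minimal span record."""
    name: str
    latency_ms: float


class InMemorySpanStore:
    """Hypothetical thread-safe span store that returns snapshots."""

    def __init__(self):
        self._lock = threading.Lock()
        self._spans: list[ToySpan] = []

    def add(self, span: ToySpan) -> None:
        with self._lock:
            self._spans.append(span)  # appending keeps the newest span last

    @property
    def spans(self) -> list[ToySpan]:
        with self._lock:
            return list(self._spans)  # a copy, so callers cannot mutate the store


store = InMemorySpanStore()
workers = [
    threading.Thread(target=store.add, args=(ToySpan("llm.chat", float(i)),))
    for i in range(50)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

snapshot = store.spans
snapshot.clear()  # only the copy is emptied
```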
- class ractogateway.telemetry.SpanRecord(**data)
Bases: BaseModel

In-memory span record, captured when RactoTracer(in_memory=True) is used. Useful for unit tests: inspect .spans after a call and assert on attributes without requiring an external OTEL backend.
- Parameters:
  name (str) – OTEL span name ("llm.chat" or "llm.embed").
  provider (str) – Provider string: "openai", "google", or "anthropic".
  model (str) – Model identifier as passed to the kit (e.g. "gpt-4o").
  operation (str) – Operation type: "chat", "stream", or "embed".
  latency_ms (float) – Total wall-clock latency of the LLM call in milliseconds.
  input_tokens (int) – Number of prompt / input tokens consumed.
  output_tokens (int) – Number of completion / output tokens produced.
  cost_usd (float) – Estimated cost in USD derived from the built-in pricing table.
  cache_hit (str) – Which cache served the result: "exact", "semantic", or "miss" when the LLM API was actually called.
  tool_calls (int) – Number of tool calls present in the response.
  status (str) – "ok" on success, "error" on exception.
  error_type (str | None) – Exception class name when status == "error", else None.
  timestamp (float) – Unix timestamp (time.time()) when the span was recorded.

Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- name: str
- provider: str
- model: str
- operation: str
- latency_ms: float
- input_tokens: int
- output_tokens: int
- cost_usd: float
- cache_hit: str
- tool_calls: int
- status: str
- error_type: str | None
- timestamp: float
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- ractogateway.telemetry.compute_cost(model, input_tokens, output_tokens, extra_table=None)
Compute the estimated USD cost for a single LLM call.
- Parameters:
  model (str) – Model identifier (e.g. "gpt-4o"). If it is not found in the combined table, the function returns 0.0.
  input_tokens (int) – Number of prompt tokens consumed.
  output_tokens (int) – Number of completion tokens produced.
  extra_table (dict[str, ModelPricing] | None) – Optional {model: ModelPricing} dict to override or extend DEFAULT_COST_TABLE. Extra entries win over defaults.
- Return type: float
- Returns: float – Estimated cost in USD, or 0.0 when the model is unknown.
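compute_cost() can be mirrored in a few lines to show how extra_table interacts with the defaults and why unknown models yield 0.0. Pricing, DEFAULTS, and estimate_cost below are hypothetical stand-ins. The model names and prices are invented for illustration and are not the contents of DEFAULT_COST_TABLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical stand-in for ModelPricing (USD per 1M tokens)."""
    input_per_million: float
    output_per_million: float


# Invented prices for illustration only; NOT the library's DEFAULT_COST_TABLE.
DEFAULTS = {
    "model-a": Pricing(input_per_million=2.50, output_per_million=10.00),
    "model-b": Pricing(input_per_million=0.15, output_per_million=0.60),
}


def estimate_cost(model, input_tokens, output_tokens, extra_table=None):
    """Hypothetical mirror of the documented compute_cost() behaviour."""
    table = {**DEFAULTS, **(extra_table or {})}  # extra entries win over defaults
    pricing = table.get(model)
    if pricing is None:
        return 0.0  # unknown model: no estimate
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


base = estimate_cost("model-a", 1200, 300)
overridden = estimate_cost("model-a", 1200, 300, {"model-a": Pricing(1.00, 1.00)})
added = estimate_cost("model-c", 1000, 0, {"model-c": Pricing(3.00, 3.00)})
unknown = estimate_cost("no-such-model", 1000, 1000)
```

The dict-unpacking merge is one simple way to get "extra entries win" semantics. A later mapping in `{**a, **b}` overrides earlier keys.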