ractogateway.telemetry.tracer
RactoTracer — OpenTelemetry integration for RactoGateway.
Pass a RactoTracer instance as tracer= to any developer kit to
automatically emit OTEL spans for every LLM call.
Requires: pip install ractogateway[telemetry]
Example:
    from ractogateway import openai_developer_kit as opd
    from ractogateway.telemetry import RactoTracer

    tracer = RactoTracer(
        otlp_endpoint="http://localhost:4317",
        console=True,
    )
    kit = opd.OpenAIDeveloperKit(
        model="gpt-4o",
        default_prompt=my_prompt,
        tracer=tracer,
    )
    response = kit.chat(opd.ChatConfig(user_message="Hello"))
    # A span named "llm.chat" is now in your OTEL backend.
- class ractogateway.telemetry.tracer.RactoTracer(*, service_name='ractogateway', otlp_endpoint=None, otlp_http_endpoint=None, console=False, in_memory=False, custom_exporter=None, price_table=None)
Bases: object
OpenTelemetry tracer. Pass as tracer= to any developer kit.
Records one span per LLM call with attributes for latency, token usage, estimated cost, cache-hit type, and tool-call count.
Supports OTLP gRPC (Jaeger / Grafana Tempo), OTLP HTTP, console stdout, in-memory capture (for tests), and any custom opentelemetry.sdk.trace.export.SpanExporter.
- Parameters:
  service_name (str) – OTEL service.name resource attribute. Defaults to "ractogateway".
  otlp_endpoint (str | None) – OTLP gRPC endpoint (e.g. "http://localhost:4317"). Requires pip install ractogateway[telemetry].
  otlp_http_endpoint (str | None) – OTLP HTTP endpoint (e.g. "http://localhost:4318"). Requires pip install ractogateway[telemetry].
  console (bool) – Also print spans to stdout, which is convenient during local development.
  in_memory (bool) – Capture spans internally in a thread-safe list. Access recorded spans via the spans property. Useful for unit tests, since no external backend is required.
  custom_exporter (Any | None) – Any opentelemetry.sdk.trace.export.SpanExporter instance.
  price_table (dict[str, ModelPricing] | None) – Override or extend the built-in DEFAULT_COST_TABLE. Keys are model identifiers; values are ModelPricing objects.
All spans carry the following OTEL attributes:
* llm.provider: "openai" / "google" / "anthropic"
* llm.model: e.g. "gpt-4o"
* llm.operation: "chat" / "stream" / "embed"
* llm.latency_ms: wall-clock time in milliseconds
* llm.input_tokens: prompt tokens consumed
* llm.output_tokens: completion tokens produced
* llm.cost_usd: estimated USD cost (8 decimal places)
* llm.cache_hit: "exact" / "semantic" / "miss"
* llm.tool_calls: number of tool calls in the response
* llm.error_type: exception class name on error (omitted on success)
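The llm.cost_usd attribute is described only as an estimate rounded to 8 decimal places. One plausible way to derive such a figure from a per-model price table is sketched below. The Pricing class, its field names, and the prices are assumptions made for illustration; they are not the actual ModelPricing definition or the contents of DEFAULT_COST_TABLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical stand-in for ModelPricing; field names are assumptions."""

    input_per_1m_usd: float   # USD per one million prompt tokens
    output_per_1m_usd: float  # USD per one million completion tokens


# Illustrative prices only, not taken from DEFAULT_COST_TABLE.
PRICES = {"example-model": Pricing(input_per_1m_usd=2.0, output_per_1m_usd=8.0)}


def estimate_cost_usd(model, input_tokens, output_tokens, prices=PRICES):
    """Return an estimated cost rounded to 8 decimal places (0.0 if unknown)."""
    pricing = prices.get(model)
    if pricing is None:
        return 0.0
    cost = (
        input_tokens * pricing.input_per_1m_usd
        + output_tokens * pricing.output_per_1m_usd
    ) / 1_000_000
    return round(cost, 8)
```

Returning 0.0 for an unknown model is a design choice of this sketch; the library may handle missing entries differently.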
- record_chat_span(*, provider, model, latency_ms, input_tokens=0, output_tokens=0, cache_hit='miss', tool_calls=0, status='ok', error_type=None)
Record a completed chat or stream span.
- Parameters:
  provider (str) – Provider string ("openai", "google", "anthropic").
  model (str) – Model identifier (e.g. "gpt-4o").
  latency_ms (float) – Total wall-clock latency of the LLM call in milliseconds.
  input_tokens (int) – Number of prompt tokens consumed (0 for cache hits).
  output_tokens (int) – Number of completion tokens produced (0 for cache hits).
  cache_hit (str) – "exact", "semantic", or "miss".
  tool_calls (int) – Number of tool calls in the response.
  status (str) – "ok" or "error".
  error_type (str | None) – Exception class name when status == "error", else None.
- Return type:
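The developer kits call this method for you, but code that wraps an LLM call by hand has to supply latency_ms, status, and error_type itself. The sketch below shows one way to compute them. The helper timed_call and its record argument are hypothetical: record is any callable that accepts the keyword arguments documented above, which in practice could be a tracer's record_chat_span.

```python
import time


def timed_call(fn, record, *, provider, model):
    """Run fn(), then report latency and outcome through record(**kwargs)."""
    start = time.perf_counter()
    status, error_type = "ok", None
    try:
        return fn()
    except Exception as exc:
        # Mirror the documented convention: the exception class name on error.
        status, error_type = "error", type(exc).__name__
        raise
    finally:
        record(
            provider=provider,
            model=model,
            latency_ms=(time.perf_counter() - start) * 1000.0,
            status=status,
            error_type=error_type,
        )


# Usage with a plain list standing in for a tracer.
calls = []
result = timed_call(
    lambda: "hi",
    lambda **kw: calls.append(kw),
    provider="openai",
    model="gpt-4o",
)
```

Recording in the finally clause means a span is reported whether the call succeeds or raises, and the original exception still propagates to the caller.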
- record_embed_span(*, provider, model, latency_ms, input_tokens=0, status='ok', error_type=None)
Record a completed embedding span.
- Parameters:
  provider (str) – Provider string ("openai" or "google").
  model (str) – Embedding model identifier.
  latency_ms (float) – Total wall-clock latency in milliseconds.
  input_tokens (int) – Number of tokens embedded.
  status (str) – "ok" or "error".
  error_type (str | None) – Exception class name when status == "error", else None.
- Return type:
- property spans: list[SpanRecord]
Return all captured in-memory spans.
Only populated when in_memory=True. Thread-safe.
- Returns:
  list[SpanRecord] – Snapshot of all recorded spans (newest last).
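A thread-safe snapshot is commonly implemented by guarding a list with a lock and returning a copy. The sketch below illustrates that pattern only. SpanBuffer is a hypothetical class written for this example, and nothing here is confirmed to match how RactoTracer stores its spans.

```python
import threading


class SpanBuffer:
    """Hypothetical in-memory store; not RactoTracer's actual implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def append(self, item):
        with self._lock:
            self._items.append(item)

    @property
    def spans(self):
        # Return a copy so callers can iterate while other threads keep appending.
        with self._lock:
            return list(self._items)


buffer = SpanBuffer()
workers = [
    threading.Thread(target=lambda: [buffer.append("span") for _ in range(100)])
    for _ in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

snapshot = buffer.spans
buffer.append("later")  # does not change the snapshot taken above
```

Because the property returns a copy, a test can assert against the snapshot without holding any lock, at the cost of one list copy per access.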