# API Reference — Telemetry

Module: `ractogateway.telemetry`

Install extras: `pip install "ractogateway[observability]"`

---

## RactoTracer

```python
class RactoTracer
```

OpenTelemetry tracer. Pass an instance as the `tracer=` argument of any developer kit.

### Constructor

```python
RactoTracer(
    *,
    service_name: str = "ractogateway",
    otlp_endpoint: str | None = None,
    otlp_http_endpoint: str | None = None,
    console: bool = False,
    in_memory: bool = False,
    custom_exporter: SpanExporter | None = None,
    price_table: dict[str, ModelPricing] | None = None,
)
```

**Parameters**

- **service_name** (`str`) — OTEL `service.name` resource attribute. Defaults to `"ractogateway"`.
- **otlp_endpoint** (`str | None`) — OTLP gRPC endpoint (e.g. `"http://localhost:4317"`).
  Requires `pip install "ractogateway[telemetry]"`.
- **otlp_http_endpoint** (`str | None`) — OTLP HTTP endpoint (e.g. `"http://localhost:4318"`).
  Requires `pip install "ractogateway[telemetry]"`.
- **console** (`bool`) — Also print spans to stdout. Defaults to `False`.
- **in_memory** (`bool`) — Capture spans in a thread-safe list. Access via `.spans`.
  Useful for unit tests. Defaults to `False`.
- **custom_exporter** (`SpanExporter | None`) — Any `opentelemetry.sdk.trace.export.SpanExporter` instance. Defaults to `None`.
- **price_table** (`dict[str, ModelPricing] | None`) — Override or extend the built-in
  pricing table. Keys are model identifiers; values are `ModelPricing` objects.
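
As a rough illustration of "override or extend", the sketch below merges a user-supplied table over a built-in one. It uses a stand-in `Pricing` dataclass rather than the real `ModelPricing`, the prices are invented placeholders, and the rule that user entries win on key collisions is an assumption inferred from the parameter description, not confirmed behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Stand-in for ModelPricing (hypothetical, for illustration only)."""
    input_per_million: float
    output_per_million: float


# Placeholder built-in table; the numbers are invented, not real prices.
BUILTIN = {
    "model-a": Pricing(1.0, 2.0),
    "model-b": Pricing(3.0, 4.0),
}


def merge_price_tables(builtin, overrides=None):
    """Return a new table where `overrides` replace or extend `builtin`."""
    merged = dict(builtin)          # copy so the built-in table is not mutated
    merged.update(overrides or {})  # assumed precedence: user entries win
    return merged


table = merge_price_tables(
    BUILTIN,
    {
        "model-a": Pricing(0.5, 1.5),          # override an existing model
        "my-fine-tune": Pricing(10.0, 20.0),   # extend with a new model
    },
)
```

With the real classes, the equivalent would presumably be passing a `dict[str, ModelPricing]` as `price_table=` and letting the tracer do the merge.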

### Methods

#### record_chat_span

```python
def record_chat_span(
    self,
    *,
    provider: str,
    model: str,
    latency_ms: float,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cache_hit: str = "miss",
    tool_calls: int = 0,
    status: str = "ok",
    error_type: str | None = None,
) -> None
```

Record a completed chat or stream span.

**Parameters**

- **provider** — `"openai"`, `"google"`, or `"anthropic"`.
- **model** — Model identifier (e.g. `"gpt-4o"`).
- **latency_ms** — Total wall-clock latency of the LLM call in milliseconds.
- **input_tokens** — Prompt tokens consumed. `0` for cache hits.
- **output_tokens** — Completion tokens produced. `0` for cache hits.
- **cache_hit** — `"exact"`, `"semantic"`, or `"miss"`.
- **tool_calls** — Number of tool calls in the response.
- **status** — `"ok"` or `"error"`.
- **error_type** — Exception class name when `status == "error"`, else `None`.
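
The `latency_ms`, `status`, and `error_type` arguments can be derived from a single timed call. The sketch below shows one way to do that with the standard library only; `timed_call` is a hypothetical helper, not part of `ractogateway`, and it swallows the exception purely to keep the example short.

```python
import time


def timed_call(fn):
    """Run `fn` and return (result, span_kwargs).

    span_kwargs carries latency_ms, status, and error_type in the shape
    that the record_chat_span signature expects.
    """
    start = time.perf_counter()
    result, status, error_type = None, "ok", None
    try:
        result = fn()
    except Exception as exc:  # a real caller would usually re-raise
        status, error_type = "error", type(exc).__name__
    latency_ms = (time.perf_counter() - start) * 1000.0
    kwargs = {"latency_ms": latency_ms, "status": status, "error_type": error_type}
    return result, kwargs


def failing_call():
    raise TimeoutError("simulated provider timeout")


_, ok_kwargs = timed_call(lambda: "hello")
_, err_kwargs = timed_call(failing_call)
```

The resulting dictionary could then be unpacked next to the remaining keyword arguments, for example `tracer.record_chat_span(provider=..., model=..., **ok_kwargs)`.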

#### record_embed_span

```python
def record_embed_span(
    self,
    *,
    provider: str,
    model: str,
    latency_ms: float,
    input_tokens: int = 0,
    status: str = "ok",
    error_type: str | None = None,
) -> None
```

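Record a completed embedding span. The parameters have the same meaning as in `record_chat_span`; the signature omits `output_tokens`, `cache_hit`, and `tool_calls`.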

#### spans (property)

```python
@property
def spans(self) -> list[SpanRecord]
```

Return all captured in-memory spans. Only populated when `in_memory=True`. Thread-safe.

#### clear_spans

```python
def clear_spans(self) -> None
```

Clear all in-memory spans. Has an effect only when `in_memory=True`.

---

## GatewayMetricsMiddleware

```python
class GatewayMetricsMiddleware
```

Prometheus metrics middleware. Pass an instance as the `metrics=` argument of any developer kit.

### Constructor

```python
GatewayMetricsMiddleware(
    *,
    price_table: dict[str, ModelPricing] | None = None,
    registry: CollectorRegistry | None = None,
)
```

**Parameters**

- **price_table** — Override or extend the built-in pricing table.
- **registry** — Custom `prometheus_client.CollectorRegistry`. Pass an isolated registry in
  tests to prevent metric name collisions.

### Metrics

| Metric name | Type | Labels |
|---|---|---|
| `ractogateway_requests_total` | Counter | `provider`, `model`, `operation`, `status` |
| `ractogateway_request_duration_seconds` | Histogram | `provider`, `model`, `operation` |
| `ractogateway_tokens_total` | Counter | `provider`, `model`, `token_type` |
| `ractogateway_cost_usd_total` | Counter | `provider`, `model` |
| `ractogateway_cache_hits_total` | Counter | `cache_type` |
| `ractogateway_cache_misses_total` | Counter | `cache_type` |
| `ractogateway_tool_calls_total` | Counter | `tool_name` |

### Methods

#### record_request

```python
def record_request(
    self,
    *,
    provider: str,
    model: str,
    operation: str,
    status: str,
    latency_s: float,
    input_tokens: int = 0,
    output_tokens: int = 0,
    tool_calls: list[ToolCallResult] | None = None,
) -> None
```

Record metrics for a completed LLM request.

**Parameters**

- **provider** — `"openai"`, `"google"`, or `"anthropic"`.
- **model** — Model identifier.
- **operation** — `"chat"`, `"stream"`, or `"embed"`.
- **status** — `"ok"` or `"error"`.
- **latency_s** — Request wall-clock latency **in seconds**.
- **input_tokens** — Prompt tokens consumed.
- **output_tokens** — Completion tokens produced.
- **tool_calls** — List of `ToolCallResult` objects from the response.
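
Note that `record_request` takes latency in seconds and tool calls as a list, whereas `RactoTracer.record_chat_span` takes milliseconds and a count. The sketch below derives both argument shapes from one measurement; `split_measurement` is a hypothetical helper, and plain strings stand in for `ToolCallResult` objects.

```python
def split_measurement(elapsed_s, tool_calls=None):
    """Derive the two differing argument shapes from one measurement.

    Returns (span_kwargs, metrics_kwargs):
    - span_kwargs uses milliseconds and a tool-call count (record_chat_span)
    - metrics_kwargs uses seconds and the tool-call list (record_request)
    """
    calls = list(tool_calls or [])
    span_kwargs = {"latency_ms": elapsed_s * 1000.0, "tool_calls": len(calls)}
    metrics_kwargs = {"latency_s": elapsed_s, "tool_calls": calls}
    return span_kwargs, metrics_kwargs


# Strings are placeholders for ToolCallResult objects in this sketch.
span_kwargs, metrics_kwargs = split_measurement(0.25, ["search", "calculator"])
```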

#### record_cache_hit

```python
def record_cache_hit(self, cache_type: str) -> None
```

Increment `ractogateway_cache_hits_total`. `cache_type` is `"exact"` or `"semantic"`.

#### record_cache_miss

```python
def record_cache_miss(self, cache_type: str) -> None
```

Increment `ractogateway_cache_misses_total`. `cache_type` is `"exact"` or `"semantic"`.

#### generate_latest

```python
def generate_latest(self) -> str
```

Return the current metrics in the Prometheus text exposition format, as a `str`.
Useful for testing without starting an HTTP server.
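
In a test, the returned text can be checked by parsing the sample lines. The sketch below is a minimal standard-library parser for that purpose. The `SAMPLE` text is hand-written for illustration (its help string and values are invented), and the parser is deliberately simple: it does not unescape label values.

```python
import re

# name, optional {labels}, then the value; a trailing timestamp is ignored.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def sample_value(text, name, **labels):
    """Return the first sample matching `name` and all given labels, else None."""
    for sample_name, sample_labels, value in parse_samples(text):
        if sample_name == name and all(
            sample_labels.get(key) == want for key, want in labels.items()
        ):
            return value
    return None


# Hand-written example text, standing in for the output of generate_latest().
SAMPLE = """\
# HELP ractogateway_requests_total Example help text
# TYPE ractogateway_requests_total counter
ractogateway_requests_total{model="gpt-4o",operation="chat",provider="openai",status="ok"} 3.0
ractogateway_cache_hits_total{cache_type="exact"} 1.0
"""

requests_ok = sample_value(SAMPLE, "ractogateway_requests_total", status="ok")
```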

---

## PrometheusExporter

```python
class PrometheusExporter
```

HTTP server that exposes metrics at `/metrics` for Prometheus scraping.

### Constructor

```python
PrometheusExporter(
    port: int = 8000,
    registry: CollectorRegistry | None = None,
)
```

**Parameters**

- **port** — TCP port to listen on. Defaults to `8000`.
- **registry** — Custom registry. If `None`, uses the global `prometheus_client.REGISTRY`.

### Methods

#### start

```python
def start(self) -> None
```

Start the HTTP server in a background daemon thread. Idempotent.

#### stop

```python
def stop(self) -> None
```

Stop the HTTP server. Safe to call even if not started.

#### is_running (property)

```python
@property
def is_running(self) -> bool
```

`True` if the HTTP server thread is running.
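
The lifecycle described above (background daemon thread, idempotent `start`, `stop` that is safe when not started) can be sketched with the standard library alone. This is not the library's implementation; `SketchExporter` and its `render` callback are hypothetical names, and the sketch binds to `127.0.0.1` on an OS-assigned port to avoid clashing with anything real.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class _MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = self.server.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


class SketchExporter:
    """Serve render() at /metrics from a daemon thread (illustrative only)."""

    def __init__(self, render, port=0):
        self._render = render
        self._port = port  # 0 lets the OS pick a free port
        self._lock = threading.Lock()
        self._server = None
        self._thread = None

    def start(self):
        with self._lock:
            if self._server is not None:  # already running: no-op
                return
            server = ThreadingHTTPServer(("127.0.0.1", self._port), _MetricsHandler)
            server.render = self._render
            thread = threading.Thread(target=server.serve_forever, daemon=True)
            thread.start()
            self._server, self._thread = server, thread

    def stop(self):
        with self._lock:
            if self._server is None:  # never started or already stopped
                return
            self._server.shutdown()      # ask serve_forever() to exit
            self._server.server_close()  # release the socket
            self._thread.join()
            self._server = self._thread = None

    @property
    def is_running(self):
        thread = self._thread
        return thread is not None and thread.is_alive()


exporter = SketchExporter(lambda: "demo_metric 1.0\n")
exporter.start()
exporter.start()                      # second call is a no-op
running_after_start = exporter.is_running
exporter.stop()
exporter.stop()                       # safe even though already stopped
running_after_stop = exporter.is_running
```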

---

## ModelPricing

```python
class ModelPricing(BaseModel)
```

USD cost per 1 million tokens for a specific model.

**Fields**

- **input_per_million** (`float`) — Price in USD for 1M input (prompt) tokens.
- **output_per_million** (`float`) — Price in USD for 1M output (completion) tokens.

---

## SpanRecord

```python
class SpanRecord(BaseModel)
```

In-memory span record. Captured when the tracer is constructed with `RactoTracer(in_memory=True)`.

**Fields**

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | — | OTEL span name (`"llm.chat"` or `"llm.embed"`) |
| `provider` | `str` | — | `"openai"` / `"google"` / `"anthropic"` |
| `model` | `str` | — | Model identifier |
| `operation` | `str` | — | `"chat"` / `"stream"` / `"embed"` |
| `latency_ms` | `float` | — | Wall-clock latency in milliseconds (≥ 0) |
| `input_tokens` | `int` | `0` | Prompt tokens |
| `output_tokens` | `int` | `0` | Completion tokens |
| `cost_usd` | `float` | `0.0` | Estimated USD cost |
| `cache_hit` | `str` | `"miss"` | `"exact"` / `"semantic"` / `"miss"` |
| `tool_calls` | `int` | `0` | Number of tool calls |
| `status` | `str` | `"ok"` | `"ok"` or `"error"` |
| `error_type` | `str \| None` | `None` | Exception class name |
| `timestamp` | `float` | `time.time()` | Unix timestamp of recording |
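
Because each record is a flat set of fields, captured spans are easy to aggregate in tests or ad hoc reports. The sketch below groups span-like records by model. It uses a stand-in `Span` dataclass with a subset of the fields above rather than the real `SpanRecord`, and the sample values are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """Stand-in carrying a subset of the SpanRecord fields."""
    model: str
    latency_ms: float
    cost_usd: float = 0.0
    cache_hit: str = "miss"
    status: str = "ok"


def summarize(spans):
    """Group span-like records by model; total cost, errors, and cache hits."""
    summary = defaultdict(
        lambda: {"count": 0, "cost_usd": 0.0, "errors": 0, "cache_hits": 0}
    )
    for span in spans:
        row = summary[span.model]
        row["count"] += 1
        row["cost_usd"] += span.cost_usd
        row["errors"] += int(span.status == "error")
        row["cache_hits"] += int(span.cache_hit != "miss")
    return dict(summary)


# Invented sample data, not real measurements.
spans = [
    Span("gpt-4o", 120.0, cost_usd=0.25),
    Span("gpt-4o", 3.0, cache_hit="exact"),
    Span("gpt-4o", 900.0, status="error"),
]
report = summarize(spans)
```

With a real tracer, the same function could presumably be applied to `tracer.spans`, since it only reads attributes that `SpanRecord` also defines.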

---

## DEFAULT_COST_TABLE

```python
DEFAULT_COST_TABLE: dict[str, ModelPricing]
```

Built-in pricing table with 40+ models. Covers OpenAI (GPT-4o, GPT-4o-mini, o1, o3-mini,
…), Anthropic (Claude Opus/Sonnet/Haiku across generations), and Google (Gemini 2.0 Flash,
Gemini 2.5 Pro, Gemini 1.5, …).

---

## compute_cost

```python
def compute_cost(model: str, input_tokens: int, output_tokens: int) -> float
```

Compute estimated USD cost from token counts using `DEFAULT_COST_TABLE`.
Returns `0.0` for unknown models.

**Parameters**

- **model** — Model identifier string.
- **input_tokens** — Number of input tokens.
- **output_tokens** — Number of output tokens.

**Returns** — Estimated cost in USD.
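
The arithmetic implied by per-million pricing can be sketched as follows. This is an assumption about how the estimate is formed (linear in both token counts), not the library's confirmed formula. `sketch_cost` is a hypothetical function, and the table holds placeholder `(input_per_million, output_per_million)` rates rather than real prices.

```python
# Placeholder rates in USD per 1M tokens: (input_per_million, output_per_million).
PLACEHOLDER_TABLE = {
    "example-model": (2.0, 8.0),
}


def sketch_cost(table, model, input_tokens, output_tokens):
    """Estimate USD cost; unknown models cost 0.0, mirroring the text above."""
    rates = table.get(model)
    if rates is None:
        return 0.0
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


known = sketch_cost(PLACEHOLDER_TABLE, "example-model", 500_000, 250_000)
unknown = sketch_cost(PLACEHOLDER_TABLE, "no-such-model", 500_000, 250_000)
```

Here 500,000 input tokens at 2.0 USD per million plus 250,000 output tokens at 8.0 USD per million come to 1.0 + 2.0 = 3.0 USD.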