Routing

Models

Data models for the cost-aware routing subsystem.

class ractogateway.routing._models.RoutingTier(**data)[source]

Bases: BaseModel

One tier in the cost-aware routing ladder.

The router evaluates a complexity score (0-100) for each incoming message and selects the first tier whose max_score is >= that score. The last tier in the list always acts as the catch-all fallback.

Parameters:
  • model (str) – The LLM model identifier to use for requests that fall in this tier (e.g. "gpt-4o-mini", "gemini-2.0-flash", "claude-haiku-4-5-20251001").

  • max_score (float) – Inclusive upper bound on the complexity score that routes to this model. Range: 0-100. Set to 100 for the last (most powerful) tier so it catches everything.

Examples

tiers = [
    RoutingTier(model="gpt-4o-mini",  max_score=30),
    RoutingTier(model="gpt-4o",        max_score=70),
    RoutingTier(model="o3-mini",        max_score=100),
]

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model: str
max_score: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Cost-Aware Router

Cost-aware model router.

Dynamically selects the cheapest model that can handle the complexity of an incoming request, without making an extra LLM call for classification.

Complexity scoring (pure heuristics, O(1) per call):

  1. Token estimate: len(text) // 4 gives a rough token count, scaled to contribute 0-50 points.

  2. Keyword density: checks the lowercased message for a curated set of complexity keywords (e.g. “analyze”, “compare”, “implement”). Each unique keyword found adds points, up to 50.

  3. Score is clamped to [0, 100].

The router then walks the tiers list (sorted ascending by max_score) and returns the model of the first tier whose max_score >= score. The last tier is always the fallback.
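The tier walk described above can be sketched in a few lines. This is a minimal illustration, not the library's source: `Tier` is a hypothetical stand-in for RoutingTier, and `pick_model` is an invented helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical stand-in for RoutingTier (not the library class)."""
    model: str
    max_score: float


def pick_model(tiers: list[Tier], score: float) -> str:
    """Return the model of the first tier whose max_score >= score."""
    for tier in tiers:  # tiers are assumed sorted ascending by max_score
        if tier.max_score >= score:
            return tier.model
    # No tier matched: the last tier acts as the catch-all fallback.
    return tiers[-1].model


tiers = [
    Tier("gpt-4o-mini", 30),
    Tier("gpt-4o", 70),
    Tier("o3-mini", 100),
]
print(pick_model(tiers, 30))  # gpt-4o-mini (the upper bound is inclusive)
print(pick_model(tiers, 31))  # gpt-4o
```

Because the bound is inclusive, a score of exactly 30 stays on the cheapest tier, and only scores above it move up the ladder.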

Thread-safety: the router has no mutable state after construction — all methods are pure functions. Safe to share across threads / coroutines.

class ractogateway.routing.router.CostAwareRouter(tiers)[source]

Bases: object

Routes LLM requests to the appropriate model tier based on message complexity — without making any extra API calls.

Parameters:

tiers (list[RoutingTier]) – Ordered list of RoutingTier objects, sorted ascending by max_score (cheapest first). The last tier’s max_score should be 100 to act as fallback.

Raises:
  • ValueError – If tiers is empty or not sorted ascending by max_score.

Example: 3-tier OpenAI ladder

from ractogateway.routing import CostAwareRouter, RoutingTier

router = CostAwareRouter([
    RoutingTier(model="gpt-4o-mini", max_score=30),
    RoutingTier(model="gpt-4o", max_score=70),
    RoutingTier(model="o3-mini", max_score=100),
])
model = router.route("What is 2+2?")  # → "gpt-4o-mini"
model = router.route(
    "Analyze the trade-offs between Redis Cluster and "
    "Cassandra for a write-heavy time-series workload …"
)  # → "o3-mini"

Example: binary routing (2 tiers)

router = CostAwareRouter([
    RoutingTier(model="claude-haiku-4-5-20251001", max_score=40),
    RoutingTier(model="claude-opus-4-6", max_score=100),
])
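The construction-time check listed under Raises could look roughly like the sketch below. It is an assumption-laden illustration rather than the library's code: `Tier` and `validate_tiers` are hypothetical names, and whether two tiers may share the same max_score is not stated in the documentation, so equal neighbours are tolerated here.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    """Hypothetical stand-in for RoutingTier."""
    model: str
    max_score: float


def validate_tiers(tiers: list[Tier]) -> tuple[Tier, ...]:
    """Reject an empty or unsorted ladder; return an immutable copy."""
    if not tiers:
        raise ValueError("tiers must not be empty")
    scores = [t.max_score for t in tiers]
    # Assumption: equal neighbours are tolerated; the library may be stricter.
    if any(a > b for a, b in zip(scores, scores[1:])):
        raise ValueError("tiers must be sorted ascending by max_score")
    return tuple(tiers)
```

Failing fast at construction keeps the per-request walk free of any ordering checks.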

score(text)[source]

Compute a complexity score in [0, 100] for text.

A higher score means a more complex task.

Return type:

int

Algorithm

token_pts = min(len(text) // 4, SAT) * (MAX_TP / SAT)
kw_pts    = min(matches * PPK, MAX_KP)
score     = clamp(token_pts + kw_pts, 0, 100)
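To make the formula concrete, here is a small stand-alone sketch. The two 50-point caps follow the ranges given in the module description; the values of SAT and PPK, the keyword set beyond the three documented examples, the substring-based matching, and the truncation to int are all assumptions chosen for illustration.

```python
# SAT, PPK and the keyword set are illustrative assumptions, not library values.
SAT = 500      # token estimate at which the length component saturates
MAX_TP = 50    # maximum points from length (0-50 per the module description)
PPK = 10       # points per unique keyword found
MAX_KP = 50    # maximum points from keywords (per the module description)
KEYWORDS = frozenset({"analyze", "compare", "implement"})


def complexity_score(text: str) -> int:
    """Heuristic complexity score in [0, 100]; higher means more complex."""
    tokens = len(text) // 4                       # rough token estimate
    token_pts = min(tokens, SAT) * MAX_TP / SAT   # scaled to 0-50
    lowered = text.lower()
    matches = sum(1 for kw in KEYWORDS if kw in lowered)  # unique keywords
    kw_pts = min(matches * PPK, MAX_KP)
    return int(max(0, min(100, token_pts + kw_pts)))      # clamp, truncate


print(complexity_score("What is 2+2?"))         # 0
print(complexity_score("Analyze and compare"))  # 20
```

Under these assumed constants, a short factual question scores near zero, while keyword hits dominate the score for short but demanding prompts.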

route(text)[source]

Return the model identifier for text.

Walks tiers (cheapest first) and returns the first model whose max_score >= complexity_score. Always returns a model because the last tier has max_score == 100 (validated at construction).

Complexity: O(k) where k = number of tiers.

Return type:

str

property tiers: tuple[RoutingTier, ...]

Immutable view of the configured tiers.
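One common way to expose such a read-only view is to snapshot the input into a tuple and return it from a property. The sketch below shows that pattern only; `LadderView` and `Tier` are hypothetical names, and the library's actual internals may differ.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    """Hypothetical stand-in for RoutingTier."""
    model: str
    max_score: float


class LadderView:
    """Hypothetical class showing the read-only property pattern."""

    def __init__(self, tiers: list[Tier]) -> None:
        self._tiers = tuple(tiers)  # snapshot: later list edits are not seen

    @property
    def tiers(self) -> tuple[Tier, ...]:
        return self._tiers


source = [Tier("gpt-4o-mini", 30), Tier("o3-mini", 100)]
view = LadderView(source)
source.append(Tier("extra", 100))  # mutating the caller's list afterwards
print(len(view.tiers))  # 2
```

Since the property has no setter and the tuple cannot be modified in place, callers can inspect the ladder without being able to reorder it after validation.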