Routing
Models
Data models for the cost-aware routing subsystem.
- class ractogateway.routing._models.RoutingTier(**data)[source]
Bases: BaseModel
One tier in the cost-aware routing ladder.
The router evaluates a complexity score (0-100) for each incoming message and selects the first tier whose max_score is >= that score. The last tier in the list always acts as the catch-all fallback.
- Parameters:
model (str) – The LLM model identifier to use for requests that fall in this tier (e.g. "gpt-4o-mini", "gemini-2.0-flash", "claude-haiku-4-5-20251001").
max_score (float) – Inclusive upper bound on the complexity score that routes to this model. Range: 0-100. Set to 100 for the last (most powerful) tier so it catches everything.
Examples
    tiers = [
        RoutingTier(model="gpt-4o-mini", max_score=30),
        RoutingTier(model="gpt-4o", max_score=70),
        RoutingTier(model="o3-mini", max_score=100),
    ]
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
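The documented 0-100 range for max_score can be illustrated with a small stand-in. This is a sketch only: TierSketch is a hypothetical dataclass, not the library's pydantic model, and the assumption that out-of-range values are rejected at construction is not confirmed by the documentation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierSketch:
    """Hypothetical stand-in for RoutingTier (the real class is a pydantic BaseModel)."""

    model: str
    max_score: float

    def __post_init__(self) -> None:
        # Assumption: the documented range 0-100 is enforced on construction.
        if not 0 <= self.max_score <= 100:
            raise ValueError(
                f"max_score must be in [0, 100], got {self.max_score}"
            )


# A ladder with the same shape as the documented example.
ladder = [
    TierSketch(model="gpt-4o-mini", max_score=30),
    TierSketch(model="gpt-4o", max_score=70),
    TierSketch(model="o3-mini", max_score=100),
]
```

With the real RoutingTier, invalid input would surface as a pydantic ValidationError rather than the plain ValueError used in this stand-in.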
Cost-Aware Router
Cost-aware model router.
Dynamically selects the cheapest model that can handle the complexity of an incoming request, without making an extra LLM call for classification.
Complexity scoring (pure heuristics, O(1) per call):
- Token estimate: len(text) // 4 gives a rough word/token count. Scaled to contribute 0-50 points.
- Keyword density: checks the message (lowercased) for a curated set of complexity keywords (e.g. "analyze", "compare", "implement"). Each unique keyword found adds points, up to 50.
Score is clamped to [0, 100].
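The two heuristics above might be sketched as follows. Only the 50-point caps come from the description. The saturation point, the points per keyword, the keyword set (limited here to the three words named as examples), and the whole-word matching rule are assumptions made for illustration, so real scores from the library will differ.

```python
import re

MAX_TOKEN_POINTS = 50.0    # from the description: token estimate contributes 0-50
MAX_KEYWORD_POINTS = 50.0  # from the description: keywords contribute up to 50
TOKEN_SATURATION = 500     # assumption: token estimate at which points stop growing
POINTS_PER_KEYWORD = 10.0  # assumption: points added per unique keyword
# Assumption: only the three keywords named in the text; the real set is larger.
KEYWORDS = frozenset({"analyze", "compare", "implement"})


def complexity_score(text: str) -> float:
    """Heuristic complexity score in [0, 100]; makes no LLM call."""
    est_tokens = len(text) // 4
    token_pts = min(est_tokens, TOKEN_SATURATION) * MAX_TOKEN_POINTS / TOKEN_SATURATION

    # Assumption: keywords are matched as whole lowercase words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    matches = len(words & KEYWORDS)
    kw_pts = min(matches * POINTS_PER_KEYWORD, MAX_KEYWORD_POINTS)

    # Clamp the sum to the documented range.
    return max(0.0, min(100.0, token_pts + kw_pts))
```

Under these assumed constants a short question with no keywords scores close to zero, while a long prompt saturates the token component at 50 and needs keyword matches to climb higher.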
The router then walks the tiers list (sorted ascending by max_score)
and returns the model of the first tier whose max_score ≥ score.
The last tier is always the fallback.
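The tier walk described above can be sketched as a linear scan. Tier and pick_model are hypothetical names used only for this illustration; the library's own classes are RoutingTier and CostAwareRouter.

```python
from typing import NamedTuple, Sequence


class Tier(NamedTuple):
    """Hypothetical stand-in for RoutingTier."""

    model: str
    max_score: float


def pick_model(tiers: Sequence[Tier], score: float) -> str:
    """Return the model of the first tier whose max_score >= score."""
    for tier in tiers:  # tiers are assumed sorted ascending by max_score
        if score <= tier.max_score:
            return tier.model
    # The last tier is the catch-all fallback.
    return tiers[-1].model


ladder = [
    Tier("gpt-4o-mini", 30),
    Tier("gpt-4o", 70),
    Tier("o3-mini", 100),
]
```

Because max_score is an inclusive bound, a score of exactly 30 stays on the cheapest tier and anything just above it moves to the next one.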
Thread-safety: the router has no mutable state after construction — all methods are pure functions. Safe to share across threads / coroutines.
- class ractogateway.routing.router.CostAwareRouter(tiers)[source]
Bases: object
Routes LLM requests to the appropriate model tier based on message complexity, without making any extra API calls.
- Parameters:
tiers (list[RoutingTier]) – Ordered list of RoutingTier objects, sorted ascending by max_score (cheapest first). The last tier's max_score should be 100 to act as fallback.
- Raises:
ValueError – If tiers is empty or not sorted ascending by max_score.
Example: 3-tier OpenAI ladder
    from ractogateway.routing import CostAwareRouter, RoutingTier

    router = CostAwareRouter([
        RoutingTier(model="gpt-4o-mini", max_score=30),
        RoutingTier(model="gpt-4o", max_score=70),
        RoutingTier(model="o3-mini", max_score=100),
    ])
    model = router.route("What is 2+2?")  # → "gpt-4o-mini"
    model = router.route("Analyze the trade-offs between Redis Cluster and "
                         "Cassandra for a write-heavy time-series workload …")  # → "o3-mini"
Example: binary routing (2 tiers)
    router = CostAwareRouter([
        RoutingTier(model="claude-haiku-4-5-20251001", max_score=40),
        RoutingTier(model="claude-opus-4-6", max_score=100),
    ])
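The construction-time checks listed under Raises could look roughly like this. validate_ladder is a hypothetical helper operating on bare max_score values. Whether equal adjacent bounds are accepted is an assumption (this sketch allows them), since the documentation only says "sorted ascending".

```python
from typing import Sequence


def validate_ladder(max_scores: Sequence[float]) -> None:
    """Raise ValueError for an empty or unsorted ladder (illustrative only)."""
    if not max_scores:
        raise ValueError("tiers must not be empty")
    # Compare each bound with its successor; any decrease means the ladder is unsorted.
    for lower, upper in zip(max_scores, max_scores[1:]):
        if lower > upper:
            raise ValueError("tiers must be sorted ascending by max_score")


validate_ladder([30, 70, 100])  # a valid ladder passes silently
```

Failing at construction means a misordered ladder is caught once at startup instead of silently routing requests to the wrong tier.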
- score(text)[source]
Compute a complexity score in [0, 100] for text.
A higher score means a more complex task.
- Return type:
Algorithm
    token_pts = min(len(text)//4, SAT) * (MAX_TP / SAT)
    kw_pts    = min(matches * PPK, MAX_KP)
    score     = clamp(token_pts + kw_pts, 0, 100)
- route(text)[source]
Return the model identifier for text.
Walks tiers (cheapest first) and returns the first model whose max_score ≥ complexity_score. Always returns a model because the last tier has max_score == 100 (validated at construction).
Complexity: O(k) where k = number of tiers.
- Return type:
- property tiers: tuple[RoutingTier, ...]
Immutable view of the configured tiers.
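One common way to expose an immutable view is to copy the input into a tuple at construction and return it from a read-only property. The sketch below assumes that pattern. LadderHolder is a hypothetical class, and the source does not show how CostAwareRouter stores its tiers internally.

```python
from typing import Iterable, Tuple


class LadderHolder:
    """Hypothetical holder illustrating a read-only tuple property."""

    def __init__(self, tiers: Iterable[str]) -> None:
        # Copy into a tuple so later changes to the caller's list have no effect.
        self._tiers: Tuple[str, ...] = tuple(tiers)

    @property
    def tiers(self) -> Tuple[str, ...]:
        return self._tiers


source = ["gpt-4o-mini", "gpt-4o"]
holder = LadderHolder(source)
source.append("o3-mini")  # mutating the original list does not change the holder
```

A tuple copy also fits the thread-safety note earlier on this page, since nothing observable changes after construction.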