ractogateway.routing.router
Cost-aware model router.
Dynamically selects the cheapest model that can handle the complexity of an incoming request, without making an extra LLM call for classification.
Complexity scoring (pure heuristics; each call makes a fixed number of passes over the message, so cost is linear in its length):
- Token estimate: len(text) // 4 gives a rough word/token count. Scaled to contribute 0-50 points.
- Keyword density: checks the message (lowercased) for a curated set of complexity keywords (e.g. “analyze”, “compare”, “implement”). Each unique keyword found adds points, up to 50.
Score is clamped to [0, 100].
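The two components above can be sketched as follows. This is a minimal illustration, not the library's implementation: the keyword set, the saturation point, the points per keyword, the use of substring matching, and the name `estimate_score` are all assumptions chosen for the example.

```python
# Hypothetical constants; the values the library actually uses are not documented here.
COMPLEXITY_KEYWORDS = frozenset({"analyze", "compare", "implement"})
TOKEN_SATURATION = 500    # assumed: token estimate at which token points max out
MAX_TOKEN_POINTS = 50     # from the text: token estimate contributes 0-50 points
POINTS_PER_KEYWORD = 10   # assumed
MAX_KEYWORD_POINTS = 50   # from the text: keywords contribute up to 50 points


def estimate_score(text: str) -> int:
    """Heuristic complexity score in [0, 100]; no model call involved."""
    # Rough token estimate: about four characters per token.
    est_tokens = len(text) // 4
    token_pts = min(est_tokens, TOKEN_SATURATION) * MAX_TOKEN_POINTS / TOKEN_SATURATION

    # Count each keyword at most once (assumed: substring search on lowercased text).
    lowered = text.lower()
    matches = sum(1 for kw in COMPLEXITY_KEYWORDS if kw in lowered)
    keyword_pts = min(matches * POINTS_PER_KEYWORD, MAX_KEYWORD_POINTS)

    # Clamp the sum to [0, 100].
    return max(0, min(100, round(token_pts + keyword_pts)))
```

With these assumed constants, a short factual question scores near 0, while a long message containing several keywords approaches the upper end of the range.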
The router then walks the tiers list (sorted ascending by max_score)
and returns the model of the first tier whose max_score ≥ score.
The last tier is always the fallback.
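The tier walk described above might look like the sketch below. `Tier` is a hypothetical stand-in for `RoutingTier`, `pick_model` is an illustrative name, and the model names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical stand-in for RoutingTier: a model name and its score ceiling."""
    model: str
    max_score: int


def pick_model(tiers, score):
    """Return the model of the first tier whose max_score covers the score."""
    # Tiers are assumed sorted ascending by max_score (cheapest first).
    for tier in tiers:
        if tier.max_score >= score:
            return tier.model
    # No tier matched: fall back to the last (most capable) tier.
    return tiers[-1].model


ladder = (
    Tier("small-model", 30),
    Tier("medium-model", 70),
    Tier("large-model", 100),
)
```

A score exactly on a boundary (for example 30) stays in the cheaper tier, because the comparison is `max_score >= score`.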
Thread-safety: the router has no mutable state after construction — all methods are pure functions. Safe to share across threads / coroutines.
- class ractogateway.routing.router.CostAwareRouter(tiers)
Bases: object
Routes LLM requests to the appropriate model tier based on message complexity, without making any extra API calls.
- Parameters:
tiers (list[RoutingTier]) – Ordered list of RoutingTier objects, sorted ascending by max_score (cheapest first). The last tier’s max_score should be 100 to act as fallback.
- Raises:
ValueError – If tiers is empty or not sorted ascending by max_score.
Example: 3-tier OpenAI ladder::

    from ractogateway.routing import CostAwareRouter, RoutingTier

    router = CostAwareRouter([
        RoutingTier(model="gpt-4o-mini", max_score=30),
        RoutingTier(model="gpt-4o", max_score=70),
        RoutingTier(model="o3-mini", max_score=100),
    ])
    model = router.route("What is 2+2?")
    # → "gpt-4o-mini"
    model = router.route(
        "Analyze the trade-offs between Redis Cluster and "
        "Cassandra for a write-heavy time-series workload …"
    )
    # → "o3-mini"
Example: binary routing (2 tiers)::

    router = CostAwareRouter([
        RoutingTier(model="claude-haiku-4-5-20251001", max_score=40),
        RoutingTier(model="claude-opus-4-6", max_score=100),
    ])
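The two conditions listed under Raises could be checked along these lines. This is a sketch only: `validate_tiers` is a hypothetical helper, tiers are represented as plain `(model, max_score)` pairs rather than `RoutingTier` objects, and treating equal adjacent scores as acceptable is an assumption.

```python
def validate_tiers(tiers):
    """Check a sequence of (model, max_score) pairs; return them as a tuple."""
    tiers = tuple(tiers)
    if not tiers:
        raise ValueError("tiers must not be empty")
    # Compare each tier's ceiling with the next one.
    for (_, prev_score), (_, next_score) in zip(tiers, tiers[1:]):
        if next_score < prev_score:  # assumed: ties are allowed
            raise ValueError("tiers must be sorted ascending by max_score")
    return tiers
```

Failing at construction time means a misordered ladder is caught once, up front, instead of silently sending requests to the wrong model.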
- score(text)
Compute a complexity score in [0, 100] for text.
A higher score means a more complex task.
- Return type:
Algorithm:

    token_pts = min(len(text)//4, SAT) * (MAX_TP / SAT)
    kw_pts = min(matches * PPK, MAX_KP)
    score = clamp(token_pts + kw_pts, 0, 100)
- route(text)
Return the model identifier for text.
Walks tiers (cheapest first) and returns the first model whose
max_score ≥ complexity_score. Always returns a model because the last tier has max_score == 100 (validated at construction).
Complexity: O(k) where k = number of tiers.
- Return type:
- property tiers: tuple[RoutingTier, ...]
Immutable view of the configured tiers.
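One common way to expose an immutable view like this is to copy the input into a tuple at construction and return it from a read-only property. The class below is a hypothetical illustration of that pattern, not the library's code.

```python
class TierHolder:
    """Hypothetical holder that exposes its tiers as a read-only tuple."""

    def __init__(self, tiers):
        # Copy into a tuple so later changes to the caller's list do not leak in.
        self._tiers = tuple(tiers)

    @property
    def tiers(self):
        # No setter is defined, and tuples have no in-place mutators.
        return self._tiers


source = ["tier-a", "tier-b"]
holder = TierHolder(source)
source.append("tier-c")  # does not affect the holder
```

Because the stored sequence never changes after construction, readers on different threads see the same tiers without any locking.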