ractogateway.routing.router

Cost-aware model router.

Dynamically selects the cheapest model that can handle the complexity of an incoming request, without making an extra LLM call for classification.

Complexity scoring (pure heuristics, O(1) per call):

Token estimate: len(text) // 4 gives a rough token count (about four characters per token). Scaled to contribute 0-50 points.
Keyword density: checks the lowercased message against a curated set of complexity keywords (e.g. "analyze", "compare", "implement"). Each unique keyword found adds points, up to 50.
Score is clamped to [0, 100].

The router then walks the tiers list (sorted ascending by max_score) and returns the model of the first tier whose max_score ≥ score. The last tier is always the fallback.
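The tier walk described above can be sketched as follows. This is a minimal illustration, not the library's code: `Tier` and `pick_model` are hypothetical names standing in for `RoutingTier` and the router's internal lookup.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    """Hypothetical stand-in for RoutingTier: a model name plus a score ceiling."""
    model: str
    max_score: int


def pick_model(tiers: list[Tier], score: int) -> str:
    """Return the model of the first tier whose max_score >= score."""
    for tier in tiers:  # tiers are assumed to be sorted ascending by max_score
        if tier.max_score >= score:
            return tier.model
    # No ceiling was high enough: fall back to the last tier.
    return tiers[-1].model


ladder = [Tier("gpt-4o-mini", 30), Tier("gpt-4o", 70), Tier("o3-mini", 100)]
```

With this ladder, a score of 30 still lands on the first tier because the comparison is inclusive, while 31 moves up to the second.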

Thread-safety: the router has no mutable state after construction — all methods are pure functions. Safe to share across threads / coroutines.
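To illustrate what sharing an immutable router across threads looks like, the sketch below uses a hypothetical `TinyRouter` stand-in (a frozen dataclass with a simplified, length-only score) rather than the real class. It only shows the usage pattern: one instance, many concurrent callers, no locking.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyRouter:
    """Hypothetical stand-in: immutable after construction."""
    tiers: tuple[tuple[str, int], ...]  # (model, max_score), sorted ascending

    def route(self, text: str) -> str:
        # Simplified length-only score (an assumption made for this sketch).
        score = min(len(text) // 4, 100)
        for model, max_score in self.tiers:
            if max_score >= score:
                return model
        return self.tiers[-1][0]


router = TinyRouter((("small-model", 30), ("large-model", 100)))
messages = ["hi", "x" * 400, "short question", "y" * 1000]

# A single router instance is shared by every worker; route() only reads state.
with ThreadPoolExecutor(max_workers=4) as pool:
    routed = list(pool.map(router.route, messages))
```

Because `route` reads but never writes instance state, the concurrent results match what a sequential loop over the same messages would produce.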

class ractogateway.routing.router.CostAwareRouter(tiers)

Bases: object

Routes LLM requests to the appropriate model tier based on message complexity — without making any extra API calls.

Parameters:

tiers (list[RoutingTier]) – Ordered list of RoutingTier objects, sorted ascending by max_score (cheapest first). The last tier’s max_score should be 100 to act as fallback.

Raises:

ValueError – If tiers is empty or not sorted ascending by max_score.
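A constructor check matching the two conditions listed under Raises might look like the sketch below. `Tier` and `validate_tiers` are hypothetical names. Accepting equal neighbouring ceilings is an assumption, since the documentation does not say whether ties are allowed.

```python
from typing import NamedTuple, Sequence


class Tier(NamedTuple):
    """Hypothetical stand-in for RoutingTier."""
    model: str
    max_score: int


def validate_tiers(tiers: Sequence[Tier]) -> tuple[Tier, ...]:
    """Reject empty or unsorted tier lists; return an immutable copy otherwise."""
    if not tiers:
        raise ValueError("tiers must not be empty")
    ceilings = [t.max_score for t in tiers]
    # Compare each ceiling with its successor; ties are accepted (assumption).
    if any(a > b for a, b in zip(ceilings, ceilings[1:])):
        raise ValueError("tiers must be sorted ascending by max_score")
    return tuple(tiers)
```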
Example (3-tier OpenAI ladder):

    from ractogateway.routing import CostAwareRouter, RoutingTier

    router = CostAwareRouter([
        RoutingTier(model="gpt-4o-mini", max_score=30),
        RoutingTier(model="gpt-4o", max_score=70),
        RoutingTier(model="o3-mini", max_score=100),
    ])
    model = router.route("What is 2+2?")  # → "gpt-4o-mini"
    model = router.route(
        "Analyze the trade-offs between Redis Cluster and "
        "Cassandra for a write-heavy time-series workload …"
    )  # → "o3-mini"

Example (binary routing, 2 tiers):

    router = CostAwareRouter([
        RoutingTier(model="claude-haiku-4-5-20251001", max_score=40),
        RoutingTier(model="claude-opus-4-6", max_score=100),
    ])

score(text)

Compute a complexity score in [0, 100] for text.

A higher score means a more complex task.

Return type: int

Algorithm

    token_pts = min(len(text) // 4, SAT) * (MAX_TP / SAT)
    kw_pts    = min(matches * PPK, MAX_KP)
    score     = clamp(token_pts + kw_pts, 0, 100)
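The formula can be made concrete with a small sketch. The values of `SAT` and `PPK` below are assumptions chosen for illustration. `MAX_TP` and `MAX_KP` follow the 0-50 ranges stated earlier. The keyword set contains only the three examples the documentation names, and truncating the result to an integer is likewise an assumption.

```python
# Illustrative constants; SAT and PPK are assumed, not taken from the library.
SAT = 500      # token estimate at which the length component saturates (assumed)
MAX_TP = 50    # maximum points from the token estimate
PPK = 10       # points per unique keyword found (assumed)
MAX_KP = 50    # maximum points from keywords
# Only the three keywords named in the docs; the full curated set is not listed there.
KEYWORDS = frozenset({"analyze", "compare", "implement"})


def score(text: str) -> int:
    """Heuristic complexity score in [0, 100]; higher means more complex."""
    est_tokens = len(text) // 4                      # ~4 characters per token
    token_pts = min(est_tokens, SAT) * (MAX_TP / SAT)
    lowered = text.lower()
    # Count unique keywords present; substring matching is a simplification.
    matches = sum(1 for kw in KEYWORDS if kw in lowered)
    kw_pts = min(matches * PPK, MAX_KP)
    # Clamp to [0, 100] and truncate to int (truncation is an assumption).
    return int(max(0, min(100, token_pts + kw_pts)))
```

Under these assumed constants, a short question with no keywords scores near zero, while each distinct keyword adds a fixed step regardless of how often it repeats.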

route(text)

Return the model identifier for text.

Walks tiers (cheapest first) and returns the first model whose max_score ≥ complexity_score. Always returns a model because the last tier has max_score == 100 (validated at construction).

Complexity: O(k) where k = number of tiers.

Return type: str

property tiers: tuple[RoutingTier, ...]

Immutable view of the configured tiers.
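One common way to expose such an immutable view is a read-only property over a tuple copied at construction time. The `Ladder` class below is a hypothetical sketch of that pattern and is not taken from the library's source.

```python
class Ladder:
    """Hypothetical holder showing a read-only tuple property."""

    def __init__(self, tiers):
        # Copy into a tuple so later mutation of the caller's list has no effect.
        self._tiers = tuple(tiers)

    @property
    def tiers(self):
        """Read-only: no setter is defined, so assignment raises AttributeError."""
        return self._tiers
```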