ractogateway.routing
Cost-aware model routing for RactoGateway.
Dynamically routes LLM requests to the cheapest model capable of handling the task complexity — without any extra API calls.
Quick start:
from ractogateway.routing import CostAwareRouter, RoutingTier
from ractogateway import openai_developer_kit as gpt
router = CostAwareRouter([
RoutingTier(model="gpt-4o-mini", max_score=30), # simple tasks
RoutingTier(model="gpt-4o", max_score=70), # moderate
RoutingTier(model="o3-mini", max_score=100), # complex / fallback
])
kit = gpt.OpenAIDeveloperKit(model="auto", router=router)
response = kit.chat(gpt.ChatConfig(user_message="What is 2+2?"))
# Routes to gpt-4o-mini automatically.
- class ractogateway.routing.CostAwareRouter(tiers)[source]
Bases:
objectRoutes LLM requests to the appropriate model tier based on message complexity — without making any extra API calls.
- Parameters:
tiers (
list[RoutingTier]) – Ordered list ofRoutingTierobjects, sorted ascending bymax_score(cheapest first). The last tier’smax_scoreshould be100to act as fallback.- Raises:
ValueError – If
tiersis empty or not sorted ascending bymax_score.Example — 3-tier OpenAI ladder:: – from ractogateway.routing import CostAwareRouter, RoutingTier router = CostAwareRouter([ RoutingTier(model=”gpt-4o-mini”, max_score=30), RoutingTier(model=”gpt-4o”, max_score=70), RoutingTier(model=”o3-mini”, max_score=100), ]) model = router.route(“What is 2+2?”) # → “gpt-4o-mini” model = router.route(“Analyze the trade-offs between Redis Cluster and ” “Cassandra for a write-heavy time-series workload …”) # → “o3-mini”
Example — binary routing (2 tiers):: – router = CostAwareRouter([ RoutingTier(model=”claude-haiku-4-5-20251001”, max_score=40), RoutingTier(model=”claude-opus-4-6”, max_score=100), ])
- score(text)[source]
Compute a complexity score in [0, 100] for text.
A higher score means a more complex task.
- Return type:
Algorithm
token_pts = min(len(text)//4, SAT) * (MAX_TP / SAT) kw_pts = min(matches * PPK, MAX_KP) score = clamp(token_pts + kw_pts, 0, 100)
- route(text)[source]
Return the model identifier for text.
Walks tiers (cheapest first) and returns the first model whose
max_score ≥ complexity_score. Always returns a model because the last tier hasmax_score == 100(validated at construction).Complexity: O(k) where k = number of tiers.
- Return type:
- property tiers: tuple[RoutingTier, ...]
Immutable view of the configured tiers.
- class ractogateway.routing.RoutingTier(**data)[source]
Bases:
BaseModelOne tier in the cost-aware routing ladder.
The router evaluates a complexity score (0-100) for each incoming message and selects the first tier whose
max_scoreis >= that score. The last tier in the list always acts as the catch-all fallback.- Parameters:
model (str) – The LLM model identifier to use for requests that fall in this tier (e.g.
"gpt-4o-mini","gemini-2.0-flash","claude-haiku-4-5-20251001").max_score (float) – Inclusive upper bound on the complexity score that routes to this model. Range: 0-100. Set to
100for the last (most powerful) tier so it catches everything.
Examples
tiers = [ RoutingTier(model="gpt-4o-mini", max_score=30), RoutingTier(model="gpt-4o", max_score=70), RoutingTier(model="o3-mini", max_score=100), ]
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model: str
- max_score: float
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].