ractogateway.cache
Caching subsystem for RactoGateway.
Two complementary cache strategies:
- ExactMatchCache: SHA-256 keyed LRU cache for byte-for-byte identical requests (near-zero lookup latency on a hit, no embedding cost).
- SemanticCache: vector-similarity cache that returns cached answers for semantically equivalent queries even when the wording differs.
Both are optional and thread-safe. Enable them by passing instances to any developer kit constructor:
from ractogateway.cache import ExactMatchCache, SemanticCache
from ractogateway import openai_developer_kit as gpt
kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    exact_cache=ExactMatchCache(max_size=1024),
    semantic_cache=SemanticCache(embed_fn=my_embed_fn),
)
- class ractogateway.cache.CacheConfig(**data)
Bases: BaseModel
Configuration for cache instances.
- Parameters:
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- max_size: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class ractogateway.cache.CacheEntry(**data)
Bases: BaseModel
A single cached LLM response.
- response: LLMResponse
- created_at: float
- hit_count: int
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class ractogateway.cache.CacheStats(**data)
Bases: BaseModel
Snapshot of cache performance counters.
- hits: int
- misses: int
- size: int
- property total: int
Total requests seen by the cache.
- property hit_rate: float
Fraction of requests that were cache hits (0.0-1.0).
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
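The two derived properties can be illustrated with a small stand-in. This is a minimal sketch, not the library's CacheStats: the class name StatsSketch is hypothetical, and it assumes that total is hits plus misses and that hit_rate is reported as 0.0 before any request has been seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatsSketch:
    """Hypothetical stand-in for CacheStats; not the library class."""

    hits: int = 0
    misses: int = 0
    size: int = 0

    @property
    def total(self) -> int:
        # Assumption: every lookup is counted as either a hit or a miss.
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        # Assumption: report 0.0 when nothing has been requested yet,
        # which avoids dividing by zero.
        return self.hits / self.total if self.total else 0.0


snapshot = StatsSketch(hits=3, misses=1, size=4)
print(snapshot.total, snapshot.hit_rate)
```

With three hits and one miss, the sketch reports four requests in total and a hit rate of 0.75.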
- class ractogateway.cache.ExactMatchCache(max_size=1024, ttl_seconds=None)
Bases: object
Ultra-low-latency key-value cache for identical LLM requests.
- Parameters:
max_size (int) – LRU capacity. 0 = unlimited (no eviction).
ttl_seconds (float | None) – Entries older than ttl_seconds are treated as misses and transparently evicted. None disables expiry.
Example:
from ractogateway.cache import ExactMatchCache
cache = ExactMatchCache(max_size=512, ttl_seconds=3600)
# Wire into a kit:
# kit = OpenAIDeveloperKit(model="gpt-4o", exact_cache=cache)
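The module overview describes this cache as SHA-256 keyed. One way such a key could be derived from the five request fields is sketched below. The helper name make_key and the JSON serialisation are assumptions made for illustration; the library's actual key layout is not documented here.

```python
import hashlib
import json


def make_key(user_message, system_prompt, model, temperature, max_tokens):
    """Hypothetical helper: derive a SHA-256 hex key from the request fields."""
    # JSON-encode the fields as a list so that field boundaries stay
    # unambiguous ("ab" + "c" must not collide with "a" + "bc").
    payload = json.dumps(
        [user_message, system_prompt, model, temperature, max_tokens],
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = make_key("Hi", "Be brief.", "gpt-4o", 0.0, 256)
key_b = make_key("Hi", "Be brief.", "gpt-4o", 0.0, 256)
key_c = make_key("Hi!", "Be brief.", "gpt-4o", 0.0, 256)
```

Identical requests map to the same key, while a single changed character produces a different one, which is why only byte-for-byte identical requests can hit.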
- get(user_message, system_prompt, model, temperature, max_tokens)
Return a cached response, or None on a miss.
O(1): dictionary lookup plus an optional move-to-end.
- Return type:
- put(user_message, system_prompt, model, temperature, max_tokens, response)
Store a response. Evicts the LRU entry when at capacity.
O(1) amortised: dictionary insert plus an optional popitem(last=False).
- Return type:
- invalidate(user_message, system_prompt, model, temperature, max_tokens)
Remove a specific entry. Returns True if it was present.
- Return type:
- property stats: CacheStats
Return a snapshot of hit/miss/size counters.
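The move-to-end and popitem(last=False) operations mentioned above are those of collections.OrderedDict. The following is a minimal sketch of LRU eviction combined with TTL expiry, assuming that data structure. The class LruTtlSketch and its clock parameter are hypothetical, and the sketch omits the locking a thread-safe cache would need.

```python
import time
from collections import OrderedDict


class LruTtlSketch:
    """Hypothetical illustration of LRU eviction plus TTL expiry."""

    def __init__(self, max_size=1024, ttl_seconds=None, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (stored_at, value)
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._ttl is not None and self._clock() - stored_at > self._ttl:
            del self._data[key]  # expired: evict and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        # max_size == 0 means unlimited, so eviction is skipped
        if self._max_size and len(self._data) > self._max_size:
            self._data.popitem(last=False)  # drop the least recently used entry


cache = LruTtlSketch(max_size=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```

Because "a" was read before "c" was inserted, "b" is the least recently used entry and is the one dropped.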
- class ractogateway.cache.SemanticCache(embed_fn, similarity_threshold=0.95, max_size=512, ttl_seconds=None)
Bases: object
Vector-similarity cache that returns cached answers for semantically similar queries, so a hit avoids the LLM API call (the cost of calling embed_fn still applies).
- Parameters:
embed_fn (Callable[[str], list[float]]) – Any callable (text: str) -> list[float]. Called once per new query (cache miss) and once at put() time.
similarity_threshold (float) – Minimum cosine similarity to declare a hit. Default 0.95 is intentionally strict to avoid incorrect responses.
max_size (int) – Maximum number of entries (LRU eviction). 0 = unlimited.
ttl_seconds (float | None) – Optional per-entry TTL. None disables expiry.
Examples
import ractogateway.openai_developer_kit as gpt
from ractogateway.cache import SemanticCache

kit = gpt.OpenAIDeveloperKit(model="gpt-4o")

def embed(text: str) -> list[float]:
    # Embed the text with OpenAI's embeddings endpoint
    import openai
    r = openai.OpenAI().embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return r.data[0].embedding

cache = SemanticCache(embed_fn=embed, similarity_threshold=0.95)
- get(query)
Embed query and return a cached response if the cosine similarity is ≥ the threshold.
Returns None on a cache miss (the caller should make the real API call and then invoke put()).
Complexity: O(n·d), where n = number of entries and d = embedding dimension.
- Return type:
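The O(n·d) cost follows from comparing the query embedding against every stored embedding. A minimal sketch of such a linear scan is shown below. The function names and the (embedding, response) pair layout are hypothetical, and returning the single best match above the threshold is an assumption rather than documented behaviour.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def best_match(query_vec, entries, threshold=0.95):
    """Return the stored response most similar to query_vec, or None.

    entries is a list of (embedding, response) pairs (hypothetical layout).
    """
    best_score, best_response = -1.0, None
    for vec, response in entries:  # n entries, each O(d) to compare
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None


entries = [([1.0, 0.0], "answer A"), ([0.0, 1.0], "answer B")]
near_a = best_match([0.99, 0.05], entries)  # close to the first embedding
no_hit = best_match([0.7, 0.7], entries)    # equally far from both
```

The first query lies almost along the first stored vector and clears the 0.95 threshold; the second sits midway between both entries at a similarity of roughly 0.71, so it is a miss.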
- put(query, response)
Embed query and store response for future similar queries.
Evicts LRU entry when at capacity.
- Return type:
- property stats: CacheStats
Return a snapshot of hit/miss/size counters.
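When a cache is used directly rather than through a kit, the get-then-put flow described for get() can be wrapped in a small helper. The sketch below is illustrative only: cached_call, DictCache, and fake_llm are hypothetical names, and DictCache is a plain dictionary stand-in that merely exposes the same get(query) / put(query, response) shape.

```python
def cached_call(cache, query, call_llm):
    """Hypothetical helper: consult the cache, fall back to call_llm on a miss."""
    cached = cache.get(query)
    if cached is not None:
        return cached  # hit: no API call is made
    response = call_llm(query)  # miss: make the real call
    cache.put(query, response)  # store it for future similar queries
    return response


class DictCache:
    """Tiny stand-in exposing the same get/put shape, for demonstration only."""

    def __init__(self):
        self._store = {}

    def get(self, query):
        return self._store.get(query)

    def put(self, query, response):
        self._store[query] = response


calls = []


def fake_llm(query):
    calls.append(query)  # record each simulated API call
    return f"echo: {query}"


demo = DictCache()
first = cached_call(demo, "ping", fake_llm)
second = cached_call(demo, "ping", fake_llm)
```

The second call is served from the cache, so the simulated API is invoked only once.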
- class ractogateway.cache.SemanticCacheConfig(**data)
Bases: BaseModel
Configuration for the semantic similarity cache.
- Parameters:
- threshold: float
- max_size: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class ractogateway.cache.SemanticCacheEntry(**data)
Bases: BaseModel
One entry in the semantic cache, pairing an embedding with a response.
- response: LLMResponse
- created_at: float
- hit_count: int
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.