ractogateway.cache.exact_cache
Exact-match key-value cache with LRU eviction and optional TTL.
Uses collections.OrderedDict for O(1) get / put / evict — a standard
least-recently-used (LRU) cache pattern. No external dependencies.
Thread-safety is provided by a threading.Lock so the cache is safe to
share across threads without any external synchronisation.
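The pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the ractogateway implementation: the class name `TinyLRU`, its single-key `get`/`put` interface, and the use of `time.monotonic()` for timestamps are all hypothetical choices made for this example.

```python
import threading
import time
from collections import OrderedDict


class TinyLRU:
    """Illustrative LRU + TTL cache (hypothetical; not the library's code)."""

    def __init__(self, max_size=1024, ttl_seconds=None):
        self._max_size = max_size      # 0 means unlimited (no eviction)
        self._ttl = ttl_seconds        # None disables expiry
        self._data = OrderedDict()     # key -> (stored_at, value)
        self._lock = threading.Lock()  # guards every read and write

    def get(self, key):
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return None
            stored_at, value = item
            if self._ttl is not None and time.monotonic() - stored_at > self._ttl:
                del self._data[key]    # expired: evict and report a miss
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return value

    def put(self, key, value):
        with self._lock:
            self._data[key] = (time.monotonic(), value)
            self._data.move_to_end(key)
            if self._max_size and len(self._data) > self._max_size:
                self._data.popitem(last=False)  # drop least recently used


cache = TinyLRU(max_size=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```

Both `move_to_end` and `popitem(last=False)` run in constant time on an `OrderedDict`, which is where the O(1) bound comes from. Holding one lock around each operation keeps the lookup and the reordering atomic with respect to other threads.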
- class ractogateway.cache.exact_cache.ExactMatchCache(max_size=1024, ttl_seconds=None)
Bases: object
Ultra-low-latency key-value cache for identical LLM requests.
- Parameters:
max_size (int) – LRU capacity. 0 = unlimited (no eviction).
ttl_seconds (float | None) – Entries older than ttl_seconds are treated as misses and transparently evicted. None disables expiry.
Example:
from ractogateway.cache import ExactMatchCache
cache = ExactMatchCache(max_size=512, ttl_seconds=3600)
# Wire into a kit: kit = OpenAIDeveloperKit(model="gpt-4o", exact_cache=cache)
- get(user_message, system_prompt, model, temperature, max_tokens)
Return a cached response, or None on a miss.
O(1): dictionary lookup + optional move-to-end.
- Return type:
- put(user_message, system_prompt, model, temperature, max_tokens, response)
Store a response. Evicts the LRU entry when at capacity.
O(1) amortised: dictionary insert + optional popitem(last=False).
- Return type:
- invalidate(user_message, system_prompt, model, temperature, max_tokens)
Remove a specific entry. Returns True if it was present.
- Return type:
- property stats: CacheStats
Return a snapshot of hit/miss/size counters.
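The get, put and invalidate methods all identify an entry by the same five request fields. The reference does not say how those fields are combined into a lookup key, so the following is only one plausible approach, shown as an assumption: a hypothetical helper `make_key` that hashes a canonical JSON encoding of the fields with SHA-256.

```python
import hashlib
import json


def make_key(user_message, system_prompt, model, temperature, max_tokens):
    """Hypothetical helper: fold the five request fields into one digest."""
    payload = json.dumps(
        [user_message, system_prompt, model, temperature, max_tokens],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Identical requests map to the same key; changing any field changes it.
k1 = make_key("Hi", "Be brief.", "gpt-4o", 0.0, 256)
k2 = make_key("Hi", "Be brief.", "gpt-4o", 0.0, 256)
k3 = make_key("Hi", "Be brief.", "gpt-4o", 0.7, 256)
```

With a scheme like this, a request hits the cache only when every field matches exactly, which is what "exact-match" implies: a different temperature or token limit counts as a different request.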