ractogateway.cache.exact_cache

Exact-match key-value cache with LRU eviction and optional TTL.

Uses collections.OrderedDict for O(1) get / put / evict — a standard least-recently-used (LRU) cache pattern. No external dependencies.
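The OrderedDict pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not the module's actual source: the class name TinyLRU is hypothetical, and the treatment of max_size=0 as "unlimited" is borrowed from the parameter documentation for ExactMatchCache.

```python
from collections import OrderedDict


class TinyLRU:
    """Illustrative LRU map; not the ractogateway implementation."""

    def __init__(self, max_size=1024):
        self.max_size = max_size  # 0 means unlimited (assumed to mirror the docs)
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)  # a re-inserted key also becomes most recent
        if self.max_size and len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry


lru = TinyLRU(max_size=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" is now the most recently used key
lru.put("c", 3)  # capacity exceeded, so "b" is evicted
```

Both move_to_end and popitem(last=False) run in constant time on an OrderedDict, which is where the O(1) claim comes from.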

Thread-safety is provided by a threading.Lock so the cache is safe to share across threads without any external synchronisation.
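One plausible way to guard an OrderedDict with a threading.Lock is shown below. It is a sketch under assumptions: the class LockedLRU and the worker function are hypothetical names, and the real module may hold its lock differently.

```python
import threading
from collections import OrderedDict


class LockedLRU:
    """Illustrative lock-guarded LRU map; not the ractogateway implementation."""

    def __init__(self, max_size=1024):
        self.max_size = max_size
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:  # lookup and reordering happen as one atomic step
            if key not in self._data:
                return None
            self._data.move_to_end(key)
            return self._data[key]

    def put(self, key, value):
        with self._lock:  # insert and eviction happen as one atomic step
            self._data[key] = value
            self._data.move_to_end(key)
            if self.max_size and len(self._data) > self.max_size:
                self._data.popitem(last=False)

    def __len__(self):
        with self._lock:
            return len(self._data)


shared = LockedLRU(max_size=100)


def worker(worker_id):
    # Each worker writes 500 distinct keys into the shared cache.
    for i in range(500):
        shared.put((worker_id, i), i)
        shared.get((worker_id, i))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Callers never touch the lock themselves, which is what "without any external synchronisation" amounts to in practice.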

class ractogateway.cache.exact_cache.ExactMatchCache(max_size=1024, ttl_seconds=None)

Ultra-low-latency key-value cache for identical LLM requests.

Parameters:

max_size (int) – LRU capacity. 0 = unlimited (no eviction).
ttl_seconds (float | None) – Entries older than ttl_seconds are treated as misses and transparently evicted. None disables expiry.
Example:

    from ractogateway.cache import ExactMatchCache

    cache = ExactMatchCache(max_size=512, ttl_seconds=3600)

    # Wire into a kit:
    # kit = OpenAIDeveloperKit(model="gpt-4o", exact_cache=cache)
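The ttl_seconds behaviour described in the parameter list (expired entries count as misses and are removed on access) might look like the sketch below. TTLStore and its clock argument are hypothetical, introduced only so the example can be run without waiting. The page does not say which clock the real cache uses.

```python
import time


class TTLStore:
    """Illustrative lazy-expiry store; not the ractogateway implementation."""

    def __init__(self, ttl_seconds=None, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # None disables expiry
        self._clock = clock             # injectable clock (assumption, for testing)
        self._data = {}                 # key -> (stored_at, value)

    def put(self, key, value):
        self._data[key] = (self._clock(), value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        expired = (
            self.ttl_seconds is not None
            and self._clock() - stored_at > self.ttl_seconds
        )
        if expired:
            del self._data[key]  # evict on access and report a miss
            return None
        return value


now = [0.0]  # fake clock so the example does not need to sleep
store = TTLStore(ttl_seconds=60, clock=lambda: now[0])
store.put("k", "v")
fresh = store.get("k")  # read within the TTL
now[0] = 61.0
stale = store.get("k")  # read after the TTL has elapsed
```

Expiry here is lazy: nothing is removed until a read notices the entry is too old, so no background thread is needed.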

get(user_message, system_prompt, model, temperature, max_tokens)

Return a cached response or None on a miss.

O(1) — dictionary lookup + optional move-to-end.
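This page does not say how the five arguments are combined into a dictionary key. As one possible approach, and purely as an assumption, the request could be serialised canonically and hashed. The function make_key below is hypothetical and is not part of the documented API.

```python
import hashlib
import json


def make_key(user_message, system_prompt, model, temperature, max_tokens):
    """Hypothetical key function: hash a canonical JSON encoding of the request."""
    payload = json.dumps(
        [user_message, system_prompt, model, temperature, max_tokens],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = make_key("Hi", "You are helpful.", "gpt-4o", 0.0, 256)
k2 = make_key("Hi", "You are helpful.", "gpt-4o", 0.0, 256)  # identical request
k3 = make_key("Hi", "You are helpful.", "gpt-4o", 0.7, 256)  # different temperature
```

Under this scheme only byte-for-byte identical requests share a key, which matches the "exact-match" framing: changing any one argument produces a different entry.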

put(user_message, system_prompt, model, temperature, max_tokens, response)

Store a response. Evicts the least-recently-used entry when the cache is at capacity.

O(1) amortised — dictionary insert + optional popitem(last=False).

invalidate(user_message, system_prompt, model, temperature, max_tokens)

Remove a specific entry. Returns True if it was present.

Evict all cached entries and reset counters.
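The methods above lend themselves to a check-then-fill pattern around a model call. The sketch below assumes the arguments can be passed positionally in the documented order. The names cached_completion, DictCache, and fake_model are hypothetical stand-ins so the example runs without ractogateway installed; an ExactMatchCache instance would take the place of DictCache.

```python
def cached_completion(cache, call_model, user_message, system_prompt,
                      model, temperature, max_tokens):
    """Return a cached response if present; otherwise call the model and store it."""
    args = (user_message, system_prompt, model, temperature, max_tokens)
    hit = cache.get(*args)
    if hit is not None:
        return hit
    response = call_model(*args)
    cache.put(*args, response)
    return response


class DictCache:
    """Hypothetical stand-in exposing the documented get/put/invalidate signatures."""

    def __init__(self):
        self._data = {}

    def get(self, *key):
        return self._data.get(key)

    def put(self, user_message, system_prompt, model, temperature, max_tokens,
            response):
        key = (user_message, system_prompt, model, temperature, max_tokens)
        self._data[key] = response

    def invalidate(self, *key):
        return self._data.pop(key, None) is not None


calls = []


def fake_model(*args):
    # Stands in for a real LLM call and records how often it is invoked.
    calls.append(args)
    return "response #%d" % len(calls)


demo = DictCache()
request = ("Hi", "You are helpful.", "gpt-4o", 0.0, 256)
first = cached_completion(demo, fake_model, *request)   # miss: calls the model
second = cached_completion(demo, fake_model, *request)  # hit: served from cache
removed = demo.invalidate(*request)                     # entry was present
removed_again = demo.invalidate(*request)               # entry already gone
```

Because a miss is signalled by None, this pattern cannot distinguish a missing entry from a stored None response, so None should not be stored as a value.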