ractogateway.cache.exact_cache

Exact-match key-value cache with LRU eviction and optional TTL.

Uses collections.OrderedDict for O(1) get, put, and evict operations, the standard least-recently-used (LRU) cache pattern. No external dependencies.
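The OrderedDict pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not the ractogateway source: the class name TinyLRU and its internals are hypothetical, and only the treatment of max_size=0 as "unlimited" is taken from the parameter description in this page.

```python
from collections import OrderedDict


class TinyLRU:
    """Illustrative LRU cache (hypothetical; not the library's implementation)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # max_size == 0 is treated as "unlimited": never evict.
        if self.max_size and len(self._data) > self.max_size:
            # The first item in the OrderedDict is the least recently used.
            self._data.popitem(last=False)


lru = TinyLRU(max_size=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")      # "a" becomes the most recently used entry
lru.put("c", 3)   # over capacity: "b" is evicted
```

Both move_to_end and popitem(last=False) run in constant time on an OrderedDict, which is where the O(1) bound comes from.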

Thread-safety is provided by a threading.Lock so the cache is safe to share across threads without any external synchronisation.
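As a rough sketch of what lock-based thread safety looks like in practice, the example below guards every read and write of a shared OrderedDict with a single threading.Lock. The LockedStore class and the worker function are hypothetical names introduced for illustration; the library's internal locking may be structured differently.

```python
import threading
from collections import OrderedDict


class LockedStore:
    """Illustrative store whose operations all hold one lock (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = OrderedDict()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)

    def get(self, key):
        with self._lock:
            if key in self._data:
                # Reads also mutate ordering, so they need the lock too.
                self._data.move_to_end(key)
                return self._data[key]
            return None

    def __len__(self):
        with self._lock:
            return len(self._data)


store = LockedStore()


def worker(worker_id):
    # Each thread writes and reads its own 100 keys.
    for i in range(100):
        store.put((worker_id, i), i)
        store.get((worker_id, i))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that a lookup in an LRU cache reorders entries, so reads must take the lock as well as writes.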

class ractogateway.cache.exact_cache.ExactMatchCache(max_size=1024, ttl_seconds=None)

Bases: object

Ultra-low-latency key-value cache for identical LLM requests.

Parameters:
  • max_size (int) – LRU capacity. 0 = unlimited (no eviction).

  • ttl_seconds (float | None) – Entries older than ttl_seconds are treated as misses and transparently evicted. None disables expiry.

Example::

    from ractogateway.cache import ExactMatchCache

    cache = ExactMatchCache(max_size=512, ttl_seconds=3600)

    # Wire into a kit:
    kit = OpenAIDeveloperKit(model="gpt-4o", exact_cache=cache)
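The ttl_seconds behaviour described in the parameter list (expired entries count as misses and are removed on access) can be sketched as below. This is an assumption-laden illustration: TTLStore, the injected clock argument, and the use of time.monotonic are all hypothetical choices, not details confirmed by the library.

```python
import time


class TTLStore:
    """Illustrative lazy-expiry store (hypothetical; not the library's code)."""

    def __init__(self, ttl_seconds=None, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._data = {}              # key -> (stored_at, value)

    def put(self, key, value):
        self._data[key] = (self._clock(), value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        expired = (
            self.ttl_seconds is not None
            and self._clock() - stored_at > self.ttl_seconds
        )
        if expired:
            # Treat as a miss and drop the stale entry on the spot.
            del self._data[key]
            return None
        return value


# Drive the store with a fake clock instead of sleeping.
now = [0.0]
store = TTLStore(ttl_seconds=10, clock=lambda: now[0])
store.put("k", "v")
now[0] = 5.0
fresh = store.get("k")    # still within the TTL
now[0] = 11.0
stale = store.get("k")    # older than ttl_seconds: miss, entry removed
```

Expiry here is lazy: nothing scans the cache in the background, and a stale entry is only discarded when a lookup touches it.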

get(user_message, system_prompt, model, temperature, max_tokens)

Return a cached response or None on a miss.

O(1) — dictionary lookup + optional move-to-end.

Return type:

LLMResponse | None
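The page does not say how the five request fields are combined into a lookup key. One plausible approach, shown here purely as an assumption, is to serialise them and hash the result; make_key is a hypothetical helper and SHA-256 over JSON is an illustrative choice, not the library's documented scheme.

```python
import hashlib
import json


def make_key(user_message, system_prompt, model, temperature, max_tokens):
    """Hypothetical key builder: hash a stable serialisation of the fields."""
    payload = json.dumps(
        [user_message, system_prompt, model, temperature, max_tokens],
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = make_key("Hi", "You are helpful.", "gpt-4o", 0.0, 256)
k2 = make_key("Hi", "You are helpful.", "gpt-4o", 0.0, 256)
k3 = make_key("Hi", "You are helpful.", "gpt-4o", 0.7, 256)
# k1 and k2 match; k3 differs because the temperature differs.
```

Whatever the real scheme is, an exact-match cache only hits when every one of the five fields is identical, so changing any single argument produces a different entry.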

put(user_message, system_prompt, model, temperature, max_tokens, response)

Store a response. Evicts the least-recently-used entry when the cache is at capacity.

O(1) amortised — dictionary insert + optional popitem(last=False).

Return type:

None

invalidate(user_message, system_prompt, model, temperature, max_tokens)

Remove a specific entry. Returns True if it was present.

Return type:

bool
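The "returns True if it was present" contract can be illustrated with a plain dictionary. This is a sketch only: the module-level store and the single-argument invalidate function are hypothetical simplifications of the five-argument method documented above.

```python
# Hypothetical backing store with one cached entry.
store = {"k1": "cached response"}


def invalidate(key):
    """Remove key if present and report whether anything was removed."""
    # A sentinel distinguishes "key absent" from "key stored with value None".
    sentinel = object()
    return store.pop(key, sentinel) is not sentinel


first = invalidate("k1")    # entry existed and was removed
second = invalidate("k1")   # nothing left to remove
```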

clear()

Evict all cached entries and reset counters.

Return type:

None

property stats: CacheStats

Return a snapshot of hit/miss/size counters.
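To show what a "snapshot" of counters means, the sketch below returns an immutable value object from a stats property, so later cache activity does not change a snapshot already taken. StatsSnapshot and CountingStore are hypothetical stand-ins; the actual fields of CacheStats are not listed on this page, and hits, misses, and size are assumed from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatsSnapshot:
    """Hypothetical stand-in for CacheStats (field names are assumptions)."""
    hits: int
    misses: int
    size: int


class CountingStore:
    """Illustrative store that tracks hit and miss counters."""

    def __init__(self):
        self._data = {}
        self._hits = 0
        self._misses = 0

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        if key in self._data:
            self._hits += 1
            return self._data[key]
        self._misses += 1
        return None

    def clear(self):
        # Drop all entries and reset the counters, as clear() is described above.
        self._data.clear()
        self._hits = 0
        self._misses = 0

    @property
    def stats(self):
        # Build a new frozen object each time: a point-in-time copy.
        return StatsSnapshot(self._hits, self._misses, len(self._data))


store = CountingStore()
store.put("a", 1)
store.get("a")          # hit
store.get("b")          # miss
before = store.stats
store.clear()
after = store.stats
```

Because the snapshot is a frozen dataclass, before keeps its values after clear() resets the live counters.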