ractogateway.cache

Caching subsystem for RactoGateway.

Two complementary cache strategies:

  • ExactMatchCache: SHA-256-keyed LRU cache for byte-for-byte identical requests (near-zero latency on a hit, no embedding cost).

  • SemanticCache: vector-similarity cache that returns cached answers for semantically equivalent queries even when the wording differs.

Both are optional and thread-safe. Enable them by passing instances to any developer kit constructor:

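from ractogateway.cache import ExactMatchCache, SemanticCache
from ractogateway import openai_developer_kit as gpt

def my_embed_fn(text: str) -> list[float]:
    # Placeholder: any callable (text: str) -> list[float] works here;
    # see SemanticCache below for an OpenAI-based example.
    raise NotImplementedError("return an embedding vector for text")

kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    exact_cache=ExactMatchCache(max_size=1024),
    semantic_cache=SemanticCache(embed_fn=my_embed_fn),
)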

class ractogateway.cache.CacheConfig(**data)

Bases: BaseModel

Configuration for cache instances.

Parameters:
  • max_size (int) – Maximum number of entries to hold. When full, the least-recently-used entry is evicted (LRU policy). 0 means unlimited.

  • ttl_seconds (float | None) – Time-to-live in seconds. Entries older than this are treated as misses and evicted lazily. None disables TTL.
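The two settings interact in a simple way that a pair of predicates can make concrete. This is a minimal sketch of the documented semantics, not ractogateway's code: the helper names is_expired and must_evict are hypothetical, and it assumes entry timestamps are plain floats (as CacheEntry.created_at suggests) produced by the same clock as `now`.

```python
from typing import Optional


def is_expired(created_at: float, ttl_seconds: Optional[float], now: float) -> bool:
    """Hypothetical helper: True if an entry is older than its TTL.

    ttl_seconds=None disables expiry, so nothing ever expires.
    `now` must come from the same clock that produced `created_at`.
    """
    if ttl_seconds is None:
        return False
    return now - created_at > ttl_seconds


def must_evict(current_size: int, max_size: int) -> bool:
    """Hypothetical helper: True if storing one more new entry needs an LRU eviction.

    max_size=0 means unlimited, so eviction is never required.
    """
    return max_size > 0 and current_size >= max_size
```

"Evicted lazily" means a check like is_expired runs when an entry is looked up, rather than on a background timer, so an expired entry occupies a slot until the next lookup touches it.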

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

max_size: int
ttl_seconds: float | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

class ractogateway.cache.CacheEntry(**data)

Bases: BaseModel

A single cached LLM response.

response: LLMResponse
created_at: float
hit_count: int
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

class ractogateway.cache.CacheStats(**data)

Bases: BaseModel

Snapshot of cache performance counters.

hits: int
misses: int
size: int
property total: int

Total requests seen by the cache.

property hit_rate: float

Fraction of requests that were cache hits (0.0-1.0).
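The two derived properties follow from the stored counters. A minimal sketch, assuming total = hits + misses and a hit rate of 0.0 before any lookup has happened (the library's empty-cache behaviour is not documented here); StatsSketch is a hypothetical stand-in, not the real CacheStats class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatsSketch:
    """Hypothetical stand-in for CacheStats, for illustration only."""

    hits: int
    misses: int
    size: int

    @property
    def total(self) -> int:
        # Assumption: every lookup is counted as exactly one hit or one miss.
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        # Guard against division by zero before any lookups happen.
        return self.hits / self.total if self.total else 0.0


stats = StatsSketch(hits=3, misses=1, size=4)
print(stats.total, stats.hit_rate)  # 4 0.75
```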

model_config: ClassVar[ConfigDict] = {}

class ractogateway.cache.ExactMatchCache(max_size=1024, ttl_seconds=None)

Bases: object

Ultra-low-latency key-value cache for identical LLM requests.

Parameters:
  • max_size (int) – LRU capacity. 0 = unlimited (no eviction).

  • ttl_seconds (float | None) – Entries older than ttl_seconds are treated as misses and transparently evicted. None disables expiry.

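Example

from ractogateway.cache import ExactMatchCache
from ractogateway import openai_developer_kit as gpt

# Keep up to 512 entries, each valid for one hour.
cache = ExactMatchCache(max_size=512, ttl_seconds=3600)

# Wire into a kit:
kit = gpt.OpenAIDeveloperKit(model="gpt-4o", exact_cache=cache)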
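The five arguments shared by get(), put() and invalidate() together identify a request. A minimal sketch of turning them into a SHA-256 key, assuming a canonical-JSON serialisation; the function name exact_cache_key, the Optional parameter types, and the serialisation scheme are all hypothetical, since the library's actual key format is not documented here.

```python
import hashlib
import json
from typing import Optional


def exact_cache_key(
    user_message: str,
    system_prompt: Optional[str],
    model: str,
    temperature: float,
    max_tokens: Optional[int],
) -> str:
    """Hypothetical key derivation: hash every argument that can change the response."""
    # Canonical JSON keeps field boundaries unambiguous, so ("ab", "c") and
    # ("a", "bc") cannot produce the same payload.
    payload = json.dumps(
        [user_message, system_prompt, model, temperature, max_tokens],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = exact_cache_key("Hi", "Be terse.", "gpt-4o", 0.0, 256)
k2 = exact_cache_key("Hi", "Be terse.", "gpt-4o", 0.7, 256)
print(len(k1), k1 == k2)  # 64 False
```

Because temperature and max_tokens are part of the lookup, the same prompt sent with different sampling settings is cached as a separate entry.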

get(user_message, system_prompt, model, temperature, max_tokens)

Return a cached response or None on a miss.

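O(1) in the number of cached entries: one dictionary lookup plus an optional move-to-end. Computing the SHA-256 key itself is linear in the length of the request.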

Return type:

LLMResponse | None

put(user_message, system_prompt, model, temperature, max_tokens, response)

Store a response. Evicts LRU entry when at capacity.

O(1) amortised — dictionary insert + optional popitem(last=False).

Return type:

None

invalidate(user_message, system_prompt, model, temperature, max_tokens)

Remove a specific entry. Returns True if it was present.

Return type:

bool
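The documented costs (a dictionary lookup with move-to-end on get, an insert with popitem(last=False) on put) match the standard collections.OrderedDict LRU pattern. A minimal sketch of that pattern, assuming string keys such as a SHA-256 digest; LRUSketch is hypothetical and omits TTL handling and locking (the real class is documented as thread-safe, which would need a lock around each operation).

```python
from collections import OrderedDict
from typing import Any


class LRUSketch:
    """Hypothetical minimal LRU store mirroring the documented get/put/invalidate costs."""

    def __init__(self, max_size: int = 1024) -> None:
        self.max_size = max_size  # 0 = unlimited
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Any:
        if key not in self._data:
            return None  # miss
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        elif self.max_size and len(self._data) >= self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry
        self._data[key] = value

    def invalidate(self, key: str) -> bool:
        # True only if the key was present and has now been removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


lru = LRUSketch(max_size=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" becomes most recently used
lru.put("c", 3)  # at capacity, so "b" is evicted
print(lru.get("b"), lru.get("a"), lru.get("c"))  # None 1 3
```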

clear()

Evict all cached entries and reset counters.

Return type:

None

property stats: CacheStats

Return a snapshot of hit/miss/size counters.

class ractogateway.cache.SemanticCache(embed_fn, similarity_threshold=0.95, max_size=512, ttl_seconds=None)

Bases: object

Vector-similarity cache that returns cached answers for semantically similar queries. A hit avoids the LLM completion call entirely, although the query must still be embedded with one embed_fn call.

Parameters:
  • embed_fn (Callable[[str], list[float]]) – Any callable (text: str) -> list[float]. get() calls it to embed each lookup query, and put() calls it to embed the query being stored.

  • similarity_threshold (float) – Minimum cosine similarity to declare a hit. Default 0.95 is intentionally strict to avoid incorrect responses.

  • max_size (int) – Maximum number of entries (LRU eviction). 0 = unlimited.

  • ttl_seconds (float | None) – Optional per-entry TTL. None disables expiry.

Examples

from ractogateway.cache import SemanticCache
import ractogateway.openai_developer_kit as gpt
import openai

client = openai.OpenAI()  # create the client once and reuse it for every embedding call

def embed(text: str) -> list[float]:
    r = client.embeddings.create(model="text-embedding-3-small", input=text)
    return r.data[0].embedding

cache = SemanticCache(embed_fn=embed, similarity_threshold=0.95)
kit = gpt.OpenAIDeveloperKit(model="gpt-4o", semantic_cache=cache)
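For tests or offline experiments, any deterministic callable that satisfies the embed_fn contract can stand in for the OpenAI call above. A minimal sketch of a hashed bag-of-words embedder; toy_embed and its 64-dimensional size are hypothetical choices, and because it only measures word overlap it is unsuitable for production semantic matching.

```python
import hashlib
import math
import re


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical offline embed_fn: hashed bag-of-words, L2-normalised.

    Deterministic and dependency-free, so it suits unit tests; it captures
    word overlap only, not meaning.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Stable bucket per token (Python's built-in hash() is salted per process).
        bucket = int.from_bytes(hashlib.sha256(token.encode()).digest()[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

It can then be passed as SemanticCache(embed_fn=toy_embed) in a test suite without network access.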

get(query)

Embed query and return a cached response if its cosine similarity to a stored entry is at least similarity_threshold.

Returns None on a cache miss (caller should make the real API call and then invoke put()).

Complexity: O(n·d) where n = number of entries, d = embedding dim.
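The O(n·d) bound corresponds to a brute-force scan that compares the query embedding against every stored vector. A minimal sketch, assuming the best-scoring entry is returned when it clears the threshold (the docs only state that a hit requires similarity ≥ threshold); cosine and best_match are hypothetical helpers, not ractogateway functions.

```python
import math
from typing import Any, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_match(
    query_vec: Sequence[float],
    entries: Sequence[Tuple[Sequence[float], Any]],
    threshold: float = 0.95,
) -> Any:
    """Linear scan over (vector, response) pairs: O(n*d) for n entries of dimension d."""
    best_score, best_response = -1.0, None
    for vec, response in entries:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_score, best_response = score, response
    # Only a sufficiently close neighbour counts as a hit; otherwise report a miss.
    return best_response if best_score >= threshold else None


entries = [([1.0, 0.0], "answer A"), ([0.0, 1.0], "answer B")]
print(best_match([0.99, 0.05], entries))  # answer A
print(best_match([0.7, 0.7], entries))    # None
```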

Return type:

LLMResponse | None

put(query, response)

Embed query and store response for future similar queries.

Evicts LRU entry when at capacity.

Return type:

None

clear()

Remove all entries and reset counters.

Return type:

None

property stats: CacheStats

Return a snapshot of hit/miss/size counters.

class ractogateway.cache.SemanticCacheConfig(**data)

Bases: BaseModel

Configuration for the semantic similarity cache.

Parameters:
  • threshold (float) – Minimum cosine similarity (0.0-1.0) required to declare a cache hit. Defaults to 0.95 (very strict — avoids false positives).

  • max_size (int) – Maximum entries before LRU eviction. 0 means unlimited.

  • ttl_seconds (float | None) – Optional TTL; None disables expiry.
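Cosine similarity is the cosine of the angle between two embeddings, so a threshold maps directly to a maximum angle θ = arccos(threshold). A short calculation makes the strictness of the 0.95 default concrete: it admits only vectors within roughly 18 degrees of the query. How angles relate to meaning depends entirely on the embedding model, so treat these numbers as geometry rather than as a tuning recommendation.

```python
import math

# Largest angle (in degrees) between two embeddings whose cosine similarity
# still meets each threshold: theta = arccos(threshold).
angles = {t: math.degrees(math.acos(t)) for t in (0.80, 0.90, 0.95, 0.99)}
for t, deg in angles.items():
    print(f"threshold={t:.2f} -> within {deg:.1f} degrees of the query")
```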

threshold: float
max_size: int
ttl_seconds: float | None
model_config: ClassVar[ConfigDict] = {}

class ractogateway.cache.SemanticCacheEntry(**data)

Bases: BaseModel

One entry in the semantic cache, pairing an embedding with a response.

vector: list[float]
response: LLMResponse
created_at: float
hit_count: int
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

model_post_init(_SemanticCacheEntry__context)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

Return type:

None