Caching

RactoGateway ships two complementary, thread-safe cache strategies. Either one can be passed to any developer kit as a constructor argument.

Exact Match Cache

ExactMatchCache keys each entry by a SHA-256 hash of the serialised request. On a cache hit, the stored response is returned without a network call or any token cost.

from ractogateway.cache import ExactMatchCache
from ractogateway import openai_developer_kit as gpt

cache = ExactMatchCache(max_size=1024)

kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    exact_cache=cache,
)

# The first call misses the cache and goes to the API.
response = kit.chat(gpt.ChatConfig(user_message="What is 2+2?"))
# A second call with an identical request is served from the cache.
response = kit.chat(gpt.ChatConfig(user_message="What is 2+2?"))
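
To illustrate the keying idea, here is a minimal sketch using only the standard library. It is not RactoGateway's internal implementation: `request_key` and `TinyExactCache` are hypothetical names, the request is assumed to be a JSON-serialisable dict, and least-recently-used eviction is an assumption about what `max_size` might imply.

```python
import hashlib
import json
import threading
from collections import OrderedDict
from typing import Optional


def request_key(request: dict) -> str:
    """Hash a request dict into a stable hex key (hypothetical helper)."""
    # sort_keys makes the serialisation independent of dict ordering.
    payload = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TinyExactCache:
    """Illustrative bounded cache; not the library's ExactMatchCache."""

    def __init__(self, max_size: int = 1024) -> None:
        self._max_size = max_size
        self._store = OrderedDict()
        self._lock = threading.Lock()  # guards every read and write

    def get(self, request: dict) -> Optional[str]:
        key = request_key(request)
        with self._lock:
            if key not in self._store:
                return None
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]

    def put(self, request: dict, response: str) -> None:
        key = request_key(request)
        with self._lock:
            self._store[key] = response
            self._store.move_to_end(key)
            if len(self._store) > self._max_size:
                self._store.popitem(last=False)  # evict least recently used
```

Because the key is derived from the full serialised request, any change to the message, model, or other parameters produces a different key and therefore a miss.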

Semantic Cache

SemanticCache compares query embeddings by vector similarity, so a query that is worded differently but means the same thing can return the cached answer when its similarity to a stored query clears similarity_threshold.

from ractogateway.cache import SemanticCache
from ractogateway import openai_developer_kit as gpt

def my_embed_fn(text: str) -> list[float]:
    # Use any embedding function — OpenAI, Google, local model, etc.
    ...

semantic_cache = SemanticCache(embedder=my_embed_fn, similarity_threshold=0.92)

kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    semantic_cache=semantic_cache,
)
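
The threshold check can be sketched as a linear scan with cosine similarity. This is an illustration under stated assumptions, not the library's SemanticCache: `TinySemanticCache`, `cosine_similarity`, `toy_embed`, and `VOCAB` are hypothetical, cosine similarity is assumed as the metric, and the word-count embedder stands in for a real embedding model. The default of 0.92 simply mirrors the value used in the example above.

```python
import math
import re
from typing import Callable, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinySemanticCache:
    """Illustrative linear-scan cache; not the library's SemanticCache."""

    def __init__(
        self,
        embedder: Callable[[str], List[float]],
        similarity_threshold: float = 0.92,
    ) -> None:
        self._embedder = embedder
        self._threshold = similarity_threshold
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, query: str, response: str) -> None:
        self._entries.append((self._embedder(query), response))

    def get(self, query: str) -> Optional[str]:
        vec = self._embedder(query)
        best_score, best_response = -1.0, None
        for stored_vec, response in self._entries:
            score = cosine_similarity(vec, stored_vec)
            if score > best_score:
                best_score, best_response = score, response
        # Only the closest entry is returned, and only if it clears the threshold.
        return best_response if best_score >= self._threshold else None


# Toy embedder (assumption): counts of a few fixed vocabulary words.
VOCAB = ["capital", "france", "paris", "weather"]


def toy_embed(text: str) -> List[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]
```

With this toy embedder, "What is the capital of France?" and "Tell me France's capital" map to the same vector and so count as a hit, while an unrelated question falls below the threshold. A higher threshold trades fewer hits for a lower risk of returning an answer to a different question.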

Using Both Together

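from ractogateway.cache import ExactMatchCache, SemanticCache
from ractogateway import openai_developer_kit as gpt

# my_embed_fn is the embedding function defined in the Semantic Cache example.
kit = gpt.OpenAIDeveloperKit(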
    model="gpt-4o",
    exact_cache=ExactMatchCache(max_size=512),
    semantic_cache=SemanticCache(embedder=my_embed_fn),
)

Lookups proceed in order: the exact-match cache is checked first (an O(1) lookup), then the semantic cache, and a live API call is made only if both miss.
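
The lookup order can be sketched as a small function. This is a hedged illustration rather than the kit's actual code path: `cached_chat` and every helper below are hypothetical, the semantic tier is replaced by a case-insensitive match to keep the example short, and writing the live response back to both caches is an assumption.

```python
from typing import Callable, Optional, Tuple


def cached_chat(
    message: str,
    exact_get: Callable[[str], Optional[str]],
    semantic_get: Callable[[str], Optional[str]],
    call_api: Callable[[str], str],
    store: Callable[[str, str], None],
) -> Tuple[str, str]:
    """Return (response, source), trying the cheapest tier first."""
    cached = exact_get(message)
    if cached is not None:
        return cached, "exact"
    cached = semantic_get(message)
    if cached is not None:
        return cached, "semantic"
    response = call_api(message)  # both caches missed
    store(message, response)      # assumption: populate both tiers
    return response, "live"


# Stand-ins for the two caches and the provider call.
exact_store = {}
semantic_store = {}
api_calls = []


def _normalise(message: str) -> str:
    # Placeholder for embedding similarity: ignore case and extra spaces.
    return " ".join(message.lower().split())


def exact_get(message: str) -> Optional[str]:
    return exact_store.get(message)


def semantic_get(message: str) -> Optional[str]:
    return semantic_store.get(_normalise(message))


def store(message: str, response: str) -> None:
    exact_store[message] = response
    semantic_store[_normalise(message)] = response


def call_api(message: str) -> str:
    api_calls.append(message)
    return "answer to " + message
```

Calling `cached_chat` three times with "What is 2+2?", the same string again, and "what is 2+2?" would go to the live tier, the exact tier, and the stand-in semantic tier respectively, with only one entry recorded in `api_calls`.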

Installation

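Both caches are included in the base install. To enable precise token counting with tiktoken, install the optional cache extra (the quotes keep shells such as zsh from treating the brackets as a glob pattern):

pip install "ractogateway[cache]"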