Caching
RactoGateway ships two complementary, thread-safe cache strategies, exact-match and semantic, that can be attached to any developer kit when it is constructed.
Exact Match Cache
ExactMatchCache computes a SHA-256 key over the serialised request. On a cache hit the stored response is returned immediately, with no network call and no token cost.
from ractogateway.cache import ExactMatchCache
from ractogateway import openai_developer_kit as gpt
cache = ExactMatchCache(max_size=1024)
kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    exact_cache=cache,
)
response = kit.chat(gpt.ChatConfig(user_message="What is 2+2?"))
# Second call with identical message hits cache immediately.
response = kit.chat(gpt.ChatConfig(user_message="What is 2+2?"))
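To make the keying scheme concrete, here is a minimal standalone sketch of an exact-match cache of this kind. It is not RactoGateway's implementation: the class name TinyExactCache, the JSON serialisation with sorted keys, and the LRU eviction policy are all assumptions chosen for illustration.

```python
import hashlib
import json
import threading
from collections import OrderedDict
from typing import Any, Optional


class TinyExactCache:
    """Illustrative exact-match cache (hypothetical, not the library's class)."""

    def __init__(self, max_size: int = 1024) -> None:
        self._max_size = max_size
        self._store: "OrderedDict[str, Any]" = OrderedDict()
        self._lock = threading.Lock()  # guards every read and write of _store

    @staticmethod
    def make_key(request: dict) -> str:
        # Sorting keys makes the serialisation deterministic, so two requests
        # with the same fields hash identically (assumes JSON-serialisable input).
        payload = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, request: dict) -> Optional[Any]:
        key = self.make_key(request)
        with self._lock:
            if key not in self._store:
                return None
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]

    def put(self, request: dict, response: Any) -> None:
        key = self.make_key(request)
        with self._lock:
            self._store[key] = response
            self._store.move_to_end(key)
            if len(self._store) > self._max_size:
                self._store.popitem(last=False)  # evict least recently used


cache = TinyExactCache(max_size=2)
cache.put({"model": "gpt-4o", "user_message": "What is 2+2?"}, "4")
# Field order does not affect the key, so this lookup is a hit.
hit = cache.get({"user_message": "What is 2+2?", "model": "gpt-4o"})
```

Because the key is a hash of the whole serialised request, any difference in the message, even a single character, produces a different key and therefore a miss.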
Semantic Cache
SemanticCache compares queries by the vector similarity of their embeddings, so semantically equivalent queries return the cached answer even when the wording differs.
from ractogateway.cache import SemanticCache
from ractogateway import openai_developer_kit as gpt
def my_embed_fn(text: str) -> list[float]:
    # Use any embedding function: OpenAI, Google, local model, etc.
    ...
semantic_cache = SemanticCache(embedder=my_embed_fn, similarity_threshold=0.92)
kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    semantic_cache=semantic_cache,
)
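The role of similarity_threshold can be illustrated with a small standalone sketch. It assumes cosine similarity and a linear scan over stored embeddings; the source does not say which similarity metric or index SemanticCache uses, and TinySemanticCache is a hypothetical name. Locking is left out for brevity.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinySemanticCache:
    """Illustrative semantic cache (hypothetical, not the library's class)."""

    def __init__(
        self,
        embedder: Callable[[str], Sequence[float]],
        similarity_threshold: float = 0.92,
    ) -> None:
        self._embedder = embedder
        self._threshold = similarity_threshold
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, query: str, response: str) -> None:
        self._entries.append((list(self._embedder(query)), response))

    def get(self, query: str) -> Optional[str]:
        vec = self._embedder(query)
        best_score, best_response = -1.0, None
        # Linear scan: find the stored query most similar to the new one.
        for stored_vec, response in self._entries:
            score = cosine_similarity(vec, stored_vec)
            if score > best_score:
                best_score, best_response = score, response
        # Only return the cached answer if it clears the threshold.
        return best_response if best_score >= self._threshold else None
```

A higher threshold reduces the chance of returning an answer to a different question, at the cost of fewer hits; a lower one trades the other way.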
Using Both Together
kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    exact_cache=ExactMatchCache(max_size=512),
    semantic_cache=SemanticCache(embedder=my_embed_fn),
)
Lookups proceed in order: the exact-match cache is checked first (O(1)), then the semantic cache, and a live API call is made only if both miss.
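The lookup order can be sketched as a plain function. This is an illustration under stated assumptions, not the kit's internals: the function name lookup, the dict standing in for the exact cache, and the callables semantic_lookup and call_api are all hypothetical, and writing the live response back to the exact store is an assumed behaviour.

```python
from typing import Callable, Dict, Optional, Tuple


def lookup(
    message: str,
    exact_store: Dict[str, str],
    semantic_lookup: Callable[[str], Optional[str]],
    call_api: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (response, source), where source names the tier that answered."""
    # Tier 1: exact match, a single dictionary lookup.
    if message in exact_store:
        return exact_store[message], "exact"

    # Tier 2: semantic similarity; returns None when nothing is close enough.
    cached = semantic_lookup(message)
    if cached is not None:
        return cached, "semantic"

    # Tier 3: live call. Storing the result for next time is an assumption
    # about how such a pipeline would typically behave.
    response = call_api(message)
    exact_store[message] = response
    return response, "api"
```

Checking the cheapest tier first means an identical repeat request never pays for an embedding computation, and the API is reached only when neither cache can answer.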
Installation
Both caches are included in the base install. For precise token counting with tiktoken, install the cache extra (quoted so that shells such as zsh do not expand the brackets):
pip install "ractogateway[cache]"