# Caching

RactoGateway ships two complementary, thread-safe cache strategies that can be passed to any developer kit as constructor arguments.

## Exact Match Cache

`ExactMatchCache` keys each entry on a SHA-256 hash of the serialised request. On a cache hit the stored response is returned immediately, with no network call and no token cost.

```python
from ractogateway.cache import ExactMatchCache
from ractogateway import openai_developer_kit as gpt

cache = ExactMatchCache(max_size=1024)

kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    exact_cache=cache,
)

response = kit.chat(gpt.ChatConfig(user_message="What is 2+2?"))
# The second call with an identical message is served from the cache.
response = kit.chat(gpt.ChatConfig(user_message="What is 2+2?"))
```

## Semantic Cache

`SemanticCache` uses vector similarity, so *semantically equivalent* queries return the cached answer even when they are worded differently.

```python
from ractogateway.cache import SemanticCache
from ractogateway import openai_developer_kit as gpt


def my_embed_fn(text: str) -> list[float]:
    # Use any embedding function: OpenAI, Google, a local model, etc.
    ...


semantic_cache = SemanticCache(embedder=my_embed_fn, similarity_threshold=0.92)

kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    semantic_cache=semantic_cache,
)
```

## Using Both Together

```python
from ractogateway.cache import ExactMatchCache, SemanticCache
from ractogateway import openai_developer_kit as gpt

# my_embed_fn is the embedding function defined in the Semantic Cache example.
kit = gpt.OpenAIDeveloperKit(
    model="gpt-4o",
    exact_cache=ExactMatchCache(max_size=512),
    semantic_cache=SemanticCache(embedder=my_embed_fn),
)
```

The exact-match cache is checked first (an O(1) lookup), then the semantic cache; a live API call is made only if both miss.

## Installation

Both caches are included in the base install. For precise token counting with tiktoken, install the `cache` extra:

```bash
pip install "ractogateway[cache]"
```
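
## Lookup Order Sketch

To make the lookup order described under Using Both Together concrete, here is a minimal standalone sketch of a two-tier cache: a SHA-256 key for the exact tier, cosine similarity for the semantic tier, and a fallback to a live call. It is an illustration only and not RactoGateway's implementation. The names `request_key`, `cosine_similarity`, `toy_embed`, `TwoTierCacheSketch`, and `fake_api` are hypothetical, the JSON serialisation is an assumed canonical form, and the letter-count embedder is a toy stand-in for a real embedding model.

```python
import hashlib
import json
import math
from typing import Callable, Optional


def request_key(request: dict) -> str:
    """SHA-256 hex digest over an (assumed) canonical JSON form of the request."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text: str) -> list[float]:
    """Toy embedder: counts of the letters a-z. A real model would go here."""
    lowered = text.lower()
    return [float(lowered.count(chr(ord("a") + i))) for i in range(26)]


class TwoTierCacheSketch:
    """Hypothetical illustration of exact-then-semantic lookup (not thread-safe)."""

    def __init__(self, embedder: Callable[[str], list[float]],
                 similarity_threshold: float = 0.92) -> None:
        self._exact: dict[str, str] = {}
        self._semantic: list[tuple[list[float], str]] = []
        self._embedder = embedder
        self._threshold = similarity_threshold

    def lookup(self, request: dict) -> Optional[str]:
        # Tier 1: exact match, a single dictionary lookup.
        hit = self._exact.get(request_key(request))
        if hit is not None:
            return hit
        # Tier 2: linear scan for the most similar stored embedding.
        query = self._embedder(request["user_message"])
        best_score, best_response = 0.0, None
        for vector, response in self._semantic:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self._threshold else None

    def chat(self, request: dict, call_api: Callable[[dict], str]) -> str:
        cached = self.lookup(request)
        if cached is not None:
            return cached
        # Tier 3: both tiers missed, so make the live call and store the result.
        response = call_api(request)
        self._exact[request_key(request)] = response
        self._semantic.append((self._embedder(request["user_message"]), response))
        return response


calls: list[str] = []


def fake_api(request: dict) -> str:
    """Stand-in for a live model call; records each invocation."""
    calls.append(request["user_message"])
    return f"answer to: {request['user_message']}"


cache = TwoTierCacheSketch(embedder=toy_embed)
first = cache.chat({"user_message": "What is 2+2?"}, fake_api)    # miss, live call
second = cache.chat({"user_message": "What is 2+2?"}, fake_api)   # exact hit
third = cache.chat({"user_message": "what is 2 + 2"}, fake_api)   # semantic hit
fourth = cache.chat({"user_message": "Name the capital of France"}, fake_api)  # miss
print(len(calls))
```

The semantic tier here is a linear scan, which is fine for a sketch but would normally be replaced by a vector index once the number of entries grows. The threshold plays the same role as `similarity_threshold` in the `SemanticCache` example: raising it reduces false hits at the cost of more live calls.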