ractogateway.redis
Redis infrastructure layer for RactoGateway.
Three production-ready utilities that replace or complement the built-in in-process modules when running across multiple servers:
- RedisExactCache: distributed drop-in for ExactMatchCache. Pass it directly to any developer-kit exact_cache= parameter; no other code changes required.
- RedisRateLimiter: fleet-wide token-budget rate limiter. Uses a sliding 1-minute window so costs can never exceed max_tokens_per_minute per user_id, even across multiple server replicas.
- RedisChatMemory: bounded sliding-window conversation history. Stores the last N message pairs in a Redis List so multi-turn conversations survive rolling deployments and are accessible to every replica.
Quick start:
pip install ractogateway[redis]
from ractogateway.redis import (
    RedisExactCache,
    RedisRateLimiter,
    RedisChatMemory,
    RateLimitConfig,
    ChatMemoryConfig,
)
REDIS_URL = "redis://localhost:6379/0"
# 1. Distributed response cache — wire into any kit:
cache = RedisExactCache(url=REDIS_URL, ttl_seconds=3600)
kit = OpenAIDeveloperKit(model="gpt-4o", exact_cache=cache)
# 2. Rate limiter — check before calling the LLM:
limiter = RedisRateLimiter(
    url=REDIS_URL,
    config=RateLimitConfig(max_tokens_per_minute=5_000),
)
if not limiter.check_and_consume(user_id, tokens=estimated_tokens):
    raise RuntimeError("Rate limit exceeded.")
# 3. Chat memory — persist and retrieve conversation history:
memory = RedisChatMemory(
    url=REDIS_URL,
    config=ChatMemoryConfig(max_turns=20, ttl_seconds=1800),
)
memory.append(conv_id, "user", user_message)
history = memory.get_history(conv_id)
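The three pieces above might be combined per request roughly as follows. This is a minimal sketch rather than part of the library: `handle_turn` and `call_llm` are hypothetical names, and only the `check_and_consume`, `append`, and `get_history` calls come from the quick start.

```python
from typing import Callable, Dict, List


def handle_turn(
    limiter,   # any object with check_and_consume(user_id, tokens=...)
    memory,    # any object with append(...) and get_history(...)
    call_llm: Callable[[List[Dict[str, str]]], str],  # hypothetical LLM callable
    user_id: str,
    conv_id: str,
    user_message: str,
    estimated_tokens: int,
) -> str:
    """Rate-limit, record the user message, call the model, record the reply."""
    # Reject before any LLM call is made.
    if not limiter.check_and_consume(user_id, tokens=estimated_tokens):
        raise RuntimeError("Rate limit exceeded.")

    memory.append(conv_id, "user", user_message)
    history = memory.get_history(conv_id)  # oldest-first role/content dicts

    reply = call_llm(history)
    memory.append(conv_id, "assistant", reply)
    return reply
```

Because the helper only relies on those three methods, it can be exercised in unit tests with simple in-memory stand-ins instead of a live Redis server.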
- class ractogateway.redis.ChatMemoryConfig(**data)
Bases: BaseModel
Configuration for RedisChatMemory.
- Parameters:
max_turns (int) – Maximum number of conversation turns to retain. Each turn consists of one user message and one assistant message, so up to max_turns * 2 raw messages are stored per conversation.
ttl_seconds (float | None) – Optional TTL. Every append() call refreshes the expiry on the underlying Redis list. None disables expiry.
key_prefix (str) – Redis key namespace. Each conversation is stored at "{key_prefix}:{conversation_id}".
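To make the sizing concrete, the two formulas in the parameter descriptions can be evaluated directly. The helper names below are hypothetical, and the key_prefix value is a made-up example because the default prefix is not stated here.

```python
def max_stored_messages(max_turns: int) -> int:
    # One user message plus one assistant message per turn.
    return max_turns * 2


def conversation_key(key_prefix: str, conversation_id: str) -> str:
    # Mirrors the documented "{key_prefix}:{conversation_id}" layout.
    return f"{key_prefix}:{conversation_id}"


print(max_stored_messages(20))                   # 40
print(conversation_key("myapp:chat", "abc123"))  # myapp:chat:abc123
```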
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- max_turns: int
- key_prefix: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
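- class ractogateway.redis.RateLimitConfig(**data)
Bases: BaseModel
Configuration for RedisRateLimiter.
- Parameters:
max_tokens_per_minute (int) – Token budget per user_id within the 1-minute window.
key_prefix (str) – Redis key namespace.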
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- max_tokens_per_minute: int
- key_prefix: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class ractogateway.redis.RedisChatMemory(*, url='redis://localhost:6379/0', client=None, config=None)
Bases: object
Shared, bounded conversation history backed by Redis.
- Parameters:
url (str) – Redis connection URL. Ignored when client is provided.
config (ChatMemoryConfig | None) – ChatMemoryConfig controlling turn limit, TTL, and key namespace. Defaults are applied when None.
- append(conversation_id, role, content)
Append a message to the conversation history.
After appending, the list is trimmed to the last config.max_turns * 2 messages (oldest dropped first). If a TTL is configured, it is refreshed on every append so the window slides with activity.
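The append, trim, and TTL refresh described above map naturally onto the Redis list commands RPUSH, LTRIM, and EXPIRE. The sketch below is an assumption about how such a step could look, not the library's code: `append_message`, the JSON encoding, and the in-memory `FakeListClient` (included only so the example runs without a server) are hypothetical. A real redis-py client exposes `rpush`, `ltrim`, `expire`, and `lrange` methods with the same shapes.

```python
import json
from typing import Dict, List, Optional


class FakeListClient:
    """In-memory stand-in exposing the four list methods used below."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[str]] = {}
        self.ttls: Dict[str, int] = {}

    def rpush(self, key: str, value: str) -> int:
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])

    def ltrim(self, key: str, start: int, end: int) -> None:
        items = self.lists.get(key, [])
        n = len(items)
        # Redis LTRIM uses inclusive indices; negatives count from the tail.
        lo = max(n + start, 0) if start < 0 else start
        hi = n + end if end < 0 else min(end, n - 1)
        self.lists[key] = items[lo : hi + 1]

    def expire(self, key: str, seconds: int) -> None:
        self.ttls[key] = seconds

    def lrange(self, key: str, start: int, end: int) -> List[str]:
        # Only the full-range read (0, -1) is needed for this sketch.
        return list(self.lists.get(key, []))


def append_message(client, key: str, role: str, content: str,
                   max_turns: int, ttl_seconds: Optional[int] = None) -> None:
    client.rpush(key, json.dumps({"role": role, "content": content}))
    # Keep only the newest max_turns * 2 entries; older ones fall off the head.
    client.ltrim(key, -(max_turns * 2), -1)
    if ttl_seconds is not None:
        client.expire(key, ttl_seconds)  # refreshed on every append


client = FakeListClient()
for i in range(3):
    append_message(client, "chat:demo", "user", f"q{i}", max_turns=2, ttl_seconds=1800)
    append_message(client, "chat:demo", "assistant", f"a{i}", max_turns=2, ttl_seconds=1800)

history = [json.loads(m) for m in client.lrange("chat:demo", 0, -1)]
print([m["content"] for m in history])  # ['q1', 'a1', 'q2', 'a2']
```

With max_turns=2 the first turn (q0, a0) is dropped once the third turn arrives, which is the "oldest dropped first" behaviour in miniature.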
- get_history(conversation_id)
Return all stored messages as a list of {"role", "content"} dicts.
The list is ordered oldest-first, matching the ChatConfig.history convention.
Returns an empty list when the conversation does not exist or has expired.
- class ractogateway.redis.RedisExactCache(*, url='redis://localhost:6379/0', client=None, ttl_seconds=None, key_prefix='ractogateway:exact')
Bases: object
Distributed exact-match LRU cache backed by Redis.
- Parameters:
url (str) – Redis connection URL (e.g. "redis://localhost:6379/0"). Ignored when client is provided.
client (Any | None) – Pre-built redis.Redis (or compatible) client. Useful when you manage the connection pool yourself or use a mock in tests.
ttl_seconds (float | None) – Optional TTL for each entry. Passed directly to Redis SET EX. None means entries never expire (Redis default).
key_prefix (str) – Namespace for all Redis keys managed by this instance.
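An exact-match cache has to turn its lookup arguments into one deterministic Redis key. How RedisExactCache derives its keys is not documented here; the sketch below shows one common approach (SHA-256 over a canonical JSON encoding), and `make_cache_key` is a hypothetical helper rather than a library function.

```python
import hashlib
import json
from typing import Optional


def make_cache_key(
    key_prefix: str,
    user_message: str,
    system_prompt: str,
    model: str,
    temperature: float,
    max_tokens: Optional[int],
) -> str:
    # sort_keys plus fixed separators give a stable string for equal inputs.
    payload = json.dumps(
        {
            "user_message": user_message,
            "system_prompt": system_prompt,
            "model": model,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{key_prefix}:{digest}"


k1 = make_cache_key("ractogateway:exact", "Hi", "Be brief.", "gpt-4o", 0.0, 256)
k2 = make_cache_key("ractogateway:exact", "Hi", "Be brief.", "gpt-4o", 0.0, 256)
k3 = make_cache_key("ractogateway:exact", "Hi", "Be brief.", "gpt-4o", 0.7, 256)
print(k1 == k2, k1 == k3)  # True False
```

Any change to one of the five inputs, such as the temperature in k3, produces a different key, which is what makes the match "exact".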
- get(user_message, system_prompt, model, temperature, max_tokens)
Return a cached response or None on a miss.
O(1) Redis GET.
- Return type:
- put(user_message, system_prompt, model, temperature, max_tokens, response)
Store a response in Redis.
O(1) Redis SET [EX ttl].
- Return type:
- invalidate(user_message, system_prompt, model, temperature, max_tokens)
Remove a specific entry. Returns True if it was present.
- Return type:
- clear()
Delete all entries matching this instance’s key prefix.
Uses SCAN to iterate safely (no KEYS * in production). Also resets in-memory stats counters.
- Return type:
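Prefix-scoped deletion with SCAN typically looks like the sketch below. It assumes a redis-py style client and uses that client's `scan_iter(match=...)` and `delete(*names)` methods; `clear_prefix` and the in-memory `FakeKeyClient` (present only so the example runs without a server) are hypothetical and do not reflect the library's internals.

```python
import fnmatch
from typing import Dict, Iterator, List


class FakeKeyClient:
    """In-memory stand-in for the two client methods used below."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)

    def scan_iter(self, match: str = "*", count: int = 100) -> Iterator[str]:
        # Snapshot the keys so deleting during iteration is safe.
        for key in list(self.data):
            if fnmatch.fnmatchcase(key, match):
                yield key

    def delete(self, *names: str) -> int:
        removed = 0
        for name in names:
            if name in self.data:
                del self.data[name]
                removed += 1
        return removed


def clear_prefix(client, key_prefix: str, batch_size: int = 500) -> int:
    """Delete every key under key_prefix in batches; return how many were removed."""
    removed = 0
    batch: List[str] = []
    for key in client.scan_iter(match=f"{key_prefix}:*", count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            removed += client.delete(*batch)
            batch.clear()
    if batch:
        removed += client.delete(*batch)
    return removed


client = FakeKeyClient(
    {"ractogateway:exact:a": "1", "ractogateway:exact:b": "2", "other:c": "3"}
)
print(clear_prefix(client, "ractogateway:exact"))  # 2
print(sorted(client.data))                          # ['other:c']
```

SCAN walks the keyspace incrementally with a cursor, whereas KEYS blocks the server while it scans everything, which is why the iterative form is preferred on production instances.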
- property stats: CacheStats
Return a snapshot of hit/miss counters plus current Redis key count.
- class ractogateway.redis.RedisRateLimiter(*, url='redis://localhost:6379/0', client=None, config=None)
Bases: object
Fleet-wide token-budget rate limiter backed by a shared Redis instance.
- Parameters:
url (str) – Redis connection URL. Ignored when client is provided.
client (Any | None) – Pre-built redis.Redis client. Useful for connection-pool sharing or unit-test mocking.
config (RateLimitConfig | None) – RateLimitConfig controlling the token budget and Redis key namespace. Defaults are applied when None.
- check_and_consume(user_id, tokens=1)
Attempt to consume tokens from user_id’s budget.
Returns True if the request is within budget (tokens are consumed), or False if the rate limit would be exceeded (no tokens consumed).
The check-and-increment is done in a single Redis pipeline, making it safe against concurrent requests from the same user.
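The contract above (consume on success, leave the budget untouched on rejection) can be illustrated with a small in-process sliding-window limiter. This is a single-process sketch for intuition only: it is not the Redis implementation, and `SlidingWindowBudget` and its injectable `clock` are hypothetical. The real class keeps its counters in Redis so that every replica sees the same budget.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowBudget:
    def __init__(self, max_tokens_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = max_tokens_per_minute
        self.clock = clock
        # Per-user queue of (timestamp, tokens) entries, oldest first.
        self.events: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)

    def _used(self, user_id: str) -> int:
        now = self.clock()
        q = self.events[user_id]
        # Drop entries that are 60 seconds old or older.
        while q and now - q[0][0] >= 60.0:
            q.popleft()
        return sum(tokens for _, tokens in q)

    def check_and_consume(self, user_id: str, tokens: int = 1) -> bool:
        if self._used(user_id) + tokens > self.limit:
            return False  # rejected: nothing is recorded
        self.events[user_id].append((self.clock(), tokens))
        return True

    def get_remaining(self, user_id: str) -> int:
        return self.limit - self._used(user_id)


now = [0.0]  # mutable cell so the fake clock can be advanced
budget = SlidingWindowBudget(5_000, clock=lambda: now[0])
print(budget.check_and_consume("alice", tokens=4_000))  # True
print(budget.check_and_consume("alice", tokens=2_000))  # False
print(budget.get_remaining("alice"))                    # 1000
now[0] = 61.0
print(budget.get_remaining("alice"))                    # 5000
```

The rejected 2,000-token request leaves the remaining budget at 1,000, and the full budget returns once the earlier consumption ages out of the 60-second window.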
- get_remaining(user_id)
Return the remaining token budget for the current minute.
Returns max_tokens_per_minute if the user has not made any requests in the current window.
- Return type: