ractogateway.redis

Redis infrastructure layer for RactoGateway.

Three production-ready utilities that replace or complement the built-in in-process modules when running across multiple servers:

  • RedisExactCache — distributed drop-in for ExactMatchCache. Pass it directly to any developer-kit exact_cache= parameter; no other code changes required.

  • RedisRateLimiter — fleet-wide token-budget rate limiter. Uses a sliding 1-minute window so costs can never exceed max_tokens_per_minute per user_id, even across multiple server replicas.

  • RedisChatMemory — bounded sliding-window conversation history. Stores the last N message pairs in a Redis List so multi-turn conversations survive rolling deployments and are accessible to every replica.

Quick start:

pip install ractogateway[redis]

from ractogateway.redis import (
    RedisExactCache,
    RedisRateLimiter,
    RedisChatMemory,
    RateLimitConfig,
    ChatMemoryConfig,
)

REDIS_URL = "redis://localhost:6379/0"

# 1. Distributed response cache — wire into any kit:
cache = RedisExactCache(url=REDIS_URL, ttl_seconds=3600)
kit = OpenAIDeveloperKit(model="gpt-4o", exact_cache=cache)

# 2. Rate limiter — check before calling the LLM:
limiter = RedisRateLimiter(
    url=REDIS_URL,
    config=RateLimitConfig(max_tokens_per_minute=5_000),
)
if not limiter.check_and_consume(user_id, tokens=estimated_tokens):
    raise RuntimeError("Rate limit exceeded.")

# 3. Chat memory — persist and retrieve conversation history:
memory = RedisChatMemory(
    url=REDIS_URL,
    config=ChatMemoryConfig(max_turns=20, ttl_seconds=1800),
)
memory.append(conv_id, "user", user_message)
history = memory.get_history(conv_id)
class ractogateway.redis.ChatMemoryConfig(**data)[source]

Bases: BaseModel

Configuration for RedisChatMemory.

Parameters:
  • max_turns (int) – Maximum number of conversation turns to retain. Each turn consists of one user message and one assistant message, so up to max_turns * 2 raw messages are stored per conversation.

  • ttl_seconds (float | None) – Optional TTL. Every append() call refreshes the expiry on the underlying Redis list. None disables expiry.

  • key_prefix (str) – Redis key namespace. Each conversation is stored at "{key_prefix}:{conversation_id}".

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

max_turns: int
ttl_seconds: float | None
key_prefix: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.redis.RateLimitConfig(**data)[source]

Bases: BaseModel

Configuration for RedisRateLimiter.

Parameters:
  • max_tokens_per_minute (int) – Maximum LLM tokens a single user_id may consume in any 60-second window.

  • key_prefix (str) – Redis key namespace. All rate-limit keys are stored under "{key_prefix}:{user_id}:{unix_minute}".

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

max_tokens_per_minute: int
key_prefix: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.redis.RedisChatMemory(*, url='redis://localhost:6379/0', client=None, config=None)[source]

Bases: object

Shared, bounded conversation history backed by Redis.

Parameters:
  • url (str) – Redis connection URL. Ignored when client is provided.

  • client (Any | None) – Pre-built redis.Redis client.

  • config (ChatMemoryConfig | None) – ChatMemoryConfig controlling turn limit, TTL, and key namespace. Defaults are applied when None.

append(conversation_id, role, content)[source]

Append a message to the conversation history.

After appending, the list is trimmed to the last config.max_turns * 2 messages (oldest dropped first). If a TTL is configured, it is refreshed on every append so the window slides with activity.

Parameters:
  • conversation_id (str) – Opaque identifier for the conversation (e.g. session UUID).

  • role (str) – The message author: "user", "assistant", or "system".

  • content (str) – Text content of the message.

Return type:

None

get_history(conversation_id)[source]

Return all stored messages as a list of {"role", "content"} dicts.

The list is ordered oldest-first, matching the ChatConfig.history convention.

Returns an empty list when the conversation does not exist or has expired.

Return type:

list[dict[str, str]]

clear(conversation_id)[source]

Delete the conversation history from Redis.

Return type:

None

count(conversation_id)[source]

Return the number of messages stored for this conversation.

Returns 0 when the conversation does not exist.

Return type:

int

class ractogateway.redis.RedisExactCache(*, url='redis://localhost:6379/0', client=None, ttl_seconds=None, key_prefix='ractogateway:exact')[source]

Bases: object

Distributed exact-match LRU cache backed by Redis.

Parameters:
  • url (str) – Redis connection URL (e.g. "redis://localhost:6379/0"). Ignored when client is provided.

  • client (Any | None) – Pre-built redis.Redis (or compatible) client. Useful when you manage the connection pool yourself or use a mock in tests.

  • ttl_seconds (float | None) – Optional TTL for each entry. Passed directly to Redis SET EX. None means entries never expire (Redis default).

  • key_prefix (str) – Namespace for all Redis keys managed by this instance.

get(user_message, system_prompt, model, temperature, max_tokens)[source]

Return a cached response or None on a miss.

O(1) Redis GET.

Return type:

LLMResponse | None

put(user_message, system_prompt, model, temperature, max_tokens, response)[source]

Store a response in Redis.

O(1) Redis SET [EX ttl].

Return type:

None

invalidate(user_message, system_prompt, model, temperature, max_tokens)[source]

Remove a specific entry. Returns True if it was present.

Return type:

bool

clear()[source]

Delete all entries matching this instance’s key prefix.

Uses SCAN to iterate safely (no KEYS * in production). Also resets in-memory stats counters.

Return type:

None

property stats: CacheStats

Return a snapshot of hit/miss counters plus current Redis key count.

class ractogateway.redis.RedisRateLimiter(*, url='redis://localhost:6379/0', client=None, config=None)[source]

Bases: object

Fleet-wide token-budget rate limiter backed by a shared Redis instance.

Parameters:
  • url (str) – Redis connection URL. Ignored when client is provided.

  • client (Any | None) – Pre-built redis.Redis client. Useful for connection-pool sharing or unit-test mocking.

  • config (RateLimitConfig | None) – RateLimitConfig controlling the token budget and Redis key namespace. Defaults are applied when None.

check_and_consume(user_id, tokens=1)[source]

Attempt to consume tokens from user_id’s budget.

Returns True if the request is within budget (tokens are consumed), or False if the rate limit would be exceeded (no tokens consumed).

The check-and-increment is done in a single Redis pipeline, making it safe against concurrent requests from the same user.

Parameters:
  • user_id (str) – Opaque identifier for the caller (e.g. API key, user UUID).

  • tokens (int) – Number of tokens to consume. Defaults to 1 for request-count limiting; pass the estimated LLM token count for cost-based limiting.

Return type:

bool

get_remaining(user_id)[source]

Return the remaining token budget for the current minute.

Returns max_tokens_per_minute if the user has not made any requests in the current window.

Return type:

int

reset(user_id)[source]

Delete all rate-limit keys for user_id (current and any stale windows).

Intended for admin / testing use. Uses SCAN to avoid blocking.

Return type:

None