ractogateway.redis.rate_limiter

Distributed token-budget rate limiter backed by Redis.

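Uses a 1-minute window aligned to the clock: each request increments a counter stored at "{key_prefix}:{user_id}:{unix_minute}". Each key is set to expire after 60 seconds, so stale counters are removed automatically, and because the key contains the Unix minute the limiter only ever reads the current minute's counter. Strictly speaking this is a fixed-window counter, although this module refers to it as a sliding window.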
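
As a minimal sketch of the key scheme described above, the window key can be derived from the current Unix time. The helper name window_key and the "ratelimit" prefix are illustrative assumptions, not part of the library's API.

```python
import time
from typing import Optional


def window_key(key_prefix: str, user_id: str, now: Optional[float] = None) -> str:
    """Build the per-minute counter key (hypothetical helper, not library API)."""
    timestamp = time.time() if now is None else now
    unix_minute = int(timestamp // 60)  # whole minutes since the Unix epoch
    return f"{key_prefix}:{user_id}:{unix_minute}"


# Requests in the same clock minute share a key; the next minute gets a new one.
same_minute = window_key("ratelimit", "user_42", now=125.0)
next_minute = window_key("ratelimit", "user_42", now=180.0)
```

Because the minute number is part of the key, a new counter starts at every minute boundary regardless of when the previous key expires.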

Why not a strict token-bucket?

A true token-bucket needs compare-and-swap semantics to be atomic, which in Redis usually means a Lua script. The sliding-window approach (INCRBY + EXPIRE in a pipeline) is atomic enough for rate-limiting purposes: it has a small race window at the boundary between two minute-windows, but this is an acceptable trade-off, and one that many production rate limiters make.
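
The INCRBY + EXPIRE pattern can be sketched roughly as follows. This is not the library's actual implementation: the function name consume, the DECRBY rollback for rejected requests, and the in-memory FakeRedis stand-in (included only so the sketch runs without a server) are all assumptions. The calls pipeline(), incrby(), expire(), execute() and decrby() mirror the redis-py client.

```python
from typing import Any, Dict, List

WINDOW_SECONDS = 60


def consume(client: Any, key: str, tokens: int, limit: int) -> bool:
    """Add tokens to the window counter; undo the increment if over the limit."""
    pipe = client.pipeline()          # redis-py wraps this in MULTI/EXEC by default
    pipe.incrby(key, tokens)          # INCRBY treats a missing key as 0
    pipe.expire(key, WINDOW_SECONDS)  # stale windows clean themselves up
    new_total, _ = pipe.execute()     # one result per queued command
    if new_total > limit:
        client.decrby(key, tokens)    # assumption: roll back a rejected request
        return False
    return True


class FakePipeline:
    """In-memory stand-in for a redis-py pipeline, for illustration only."""

    def __init__(self, store: Dict[str, int]) -> None:
        self.store = store
        self.queued: List[Any] = []

    def incrby(self, key: str, amount: int) -> "FakePipeline":
        self.queued.append(("incrby", key, amount))
        return self

    def expire(self, key: str, seconds: int) -> "FakePipeline":
        self.queued.append(("expire", key, seconds))
        return self

    def execute(self) -> List[Any]:
        results: List[Any] = []
        for name, key, arg in self.queued:
            if name == "incrby":
                self.store[key] = self.store.get(key, 0) + arg
                results.append(self.store[key])
            else:
                results.append(True)  # expiry itself is not simulated
        self.queued = []
        return results


class FakeRedis:
    """In-memory stand-in for redis.Redis, for illustration only."""

    def __init__(self) -> None:
        self.store: Dict[str, int] = {}

    def pipeline(self) -> FakePipeline:
        return FakePipeline(self.store)

    def decrby(self, key: str, amount: int) -> int:
        self.store[key] = self.store.get(key, 0) - amount
        return self.store[key]


client = FakeRedis()
first = consume(client, "ratelimit:user_42:2", tokens=4_000, limit=5_000)
second = consume(client, "ratelimit:user_42:2", tokens=2_000, limit=5_000)
```

In this sketch the first call is accepted and the second is rejected and rolled back, leaving the counter at 4,000. The increment and the rollback are separate steps, so another request can briefly observe the inflated counter; closing that gap is what a Lua script would be needed for.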

Example:

from ractogateway.redis import RedisRateLimiter, RateLimitConfig

limiter = RedisRateLimiter(
    url="redis://localhost:6379/0",
    config=RateLimitConfig(max_tokens_per_minute=5_000),
)

# In your request handler (before calling the LLM):
estimated_tokens = 1_200  # placeholder; use your own estimate of the request's LLM token count
if not limiter.check_and_consume(user_id="user_42", tokens=estimated_tokens):
    raise RuntimeError("Rate limit exceeded; try again in a minute.")

class ractogateway.redis.rate_limiter.RedisRateLimiter(*, url='redis://localhost:6379/0', client=None, config=None)

Bases: object

Fleet-wide token-budget rate limiter backed by a shared Redis instance.

Parameters:
  • url (str) – Redis connection URL. Ignored when client is provided.

  • client (Any | None) – Pre-built redis.Redis client. Useful for connection-pool sharing or unit-test mocking.

  • config (RateLimitConfig | None) – RateLimitConfig controlling the token budget and Redis key namespace. Defaults are applied when None.

check_and_consume(user_id, tokens=1)

Attempt to consume tokens from user_id’s budget.

Returns True if the request is within budget (tokens are consumed), or False if the rate limit would be exceeded (no tokens consumed).

The check-and-increment is done in a single Redis pipeline, making it safe against concurrent requests from the same user.

Parameters:
  • user_id (str) – Opaque identifier for the caller (e.g. API key, user UUID).

  • tokens (int) – Number of tokens to consume. Defaults to 1 for request-count limiting; pass the estimated LLM token count for cost-based limiting.

Return type:

bool
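
When check_and_consume returns False, a caller may want to tell the client how long to wait. The sketch below assumes windows are aligned to clock minutes, as the key scheme in the module description suggests. The names seconds_until_next_window, handle_request and DenyAllLimiter are hypothetical, and the response dictionary is only a placeholder for whatever the web framework expects.

```python
import time
from typing import Any, Dict, Optional


def seconds_until_next_window(now: Optional[float] = None) -> int:
    """Seconds until the next clock minute starts (always between 1 and 60)."""
    timestamp = time.time() if now is None else now
    return 60 - int(timestamp % 60)


def handle_request(limiter: Any, user_id: str, estimated_tokens: int,
                   now: Optional[float] = None) -> Dict[str, int]:
    """Hypothetical handler: reject with 429 when the budget is exhausted."""
    if not limiter.check_and_consume(user_id=user_id, tokens=estimated_tokens):
        return {"status": 429, "retry_after": seconds_until_next_window(now)}
    return {"status": 200}


class DenyAllLimiter:
    """Stand-in for RedisRateLimiter so the sketch runs without Redis."""

    def check_and_consume(self, user_id: str, tokens: int = 1) -> bool:
        return False


response = handle_request(DenyAllLimiter(), "user_42", estimated_tokens=1_200, now=125.0)
```

With a real RedisRateLimiter in place of the stand-in, the same handler would return status 200 whenever the request fits the remaining budget.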

get_remaining(user_id)

Return the remaining token budget for the current minute.

Returns max_tokens_per_minute if the user has not made any requests in the current window.

Return type:

int
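
The arithmetic behind a remaining-budget lookup can be sketched as below. The helper remaining_budget is hypothetical. It takes the raw value that a Redis GET on the current window key would return (None when the key does not exist), and clamping the result at zero is an assumption rather than documented behaviour.

```python
from typing import Optional, Union


def remaining_budget(raw_count: Optional[Union[bytes, str, int]],
                     max_tokens_per_minute: int) -> int:
    """Turn a raw counter value into a remaining budget (hypothetical helper)."""
    if raw_count is None:
        # No key for this minute yet: the full budget is available.
        return max_tokens_per_minute
    used = int(raw_count)  # redis-py returns bytes unless decode_responses=True
    return max(0, max_tokens_per_minute - used)  # assumption: never negative


untouched = remaining_budget(None, 5_000)
partly_used = remaining_budget(b"1200", 5_000)
```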

reset(user_id)

Delete all rate-limit keys for user_id (current and any stale windows).

Intended for admin and testing use. Uses SCAN rather than KEYS, so the Redis server is not blocked while the matching keys are located.

Return type:

None
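
A SCAN-based reset might look roughly like the sketch below, which is not the library's actual implementation. The function reset_user, the key pattern, and the in-memory FakeRedis stand-in are assumptions. The calls scan_iter(match=...) and delete(...) mirror the redis-py client. Unlike reset, this helper returns the number of keys deleted so the effect is easy to check.

```python
import fnmatch
from typing import Any, Dict, Iterator


def reset_user(client: Any, key_prefix: str, user_id: str) -> int:
    """Delete every window key for one user and return how many were removed."""
    pattern = f"{key_prefix}:{user_id}:*"  # assumed to match the key scheme
    deleted = 0
    # scan_iter walks the keyspace in small batches instead of one blocking call.
    for key in client.scan_iter(match=pattern):
        deleted += client.delete(key)
    return deleted


class FakeRedis:
    """In-memory stand-in for redis.Redis, for illustration only."""

    def __init__(self, store: Dict[str, int]) -> None:
        self.store = dict(store)

    def scan_iter(self, match: str) -> Iterator[str]:
        for key in list(self.store):  # copy so deletion during iteration is safe
            if fnmatch.fnmatchcase(key, match):
                yield key

    def delete(self, *keys: str) -> int:
        removed = 0
        for key in keys:
            if self.store.pop(key, None) is not None:
                removed += 1
        return removed


client = FakeRedis({
    "ratelimit:user_42:1": 300,
    "ratelimit:user_42:2": 4_000,
    "ratelimit:user_7:2": 50,
})
removed = reset_user(client, "ratelimit", "user_42")
```

The trailing colon in the pattern keeps user_42 from matching keys that belong to a user such as user_420.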