ractogateway.redis.rate_limiter
Distributed token-budget rate limiter backed by Redis.
Uses a 1-minute window aligned to the Unix minute (a fixed window rather than a truly sliding one): each request increments a counter stored at
"{key_prefix}:{user_id}:{unix_minute}". Each key expires automatically after
60 seconds, so old windows are cleaned up and the counter always reflects the current minute.
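The per-minute key described above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name window_key and the "ratelimit" prefix are hypothetical, and it assumes unix_minute means whole minutes since the Unix epoch.

```python
def window_key(key_prefix: str, user_id: str, now: float) -> str:
    """Build the counter key for the minute that contains `now` (Unix seconds)."""
    unix_minute = int(now // 60)  # assumption: whole minutes since the epoch
    return f"{key_prefix}:{user_id}:{unix_minute}"


# Two requests 30 seconds apart in the same minute share one counter key...
same_a = window_key("ratelimit", "user_42", 1_700_000_040.0)
same_b = window_key("ratelimit", "user_42", 1_700_000_070.0)
# ...while a request in the following minute starts a fresh counter.
next_minute = window_key("ratelimit", "user_42", 1_700_000_100.0)
```

Because the minute number is part of the key, no explicit reset is needed: a new minute simply maps to a new key, and the old one is left to expire.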
Why not a strict token-bucket?
A true token-bucket requires compare-and-swap semantics (for example, a Lua script) to be atomic. The per-minute counter approach (INCRBY + EXPIRE in a pipeline) is atomic enough for rate-limiting purposes. It has a small race window at the boundary between two minute-windows, but this is acceptable: it is a trade-off commonly made by large API gateways (Stripe, Cloudflare, etc.).
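A rough sketch of the INCRBY + EXPIRE pipeline pattern is shown below. It is an illustration under stated assumptions, not the library's implementation: consume is a hypothetical helper, the DECRBY rollback on rejection is an assumed way to leave a rejected request unconsumed, and FakeRedis is an in-memory stand-in so the sketch runs without a server. The method names pipeline, incrby, expire, execute, and decrby mirror those of a redis-py client.

```python
class FakePipeline:
    """In-memory stand-in for a redis-py pipeline (hypothetical, illustration only)."""

    def __init__(self, store):
        self._store = store
        self._ops = []

    def incrby(self, key, amount):
        self._ops.append(("incrby", key, amount))
        return self

    def expire(self, key, seconds):
        self._ops.append(("expire", key, seconds))
        return self

    def execute(self):
        # Apply queued commands in order and return one result per command.
        results = []
        for op, key, arg in self._ops:
            if op == "incrby":
                self._store[key] = self._store.get(key, 0) + arg
                results.append(self._store[key])
            else:
                results.append(True)  # TTL handling is omitted in this fake
        self._ops = []
        return results


class FakeRedis:
    """In-memory stand-in for a Redis client (hypothetical, illustration only)."""

    def __init__(self):
        self.store = {}

    def pipeline(self):
        return FakePipeline(self.store)

    def decrby(self, key, amount):
        self.store[key] = self.store.get(key, 0) - amount
        return self.store[key]


def consume(client, key: str, tokens: int, limit: int) -> bool:
    """Increment the window counter and report whether the budget still holds."""
    pipe = client.pipeline()
    pipe.incrby(key, tokens)   # add this request's tokens to the counter
    pipe.expire(key, 60)       # let the window key expire on its own
    new_total, _ = pipe.execute()
    if new_total > limit:
        # Assumption: undo the increment so a rejected call consumes nothing.
        client.decrby(key, tokens)
        return False
    return True


client = FakeRedis()
key = "ratelimit:user_42:28333334"
first = consume(client, key, 3000, limit=5000)   # within budget
second = consume(client, key, 3000, limit=5000)  # would reach 6000, rejected
third = consume(client, key, 2000, limit=5000)   # exactly fills the budget
```

In this sketch the increment and the rollback are separate round trips, so a concurrent request could briefly observe an inflated counter. That is the kind of small inaccuracy a Lua script would remove, at the cost of extra complexity.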
Example:
    from ractogateway.redis import RedisRateLimiter, RateLimitConfig

    limiter = RedisRateLimiter(
        url="redis://localhost:6379/0",
        config=RateLimitConfig(max_tokens_per_minute=5_000),
    )

    # In your request handler (before calling the LLM):
    if not limiter.check_and_consume(user_id="user_42", tokens=estimated_tokens):
        raise RuntimeError("Rate limit exceeded; try again in a minute.")
- class ractogateway.redis.rate_limiter.RedisRateLimiter(*, url='redis://localhost:6379/0', client=None, config=None)
Bases: object
Fleet-wide token-budget rate limiter backed by a shared Redis instance.
- Parameters:
  - url (str) – Redis connection URL. Ignored when client is provided.
  - client (Any | None) – Pre-built redis.Redis client. Useful for connection-pool sharing or unit-test mocking.
  - config (RateLimitConfig | None) – RateLimitConfig controlling the token budget and Redis key namespace. Defaults are applied when None.
- check_and_consume(user_id, tokens=1)
Attempt to consume tokens from user_id’s budget.
Returns True if the request is within budget (tokens are consumed), or False if the rate limit would be exceeded (no tokens consumed).
The check-and-increment is done in a single Redis pipeline, making it safe against concurrent requests from the same user.
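One way a caller might act on the boolean result is sketched below. The names admit, Limiter, and StubLimiter are hypothetical and not part of ractogateway. The retry hint assumes the budget resets at the next minute boundary, which follows from the per-minute key scheme but is not something the limiter itself is documented to report.

```python
from typing import Protocol, Tuple


class Limiter(Protocol):
    """Anything exposing the documented check_and_consume signature."""

    def check_and_consume(self, user_id: str, tokens: int = 1) -> bool: ...


def admit(limiter: Limiter, user_id: str, tokens: int, now: float) -> Tuple[bool, int]:
    """Return (allowed, retry_after_seconds) for one request."""
    if limiter.check_and_consume(user_id=user_id, tokens=tokens):
        return True, 0
    # Assumption: the window resets at the next minute boundary.
    retry_after = 60 - int(now % 60)
    return False, retry_after


class StubLimiter:
    """Single-user stand-in used only to exercise admit() without Redis."""

    def __init__(self, budget: int):
        self.budget = budget

    def check_and_consume(self, user_id: str, tokens: int = 1) -> bool:
        if tokens > self.budget:
            return False  # rejected: nothing is consumed
        self.budget -= tokens
        return True


stub = StubLimiter(budget=5000)
accepted = admit(stub, "user_42", 4000, now=1_700_000_070.0)
rejected = admit(stub, "user_42", 4000, now=1_700_000_070.0)
```

A handler could translate the second element into a Retry-After style hint instead of a fixed "try again in a minute" message.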
- get_remaining(user_id)
Return the remaining token budget for the current minute.
Returns max_tokens_per_minute if the user has not made any requests in the current window.
- Return type: