Token Truncation

TokenTruncator prevents context-window overflow by trimming conversation history while preserving the system prompt and the most recent turns.

Basic Usage

from ractogateway.truncation import TokenTruncator, TruncationConfig
from ractogateway import openai_developer_kit as gpt

truncator = TokenTruncator(TruncationConfig(
    keep_first_n=2,      # always keep the first N messages (e.g. system prompt)
    keep_last_n=8,       # always keep the last N messages
    safety_margin=512,   # reserve tokens for the model's response
))

kit = gpt.OpenAIDeveloperKit(model="gpt-4o", truncator=truncator)

The truncator is applied automatically before every API call.
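To make the keep_first_n / keep_last_n behaviour concrete, here is a minimal sketch of one way such trimming could work. It is not ractogateway's implementation: `truncate_messages`, `estimate_tokens`, and the `max_tokens` parameter are hypothetical names chosen for illustration, and the policy for the middle of the conversation (keep the newest messages that still fit) is an assumption.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token."""
    return len(text) // 4


def truncate_messages(
    messages: List[Message],
    max_tokens: int,
    keep_first_n: int = 2,
    keep_last_n: int = 8,
    count: Callable[[str], int] = estimate_tokens,
) -> List[Message]:
    """Hypothetical trimming policy, not the library's actual algorithm.

    The first and last N messages are always kept. Messages in between are
    kept from newest to oldest for as long as the token budget allows.
    """
    if len(messages) <= keep_first_n + keep_last_n:
        return list(messages)

    split = len(messages) - keep_last_n
    head = messages[:keep_first_n]
    middle = messages[keep_first_n:split]
    tail = messages[split:]

    used = sum(count(m["content"]) for m in head + tail)

    kept_middle: List[Message] = []
    for msg in reversed(middle):  # newest middle message first
        cost = count(msg["content"])
        if used + cost > max_tokens:
            break  # everything older than this message is dropped too
        kept_middle.append(msg)
        used += cost

    kept_middle.reverse()  # restore chronological order
    return head + kept_middle + tail
```

In this sketch the pinned head and tail are never dropped, so the result can still exceed `max_tokens` if those messages alone are too large; how the real truncator handles that case is not stated here.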

Precise Token Counting (OpenAI)

By default, TokenTruncator estimates the token count as len(text) // 4. For exact counts, install tiktoken and pass a custom token_counter:

pip install ractogateway[cache]   # includes tiktoken

import tiktoken
from ractogateway.truncation import TokenTruncator, TruncationConfig

enc = tiktoken.encoding_for_model("gpt-4o")
truncator = TokenTruncator(TruncationConfig(
    token_counter=lambda t: len(enc.encode(t)),
    keep_last_n=10,
    safety_margin=1024,
))
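If tiktoken may or may not be installed in a given environment, one option is to build the counter with a fallback. The sketch below is an illustration only: `make_token_counter` and `estimate_tokens` are hypothetical helpers, not part of ractogateway, and the fallback simply mirrors the len(text) // 4 heuristic described above.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Heuristic fallback: about four characters per token."""
    return len(text) // 4


def make_token_counter(model: str) -> Callable[[str], int]:
    """Return an exact tiktoken-based counter when possible.

    Falls back to the heuristic when tiktoken is not installed or does not
    recognise the model name.
    """
    try:
        import tiktoken  # optional dependency

        enc = tiktoken.encoding_for_model(model)
    except (ImportError, KeyError):
        return estimate_tokens
    return lambda text: len(enc.encode(text))
```

The returned callable takes a string and returns an int, so it matches the shape of the lambda passed as token_counter in the example above.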

Model Context Limits

MODEL_CONTEXT_LIMITS is a pre-populated dict mapping common model names to their context window sizes in tokens:

from ractogateway.truncation import MODEL_CONTEXT_LIMITS

print(MODEL_CONTEXT_LIMITS["gpt-4o"])        # 128000
print(MODEL_CONTEXT_LIMITS["claude-3-5-sonnet-20241022"])  # 200000
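As a rough illustration of how a context limit and a safety margin combine into a prompt budget, consider the sketch below. It uses a local stand-in dict containing only the two values shown above rather than importing the library, and `prompt_budget` and its `default_limit` fallback are hypothetical; how ractogateway handles unknown model names is not specified here.

```python
# Local stand-in for MODEL_CONTEXT_LIMITS, limited to the two entries shown above.
CONTEXT_LIMITS = {
    "gpt-4o": 128_000,
    "claude-3-5-sonnet-20241022": 200_000,
}


def prompt_budget(model: str, safety_margin: int = 512, default_limit: int = 8_192) -> int:
    """Tokens available for the prompt after reserving room for the response.

    default_limit is an assumed conservative fallback for unknown models.
    """
    limit = CONTEXT_LIMITS.get(model, default_limit)
    return max(limit - safety_margin, 0)  # never report a negative budget


print(prompt_budget("gpt-4o"))                              # 127488
print(prompt_budget("claude-3-5-sonnet-20241022", 1024))    # 198976
```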