Token Truncation
TokenTruncator prevents context-window overflow by trimming the middle of the conversation history while preserving the first messages (typically the system prompt) and the most recent turns.
Basic Usage
```python
from ractogateway.truncation import TokenTruncator, TruncationConfig
from ractogateway import openai_developer_kit as gpt

truncator = TokenTruncator(TruncationConfig(
    keep_first_n=2,      # always keep the first N messages (e.g. system prompt)
    keep_last_n=8,       # always keep the last N messages
    safety_margin=512,   # reserve tokens for the model's response
))

kit = gpt.OpenAIDeveloperKit(model="gpt-4o", truncator=truncator)
```
The truncator is applied automatically before every API call the kit makes.
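To make the keep-first/keep-last policy concrete, here is a minimal stand-alone sketch of one way such trimming could work. It is not ractogateway's implementation: the helpers `truncate_history` and `approx_tokens`, the `context_limit` argument, the dict message shape, and the rule of dropping the oldest unprotected messages first are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical message shape: {"role": "...", "content": "..."}
Message = dict[str, str]


def approx_tokens(text: str) -> int:
    """Rough estimate of about four characters per token."""
    return len(text) // 4


def truncate_history(
    messages: list[Message],
    context_limit: int,
    keep_first_n: int = 2,
    keep_last_n: int = 8,
    safety_margin: int = 512,
    token_counter: Callable[[str], int] = approx_tokens,
) -> list[Message]:
    """Drop unprotected middle messages, oldest first, until the history fits."""
    budget = context_limit - safety_margin  # assumed: reserve safety_margin for the reply
    if len(messages) <= keep_first_n + keep_last_n:
        return list(messages)  # every message is protected; nothing to drop

    head = messages[:keep_first_n]  # e.g. the system prompt
    tail = messages[len(messages) - keep_last_n:]  # most recent turns
    middle = messages[keep_first_n:len(messages) - keep_last_n]

    used = sum(token_counter(m["content"]) for m in head + middle + tail)
    while middle and used > budget:
        oldest = middle.pop(0)  # drop the oldest unprotected message
        used -= token_counter(oldest["content"])
    return head + middle + tail
```

If the protected head and tail alone exceed the budget, this sketch returns them unchanged rather than cutting inside a message; how TokenTruncator itself handles that case is not described on this page.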
Precise Token Counting (OpenAI)
By default, TokenTruncator estimates the token count as len(text) // 4 (roughly four characters per token). For precise counts with OpenAI models, install tiktoken:
```bash
pip install "ractogateway[cache]"  # includes tiktoken; quotes stop shells like zsh from globbing the brackets
```
```python
import tiktoken
from ractogateway.truncation import TokenTruncator, TruncationConfig

enc = tiktoken.encoding_for_model("gpt-4o")  # the tokenizer used by gpt-4o

truncator = TokenTruncator(TruncationConfig(
    token_counter=lambda t: len(enc.encode(t)),  # precise token count for a piece of text
    keep_last_n=10,
    safety_margin=1024,
))
```
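Because token_counter accepts any callable that maps a string to an integer, the precise counter can be made optional. The sketch below (the helper names `heuristic_counter` and `make_token_counter` are hypothetical, not part of ractogateway) returns a tiktoken-based counter when tiktoken is installed and recognizes the model, and otherwise falls back to the same len(text) // 4 estimate described above.

```python
from typing import Callable

TokenCounter = Callable[[str], int]


def heuristic_counter(text: str) -> int:
    """The default estimate described above: len(text) // 4."""
    return len(text) // 4


def make_token_counter(model: str) -> TokenCounter:
    """Return a tiktoken-based counter for `model`, or the heuristic as a fallback."""
    try:
        import tiktoken  # optional dependency (e.g. via the [cache] extra)

        enc = tiktoken.encoding_for_model(model)
    except Exception:
        # tiktoken not installed, model name unknown (KeyError), or the
        # encoding file could not be downloaded on first use.
        return heuristic_counter
    return lambda text: len(enc.encode(text))
```

The returned function can be passed as token_counter=make_token_counter("gpt-4o") in place of the lambda above. Note that tiktoken counts only the text itself; the small per-message formatting overhead of chat requests is not included.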
Model Context Limits
MODEL_CONTEXT_LIMITS is a pre-populated dict that maps common model names to their context-window sizes in tokens:
```python
from ractogateway.truncation import MODEL_CONTEXT_LIMITS

print(MODEL_CONTEXT_LIMITS["gpt-4o"])                      # 128000
print(MODEL_CONTEXT_LIMITS["claude-3-5-sonnet-20241022"])  # 200000
```
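A common use of such a table is deriving the prompt budget for a model. The sketch below assumes the budget is the context limit minus safety_margin, which matches the "reserve tokens for the model's response" comment but is not confirmed by this page. It uses a local stand-in dict holding only the two entries shown above instead of importing MODEL_CONTEXT_LIMITS, and the `prompt_budget` helper and its 8,192-token fallback for unlisted models are hypothetical.

```python
# Local stand-in for MODEL_CONTEXT_LIMITS with only the two entries shown above.
CONTEXT_LIMITS: dict[str, int] = {
    "gpt-4o": 128_000,
    "claude-3-5-sonnet-20241022": 200_000,
}

DEFAULT_LIMIT = 8_192  # hypothetical conservative fallback for unlisted models


def prompt_budget(model: str, safety_margin: int) -> int:
    """Tokens left for the prompt after reserving safety_margin for the reply."""
    limit = CONTEXT_LIMITS.get(model, DEFAULT_LIMIT)
    if safety_margin >= limit:
        raise ValueError(f"safety_margin ({safety_margin}) must be below the limit ({limit})")
    return limit - safety_margin


print(prompt_budget("gpt-4o", 1024))  # 126976
```

Falling back to a small default rather than raising on an unknown model keeps truncation conservative: an unlisted model is trimmed more aggressively instead of risking overflow.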