# Token Truncation `TokenTruncator` prevents context-window overflow by trimming conversation history while preserving the system prompt and most-recent turns. ## Basic Usage ```python from ractogateway.truncation import TokenTruncator, TruncationConfig from ractogateway import openai_developer_kit as gpt truncator = TokenTruncator(TruncationConfig( keep_first_n=2, # always keep the first N messages (e.g. system prompt) keep_last_n=8, # always keep the last N messages safety_margin=512, # reserve tokens for the model's response )) kit = gpt.OpenAIDeveloperKit(model="gpt-4o", truncator=truncator) ``` The truncator applies automatically before every API call. ## Precise Token Counting (OpenAI) By default `TokenTruncator` estimates token count as `len(text) // 4`. For exact counts install `tiktoken`: ```bash pip install ractogateway[cache] # includes tiktoken ``` ```python import tiktoken from ractogateway.truncation import TokenTruncator, TruncationConfig enc = tiktoken.encoding_for_model("gpt-4o") truncator = TokenTruncator(TruncationConfig( token_counter=lambda t: len(enc.encode(t)), keep_last_n=10, safety_margin=1024, )) ``` ## Model Context Limits `MODEL_CONTEXT_LIMITS` is a pre-populated dict mapping common model names to their context window sizes: ```python from ractogateway.truncation import MODEL_CONTEXT_LIMITS print(MODEL_CONTEXT_LIMITS["gpt-4o"]) # 128000 print(MODEL_CONTEXT_LIMITS["claude-3-5-sonnet-20241022"]) # 200000 ```