ractogateway.truncation.truncator
Automated token truncation for long conversation histories.
When a conversation history is about to breach the model’s context-window
limit, TokenTruncator trims middle turns while preserving:
- The beginning of the conversation (keep_first_n messages), which provides the original task context.
- The most recent turns (keep_last_n messages), which preserve conversational continuity.
- The current system prompt and user message, which are always present.
The truncator operates on ChatConfig and
returns a new config object (Pydantic model_copy), leaving the original
unchanged.
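The head-and-tail trimming described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `trim_middle` is a hypothetical helper, and it assumes the history is a plain list of turns.

```python
from typing import List, TypeVar

T = TypeVar("T")


def trim_middle(history: List[T], keep_first_n: int, keep_last_n: int) -> List[T]:
    """Hypothetical helper: keep the first and last turns, drop the middle.

    Always returns a new list, so the caller's history is left untouched.
    """
    if keep_first_n < 0 or keep_last_n < 0:
        raise ValueError("keep counts must be non-negative")
    # Nothing to drop when head and tail already cover the whole history.
    if keep_first_n + keep_last_n >= len(history):
        return list(history)
    head = history[:keep_first_n]
    # Slice from an explicit index so keep_last_n == 0 yields an empty tail
    # (history[-0:] would return the whole list instead).
    tail = history[len(history) - keep_last_n:]
    return head + tail


turns = [f"turn-{i}" for i in range(10)]
trimmed = trim_middle(turns, keep_first_n=2, keep_last_n=3)
```

Here `trimmed` holds the first two and last three turns, while `turns` still contains all ten entries.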
No external dependencies are required for the default approximation mode
(len(text) // 4). Swap in tiktoken for exact OpenAI token counting.
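As a rough sketch of the two counting modes, the functions below pair the len(text) // 4 approximation with an optional tiktoken-based counter. The names `approx_token_count` and `make_token_counter` are hypothetical and not part of ractogateway; only `tiktoken.encoding_for_model` and `Encoding.encode` are real tiktoken calls.

```python
from typing import Callable


def approx_token_count(text: str) -> int:
    """Rough estimate: about four characters per token."""
    return len(text) // 4


def make_token_counter(model: str = "gpt-4o") -> Callable[[str], int]:
    """Hypothetical factory: exact counter if tiktoken works, else the estimate."""
    try:
        import tiktoken  # optional dependency

        enc = tiktoken.encoding_for_model(model)
    except Exception:
        # tiktoken is missing, the model is unknown, or the encoding
        # could not be loaded: fall back to the approximation.
        return approx_token_count
    return lambda text: len(enc.encode(text))
```

Either function has the `str -> int` shape used for `token_counter` in the class example, so the result of `make_token_counter()` could presumably be passed there directly.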
- class ractogateway.truncation.truncator.TokenTruncator(config=None)
Bases: object
Smart conversation-history trimmer.
- Parameters:
config (TruncationConfig | None) – TruncationConfig instance. If omitted, a default config is used (approximate counter, 8k limit).
Examples
import tiktoken

from ractogateway.truncation import TokenTruncator, TruncationConfig

# Exact OpenAI token counting via tiktoken.
enc = tiktoken.encoding_for_model("gpt-4o")
truncator = TokenTruncator(
    TruncationConfig(
        token_counter=lambda t: len(enc.encode(t)),
        keep_first_n=2,
        keep_last_n=8,
    )
)
# OpenAIDeveloperKit must already be imported; its import is not shown in this example.
kit = OpenAIDeveloperKit(model="gpt-4o", truncator=truncator)
- truncate(chat_config, model)
Return a copy of chat_config with trimmed history if necessary.
If the total estimated token count (system prompt + history + user_message) fits within the model’s context limit, the original ChatConfig is returned unchanged.
- Parameters:
chat_config (ChatConfig) – The chat configuration to potentially truncate.
model (str) – The resolved model name used to look up the context-window limit.
- Return type:
ChatConfig
- Returns:
A new ChatConfig instance with (possibly shorter) history. The user_message and all other fields are preserved verbatim.
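The fit check and copy-on-change behaviour can be illustrated with a standard-library stand-in. Everything here is an assumption made for the sketch: `SketchConfig` is a hypothetical substitute for ChatConfig with guessed field names, the limit is passed in directly rather than looked up from a model name, and all middle turns are dropped in one step, which may differ from what the library does.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for ChatConfig; field names are assumptions."""

    system_prompt: str
    history: Tuple[str, ...]
    user_message: str


def truncate_sketch(
    cfg: SketchConfig,
    limit: int,
    count: Callable[[str], int] = lambda t: len(t) // 4,
    keep_first_n: int = 1,
    keep_last_n: int = 1,
) -> SketchConfig:
    """Return cfg itself if it fits, else a copy with the middle turns dropped."""
    total = (
        count(cfg.system_prompt)
        + sum(count(m) for m in cfg.history)
        + count(cfg.user_message)
    )
    if total <= limit:
        return cfg  # fits: the original object is returned unchanged
    n = len(cfg.history)
    if keep_first_n + keep_last_n >= n:
        return cfg  # nothing droppable in this simplified sketch
    trimmed = cfg.history[:keep_first_n] + cfg.history[n - keep_last_n:]
    # replace() builds a new object; cfg is frozen and never mutated.
    return replace(cfg, history=trimmed)


cfg = SketchConfig(
    system_prompt="You are helpful.",
    history=tuple("x" * 40 for _ in range(5)),
    user_message="hello there!",
)
same = truncate_sketch(cfg, limit=100)
shorter = truncate_sketch(cfg, limit=40)
```

With the default counter the example totals 57 estimated tokens, so `same` is the original object, while `shorter` is a new object with two history entries and the same user_message.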