ractogateway.truncation
Automated token-truncation subsystem for RactoGateway.
Prevents context-window overflows by intelligently trimming conversation history while preserving the beginning and most-recent turns.
Quick start:
from ractogateway.truncation import TokenTruncator, TruncationConfig
truncator = TokenTruncator(TruncationConfig(
    keep_first_n=2,
    keep_last_n=8,
    safety_margin=512,
))
# Wire into any kit:
kit = OpenAIDeveloperKit(model="gpt-4o", truncator=truncator)
For exact token counts (OpenAI only):
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
cfg = TruncationConfig(token_counter=lambda t: len(enc.encode(t)))
truncator = TokenTruncator(cfg)
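Any callable with the signature (text: str) -> int can be passed as token_counter. The following is a minimal sketch, not the library's own code: approx_counter mirrors the documented len // 4 default, while conservative_counter and count_messages are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, List

TokenCounter = Callable[[str], int]


def approx_counter(text: str) -> int:
    """Roughly four characters per token, mirroring the documented default."""
    return len(text) // 4


def conservative_counter(text: str) -> int:
    """Hypothetical variant: round up so short non-empty strings never count as zero."""
    return -(-len(text) // 4)  # ceiling division


def count_messages(messages: List[str], counter: TokenCounter) -> int:
    """Sum the estimated token count over a list of message strings."""
    return sum(counter(m) for m in messages)
```

Rounding up is the more cautious choice when many short messages are present, since floor division estimates a two-character message at zero tokens.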
- class ractogateway.truncation.TokenTruncator(config=None)
Bases: object
Smart conversation-history trimmer.
- Parameters:
config (TruncationConfig | None) – TruncationConfig instance. If omitted, a default config is used (approximate counter, 8,192-token fallback limit).
Examples
from ractogateway.truncation import TokenTruncator, TruncationConfig
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
truncator = TokenTruncator(
    TruncationConfig(
        token_counter=lambda t: len(enc.encode(t)),
        keep_first_n=2,
        keep_last_n=8,
    )
)
kit = OpenAIDeveloperKit(model="gpt-4o", truncator=truncator)
- truncate(chat_config, model)
Return a copy of chat_config with trimmed history if necessary.
If the total estimated token count (system prompt + history + user_message) fits within the model’s context limit, the original ChatConfig is returned unchanged.
- Parameters:
chat_config (ChatConfig) – The chat configuration to potentially truncate.
model (str) – The resolved model name used to look up the context-window limit.
- Return type:
ChatConfig
- Returns:
ChatConfig – A new ChatConfig instance with (possibly shorter) history. The user_message and all other fields are preserved verbatim.
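The trimming behaviour described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: history is modelled as a list of strings, fixed_tokens stands for the system prompt plus user_message estimate, and dropping the oldest unprotected messages first is an assumed policy. The name trim_history is hypothetical.

```python
from typing import Callable, List


def trim_history(
    history: List[str],
    fixed_tokens: int,
    limit: int,
    keep_first_n: int = 2,
    keep_last_n: int = 6,
    safety_margin: int = 512,
    counter: Callable[[str], int] = lambda t: len(t) // 4,
) -> List[str]:
    """Drop the oldest unprotected messages until the estimate fits the budget."""
    # Tokens left for history after the fixed parts and the safety margin.
    budget = limit - safety_margin - fixed_tokens
    total = sum(counter(m) for m in history)
    if total <= budget:
        return history  # already fits: returned unchanged

    # Nothing is droppable when the protected head and tail cover the list.
    if keep_first_n + keep_last_n >= len(history):
        return list(history)

    split = len(history) - keep_last_n
    head = history[:keep_first_n]
    middle = history[keep_first_n:split]  # slice copy, safe to mutate
    tail = history[split:]

    # Assumed policy: discard from the oldest end of the unprotected middle.
    while middle and total > budget:
        total -= counter(middle.pop(0))
    return head + middle + tail
```

With ten messages of ten estimated tokens each, a limit of 70, no safety margin, keep_first_n=2 and keep_last_n=3, the sketch drops the three oldest middle messages and keeps seven. If the protected head and tail alone exceed the budget, the sketch returns them anyway, so the result may still be over the limit.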
- class ractogateway.truncation.TruncationConfig(**data)
Bases: BaseModel
Configuration for TokenTruncator.
- Parameters:
max_context_tokens (int | None) – Hard cap on total prompt tokens before calling the API. When None, the truncator looks up the model in MODEL_CONTEXT_LIMITS (falling back to 8192).
keep_first_n (int) – Number of history messages to always preserve from the start of the conversation (anchors context). Defaults to 2.
keep_last_n (int) – Number of history messages to always preserve from the most recent end of the conversation. Defaults to 6.
token_counter (Callable[[str], int]) – Callable (text: str) -> int. Defaults to the built-in approximate counter (len // 4). Swap for tiktoken for exact OpenAI token counts:
    import tiktoken
    enc = tiktoken.encoding_for_model("gpt-4o")
    config = TruncationConfig(token_counter=lambda t: len(enc.encode(t)))
safety_margin (int) – Extra token budget reserved beyond the system prompt and user message. Defaults to 512.
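The fallback order for the context limit (explicit max_context_tokens, then the MODEL_CONTEXT_LIMITS table, then the 8192 default) can be sketched as follows. The table entry is a placeholder rather than one of the library's real values, and this standalone function is only an illustration of the documented priority, not the actual method.

```python
from typing import Dict, Optional

# Hypothetical table: the single entry is a placeholder, not a real library value.
MODEL_CONTEXT_LIMITS: Dict[str, int] = {"example-model": 32_000}
_DEFAULT_CONTEXT = 8_192  # documented fallback


def resolve_limit(model: str, max_context_tokens: Optional[int] = None) -> int:
    """Return the effective token limit for a model name."""
    # 1. An explicit cap always wins.
    if max_context_tokens is not None:
        return max_context_tokens
    # 2. Otherwise use the per-model table, 3. falling back to the default.
    return MODEL_CONTEXT_LIMITS.get(model, _DEFAULT_CONTEXT)
```

Checking against None rather than truthiness means that an explicit cap is honoured even if a caller passes an unusually small value.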
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- keep_first_n: int
- keep_last_n: int
- safety_margin: int
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- resolve_limit(model)
Return the effective token limit for model.
Priority: max_context_tokens → MODEL_CONTEXT_LIMITS lookup → _DEFAULT_CONTEXT.
- Return type: