ractogateway.truncation

Automated token-truncation subsystem for RactoGateway.

Prevents context-window overflows by trimming conversation history when a prompt would exceed the model's limit, while preserving the first and most recent turns.

Quick start:

from ractogateway.truncation import TokenTruncator, TruncationConfig

truncator = TokenTruncator(TruncationConfig(
    keep_first_n=2,
    keep_last_n=8,
    safety_margin=512,
))

# Wire into any kit (OpenAIDeveloperKit is imported separately):
kit = OpenAIDeveloperKit(model="gpt-4o", truncator=truncator)

For exact token counts (OpenAI only):

import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
cfg = TruncationConfig(token_counter=lambda t: len(enc.encode(t)))
truncator = TokenTruncator(cfg)

class ractogateway.truncation.TokenTruncator(config=None)[source]

Bases: object

Smart conversation-history trimmer.

Parameters: config (TruncationConfig | None) – TruncationConfig instance. If omitted, a default config is used (approximate token counter; per-model context limit, falling back to 8,192 tokens).

Examples

from ractogateway.truncation import TokenTruncator, TruncationConfig
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
truncator = TokenTruncator(
    TruncationConfig(
        token_counter=lambda t: len(enc.encode(t)),
        keep_first_n=2,
        keep_last_n=8,
    )
)
kit = OpenAIDeveloperKit(model="gpt-4o", truncator=truncator)

truncate(chat_config, model)[source]

Return a copy of chat_config with trimmed history if necessary.

If the total estimated token count (system prompt + history + user_message) fits within the model’s context limit, the original ChatConfig is returned unchanged.

Parameters:

chat_config (ChatConfig) – The chat configuration to potentially truncate.
model (str) – The resolved model name used to look up the context-window limit.

Return type:

ChatConfig

Returns:

ChatConfig – A new ChatConfig instance with (possibly shorter) history. The user_message and all other fields are preserved verbatim.
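The trimming behaviour described above can be illustrated with a minimal standalone sketch. This is not the library's implementation: it assumes history messages are plain strings, that the protected head and tail are never dropped, and that unprotected messages are removed oldest-first. The names `trim_history`, `approx_tokens`, and `budget` are hypothetical.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    """Rough estimate: about four characters per token."""
    return len(text) // 4


def trim_history(
    history: List[str],
    budget: int,
    keep_first_n: int = 2,
    keep_last_n: int = 6,
    counter: Callable[[str], int] = approx_tokens,
) -> List[str]:
    """Drop the oldest unprotected messages until the history fits the budget."""
    total = sum(counter(m) for m in history)
    if total <= budget or len(history) <= keep_first_n + keep_last_n:
        return list(history)  # already fits, or every message is protected

    split = len(history) - keep_last_n
    head = history[:keep_first_n]          # anchors from the start
    middle = history[keep_first_n:split]   # candidates for removal
    tail = history[split:]                 # most recent turns

    # Remove middle messages oldest-first until the total fits.
    while middle and total > budget:
        total -= counter(middle.pop(0))
    return head + middle + tail
```

If the protected messages alone exceed the budget, this sketch still returns them all; how the real truncator handles that case is not stated here.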

estimate_tokens(text)[source]

Convenience wrapper around the configured token counter.

Return type: int
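As a rough illustration of what the default estimate looks like, the sketch below reimplements the documented approximation of one token per four characters. The function name `approx_token_count` is hypothetical and is not part of the library.

```python
def approx_token_count(text: str) -> int:
    """Approximate token count as one token per four characters."""
    return len(text) // 4


print(approx_token_count("Hello, world!"))  # 13 characters -> 3
```

The estimate is cheap but coarse, which is why an exact counter such as tiktoken may be preferable for OpenAI models.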

class ractogateway.truncation.TruncationConfig(**data)[source]

Bases: BaseModel

Configuration for TokenTruncator.

Parameters:

max_context_tokens (int | None) – Hard cap on total prompt tokens before calling the API. When None, the truncator looks up the model in MODEL_CONTEXT_LIMITS (falling back to 8,192).
keep_first_n (int) – Number of history messages to always preserve from the start of the conversation (anchors context). Defaults to 2.
keep_last_n (int) – Number of history messages to always preserve from the most recent end of the conversation. Defaults to 6.
token_counter (Callable[[str], int]) – Callable with signature (text: str) -> int. Defaults to the built-in approximate counter (len(text) // 4). Swap in tiktoken for exact OpenAI token counts:
```
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
config = TruncationConfig(token_counter=lambda t: len(enc.encode(t)))
```
safety_margin (int) – Extra token budget reserved beyond the system prompt and user message. Defaults to 512.
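One plausible reading of how these settings combine is sketched below: the tokens available for history are what remains of the limit after the system prompt, the user message, and the safety margin. This is an assumption about the arithmetic, not confirmed library behaviour, and `history_budget` is a hypothetical helper.

```python
def history_budget(
    limit: int,
    system_tokens: int,
    user_tokens: int,
    safety_margin: int = 512,
) -> int:
    """Tokens left for history after the fixed parts and the safety margin."""
    return max(0, limit - system_tokens - user_tokens - safety_margin)


# 8192 - 300 - 120 - 512 = 7260
print(history_budget(8192, system_tokens=300, user_tokens=120))
```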

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

max_context_tokens: int | None

keep_first_n: int

keep_last_n: int

token_counter: Callable[[str], int]

safety_margin: int

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

resolve_limit(model)[source]

Return the effective token limit for model.

Priority: max_context_tokens → MODEL_CONTEXT_LIMITS lookup → _DEFAULT_CONTEXT.

Return type: int
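The priority chain can be sketched as a standalone function. This is a minimal illustration rather than the library's code: the table contents below are illustrative placeholders, and `DEFAULT_CONTEXT` stands in for the private `_DEFAULT_CONTEXT` constant, assumed to be the 8,192 fallback mentioned for max_context_tokens.

```python
from typing import Optional

# Illustrative values only; the library's real MODEL_CONTEXT_LIMITS may differ.
MODEL_CONTEXT_LIMITS = {"gpt-4o": 128_000}
DEFAULT_CONTEXT = 8_192  # assumed fallback


def resolve_limit(model: str, max_context_tokens: Optional[int] = None) -> int:
    """Explicit cap first, then the per-model table, then the default."""
    if max_context_tokens is not None:
        return max_context_tokens
    return MODEL_CONTEXT_LIMITS.get(model, DEFAULT_CONTEXT)
```

Checking `is not None` rather than truthiness keeps the explicit cap authoritative whenever it is set.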

model_post_init(_TruncationConfig__context)[source]

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

Return type: None