ractogateway.openai_developer_kit

OpenAI Developer Kit — from ractogateway import openai_developer_kit as gpt.

Short usage:

from ractogateway import openai_developer_kit as gpt

kit = gpt.Chat(model="gpt-4o")          # short alias
kit = gpt.OpenAIDeveloperKit(model="gpt-4o")  # full name (same class)

class ractogateway.openai_developer_kit.BatchItem(**data)[source]

Bases: BaseModel

A single request within a batch job.

Parameters:

custom_id (str) – User-supplied identifier used to correlate results. Must be unique within a batch.
user_message (str) – The end-user’s query string (equivalent to ChatConfig.user_message).
temperature (float) – Sampling temperature. Defaults to 0.0.
max_tokens (int) – Maximum tokens for the completion. Defaults to 4096.
extra (dict[str, Any]) – Provider-specific pass-through kwargs.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

custom_id: str

user_message: str

temperature: float

max_tokens: int

extra: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.BatchJobInfo(**data)[source]

Bases: BaseModel

Metadata about a submitted batch job.

Returned by submit_batch() and poll_status().

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

job_id: str

provider: str

status: BatchStatus

created_at: float

request_count: int

raw: Any

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.BatchResult(**data)[source]

Bases: BaseModel

The outcome of a single BatchItem.

A result is always present in the results list returned by get_results(); check error to detect failures.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

custom_id: str

response: LLMResponse | None

error: str | None

raw: Any

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property ok: bool: True when the request succeeded (no error, response present).

class ractogateway.openai_developer_kit.BatchStatus(*values)[source]

Bases: str, Enum

Processing state of a batch job.

Maps to the union of OpenAI and Anthropic batch status strings.

PENDING = 'pending'

IN_PROGRESS = 'in_progress'

FINALIZING = 'finalizing'

COMPLETED = 'completed'

FAILED = 'failed'

EXPIRED = 'expired'

CANCELLING = 'cancelling'

CANCELLED = 'cancelled'

ractogateway.openai_developer_kit.Chat: Short alias — gpt.Chat(model="gpt-4o") is identical to gpt.OpenAIDeveloperKit(...).

class ractogateway.openai_developer_kit.ChatConfig(**data)[source]

Bases: BaseModel

Validated input for every chat / achat / stream / astream call.

Pass a single ChatConfig to any developer-kit method. Every field has a safe default so you only need to supply what you actually need.

Minimal example:

config = ChatConfig(user_message="Explain Python generators.")
response = kit.chat(config)

Vision / multimodal example:

from ractogateway.prompts.engine import RactoFile

config = ChatConfig(
    user_message="Describe this chart.",
    attachments=[RactoFile.from_path("sales_q4.png")],
)

Structured JSON output example:

class Sentiment(BaseModel):
    label: str
    score: float

config = ChatConfig(
    user_message="I love this library!",
    response_model=Sentiment,
)

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

user_message: str

prompt: RactoPrompt | None

temperature: float

max_tokens: int

tools: ToolRegistry | None

auto_execute_tools: bool

max_tool_turns: int

response_model: type[BaseModel] | None

max_validation_retries: int

history: list[Message]

attachments: list[RactoFile] | None

chain_of_thought: bool

native_thinking: bool

thinking_budget: int

extra: dict[str, Any]

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.CostAwareRouter(tiers)[source]

Bases: object

Routes LLM requests to the appropriate model tier based on message complexity — without making any extra API calls.

Parameters:

tiers (list[RoutingTier]) – Ordered list of RoutingTier objects, sorted ascending by max_score (cheapest first). The last tier’s max_score should be 100 to act as fallback.

Raises:

ValueError – If tiers is empty or not sorted ascending by max_score.
Example — 3-tier OpenAI ladder:: – from ractogateway.routing import CostAwareRouter, RoutingTier router = CostAwareRouter([ RoutingTier(model=”gpt-4o-mini”, max_score=30), RoutingTier(model=”gpt-4o”, max_score=70), RoutingTier(model=”o3-mini”, max_score=100), ]) model = router.route(“What is 2+2?”) # → “gpt-4o-mini” model = router.route(“Analyze the trade-offs between Redis Cluster and ” “Cassandra for a write-heavy time-series workload …”) # → “o3-mini”
Example — binary routing (2 tiers):: – router = CostAwareRouter([ RoutingTier(model=”claude-haiku-4-5-20251001”, max_score=40), RoutingTier(model=”claude-opus-4-6”, max_score=100), ])

score(text)[source]

Compute a complexity score in [0, 100] for text.

A higher score means a more complex task.

Return type:: int

Algorithm

token_pts = min(len(text)//4, SAT) * (MAX_TP / SAT) kw_pts = min(matches * PPK, MAX_KP) score = clamp(token_pts + kw_pts, 0, 100)

route(text)[source]

Return the model identifier for text.

Walks tiers (cheapest first) and returns the first model whose max_score ≥ complexity_score. Always returns a model because the last tier has max_score == 100 (validated at construction).

Complexity: O(k) where k = number of tiers.

Return type:: str

property tiers: tuple[RoutingTier, ...]: Immutable view of the configured tiers.

class ractogateway.openai_developer_kit.EmbeddingConfig(**data)[source]

Bases: BaseModel

Validated input for embed / aembed calls.

Example:

config = EmbeddingConfig(texts=["Hello world", "Goodbye world"])
response = kit.embed(config)

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

texts: list[str]

model: str | None

dimensions: int | None

extra: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.EmbeddingResponse(**data)[source]

Bases: BaseModel

Unified response from an embedding call.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

vectors: list[EmbeddingVector]

model: str

usage: dict[str, int]

raw: Any

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.EmbeddingVector(**data)[source]

Bases: BaseModel

A single embedding result.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

index: int

text: str

embedding: list[float]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.ExactMatchCache(max_size=1024, ttl_seconds=None)[source]

Bases: object

Ultra-low-latency key-value cache for identical LLM requests.

Parameters:

max_size (int) – LRU capacity. 0 = unlimited (no eviction).
ttl_seconds (float | None) – Entries older than ttl_seconds are treated as misses and transparently evicted. None disables expiry.
Example:: –
from ractogateway.cache import ExactMatchCache

cache = ExactMatchCache(max_size=512, ttl_seconds=3600)

# Wire into a kit: kit = OpenAIDeveloperKit(model=”gpt-4o”, exact_cache=cache)

get(user_message, system_prompt, model, temperature, max_tokens)[source]

Return a cached response or None on a miss.

O(1) — dictionary lookup + optional move-to-end.

Return type:: LLMResponse | None

put(user_message, system_prompt, model, temperature, max_tokens, response)[source]

Store a response. Evicts LRU entry when at capacity.

O(1) amortised — dictionary insert + optional popitem(last=False).

Return type:: None

invalidate(user_message, system_prompt, model, temperature, max_tokens)[source]

Remove a specific entry. Returns True if it was present.

Return type:: bool

clear()[source]

Evict all cached entries and reset counters.

Return type:: None

property stats: CacheStats: Return a snapshot of hit/miss/size counters.

class ractogateway.openai_developer_kit.FinishReason(*values)[source]

Bases: str, Enum

Why the model stopped generating.

STOP = 'stop'

TOOL_CALL = 'tool_call'

LENGTH = 'length'

CONTENT_FILTER = 'content_filter'

ERROR = 'error'

class ractogateway.openai_developer_kit.LLMResponse(**data)[source]

Bases: BaseModel

Unified, provider-agnostic response envelope.

Every adapter’s run() method returns one of these, regardless of whether the underlying provider is OpenAI, Gemini, or Anthropic.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

content: str | None

thinking: str | None

parsed: dict[str, Any] | list[Any] | None

tool_calls: list[ToolCallResult]

finish_reason: FinishReason

usage: dict[str, int]

raw: Any

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.Message(**data)[source]

Bases: BaseModel

A single conversation turn.

Used inside ChatConfig.history to provide prior conversation context to the model for multi-turn conversations.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

role: MessageRole

content: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.MessageRole(*values)[source]

Bases: str, Enum

Role of a single message in a conversation.

SYSTEM = 'system'

USER = 'user'

ASSISTANT = 'assistant'

class ractogateway.openai_developer_kit.OpenAIBatchProcessor(model='gpt-4o-mini', *, api_key=None, base_url=None, default_prompt=None)[source]

Bases: object

Submit thousands of chat-completion requests to OpenAI’s Batch API at ~50 % of standard API cost.

Parameters:

model (str) – Chat model to use for all items in a batch (e.g. "gpt-4o-mini").
api_key (str | None) – OpenAI API key. Falls back to OPENAI_API_KEY env var.
base_url (str | None) – Custom base URL (Azure OpenAI / proxy).
default_prompt (RactoPrompt | None) – RACTO prompt used as the system message for every batch item.

submit_batch / asubmit_batch:: Upload JSONL and create batch job → returns BatchJobInfo.

poll_status / apoll_status:: Fetch current job state → returns updated BatchJobInfo.

get_results / aget_results:: Download and parse completed job results → list[BatchResult].

submit_and_wait / asubmit_and_wait:: Convenience: submit + poll until done + return results.

provider: str = 'openai'

submit_batch(items, *, prompt=None, completion_window='24h')[source]

Upload items as a JSONL file and create an OpenAI batch job.

Returns immediately with a BatchJobInfo (status = IN_PROGRESS).

Return type:: BatchJobInfo

poll_status(job_id)[source]

Fetch the current status of batch job job_id.

Return type:: BatchJobInfo

get_results(job_id)[source]

Download and parse results for a completed batch job.

Raises:: RuntimeError – If the job is not yet completed.
Return type:: list[BatchResult]

submit_and_wait(items, *, prompt=None, completion_window='24h', poll_interval_s=60.0, max_wait_s=86400.0)[source]

Submit a batch and block until it completes, then return results.

Parameters:

poll_interval_s (float) – Seconds between status-poll API calls. Default 60.0.
max_wait_s (float) – Maximum total seconds to wait. Default 86400 (24 h).

Raises:

TimeoutError – If the batch does not complete within max_wait_s.
RuntimeError – If the batch job fails or is cancelled.

Return type:

list[BatchResult]

async asubmit_batch(items, *, prompt=None, completion_window='24h')[source]

Async variant of submit_batch().

Return type:: BatchJobInfo

async apoll_status(job_id)[source]

Async variant of poll_status().

Return type:: BatchJobInfo

async aget_results(job_id)[source]

Async variant of get_results().

Return type:: list[BatchResult]

async asubmit_and_wait(items, *, prompt=None, completion_window='24h', poll_interval_s=60.0, max_wait_s=86400.0)[source]

Async variant of submit_and_wait().

Return type:: list[BatchResult]

class ractogateway.openai_developer_kit.OpenAIDeveloperKit(model='gpt-4o', *, api_key=None, base_url=None, embedding_model='text-embedding-3-small', default_prompt=None, exact_cache=None, semantic_cache=None, router=None, truncator=None, tracer=None, metrics=None)[source]

Bases: object

Complete OpenAI developer kit — chat, stream, embeddings, and optional performance/cost optimisation middleware.

Parameters:

model (str) – Chat model (e.g. "gpt-4o", "gpt-4o-mini"). Use "auto" when a CostAwareRouter is provided — the router will select the model per-request.
api_key (str | None) – OpenAI API key. Falls back to OPENAI_API_KEY env var.
base_url (str | None) – Custom base URL (Azure OpenAI or proxy).
embedding_model (str) – Default embedding model. Defaults to "text-embedding-3-small".
default_prompt (RactoPrompt | None) – RACTO prompt used when ChatConfig.prompt is None.
exact_cache (ExactMatchCache | None) – Optional ExactMatchCache. Serves byte-identical requests from memory at zero cost.
semantic_cache (SemanticCache | None) – Optional SemanticCache. Returns cached answers for semantically similar queries (similarity ≥ threshold).
router (CostAwareRouter | None) – Optional CostAwareRouter. Selects the cheapest model that can handle each request’s complexity. Required when model="auto".
truncator (TokenTruncator | None) – Optional TokenTruncator. Automatically trims conversation history to fit the model’s context window before each API call.
tracer (RactoTracer | None) – Optional RactoTracer. Emits OpenTelemetry spans for every chat, stream, and embed call. Requires pip install ractogateway[telemetry].
metrics (GatewayMetricsMiddleware | None) – Optional GatewayMetricsMiddleware. Records Prometheus metrics (latency, tokens, cost, cache hit/miss). Requires pip install ractogateway[prometheus].

provider: str = 'openai'

chat(config)[source]

Synchronous chat completion with optional middleware pipeline.

Middleware order: truncate → exact cache → semantic cache → route model → API call → write caches → record telemetry.

Return type:: LLMResponse

async achat(config)[source]

Async chat completion with optional middleware pipeline.

Return type:: LLMResponse

stream(config)[source]

Synchronous streaming — yields StreamChunk objects.

Example:

for chunk in kit.stream(config):
    print(chunk.delta.text, end="", flush=True)
    if chunk.is_final:
        print(f"\nTokens: {chunk.usage}")

Return type:: Iterator[StreamChunk]

async astream(config)[source]

Async streaming — yields StreamChunk objects.

Return type:: AsyncIterator[StreamChunk]

embed(config)[source]

Synchronous embedding.

Return type:: EmbeddingResponse

async aembed(config)[source]

Async embedding.

Return type:: EmbeddingResponse

class ractogateway.openai_developer_kit.RoutingTier(**data)[source]

Bases: BaseModel

One tier in the cost-aware routing ladder.

The router evaluates a complexity score (0-100) for each incoming message and selects the first tier whose max_score is >= that score. The last tier in the list always acts as the catch-all fallback.

Parameters:

model (str) – The LLM model identifier to use for requests that fall in this tier (e.g. "gpt-4o-mini", "gemini-2.0-flash", "claude-haiku-4-5-20251001").
max_score (float) – Inclusive upper bound on the complexity score that routes to this model. Range: 0-100. Set to 100 for the last (most powerful) tier so it catches everything.

Examples

tiers = [
    RoutingTier(model="gpt-4o-mini",  max_score=30),
    RoutingTier(model="gpt-4o",        max_score=70),
    RoutingTier(model="o3-mini",        max_score=100),
]

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model: str

max_score: float

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.SemanticCache(embed_fn, similarity_threshold=0.95, max_size=512, ttl_seconds=None)[source]

Bases: object

Vector-similarity cache — returns cached answers for semantically similar queries, costing $0 in API calls.

Parameters:

embed_fn (Callable[[str], list[float]]) – Any callable (text: str) -> list[float]. Called once per new query (cache miss) and once at put() time.
similarity_threshold (float) – Minimum cosine similarity to declare a hit. Default 0.95 is intentionally strict to avoid incorrect responses.
max_size (int) – Maximum number of entries (LRU eviction). 0 = unlimited.
ttl_seconds (float | None) – Optional per-entry TTL. None disables expiry.

Examples

import ractogateway.openai_developer_kit as gpt

kit = gpt.OpenAIDeveloperKit(model="gpt-4o")

def embed(text: str) -> list[float]:
    import openai
    r = openai.OpenAI().embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return r.data[0].embedding

cache = SemanticCache(embed_fn=embed, similarity_threshold=0.95)

get(query)[source]

Embed query and return a cached response if cosine-sim ≥ threshold.

Returns None on a cache miss (caller should make the real API call and then invoke put()).

Complexity: O(n·d) where n = number of entries, d = embedding dim.

Return type:: LLMResponse | None

put(query, response)[source]

Embed query and store response for future similar queries.

Evicts LRU entry when at capacity.

Return type:: None

clear()[source]

Remove all entries and reset counters.

Return type:: None

property stats: CacheStats: Return a snapshot of hit/miss/size counters.

class ractogateway.openai_developer_kit.StreamChunk(**data)[source]

Bases: BaseModel

A single piece of a streaming response.

Consumers iterate over StreamChunk objects — they never touch raw provider events directly.

delta: The incremental content for this chunk.

accumulated_text: Running concatenation of all delta.text values so far.

finish_reason: None for intermediate chunks; set on the final chunk.

tool_calls: Empty until the final chunk (is_final=True).

usage: Token counts — populated on the final chunk only.

is_final: True only for the very last chunk in the stream.

raw: The underlying provider event (escape-hatch for advanced users).

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

delta: StreamDelta

accumulated_text: str

accumulated_thinking: str

is_thinking: bool

finish_reason: FinishReason | None

tool_calls: list[ToolCallResult]

usage: dict[str, int]

is_final: bool

parsed: dict[str, Any] | list[Any] | None

raw: Any

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.StreamDelta(**data)[source]

Bases: BaseModel

Incremental content produced by a single streaming event.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

text: str

thinking: str

tool_call_id: str | None

tool_call_name: str | None

tool_call_args_fragment: str | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.ToolCallResult(**data)[source]

Bases: BaseModel

A single tool/function call returned by the model.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

id: str

name: str

arguments: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ractogateway.openai_developer_kit.TokenTruncator(config=None)[source]

Bases: object

Smart conversation-history trimmer.

Parameters:: config (TruncationConfig | None) – TruncationConfig instance. If omitted a default config is used (approximate counter, 8 k limit).

Examples

from ractogateway.truncation import TokenTruncator, TruncationConfig
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
truncator = TokenTruncator(
    TruncationConfig(
        token_counter=lambda t: len(enc.encode(t)),
        keep_first_n=2,
        keep_last_n=8,
    )
)
kit = OpenAIDeveloperKit(model="gpt-4o", truncator=truncator)

truncate(chat_config, model)[source]

Return a copy of chat_config with trimmed history if necessary.

If the total estimated token count (system prompt + history + user_message) fits within the model’s context limit, the original ChatConfig is returned unchanged.

Parameters:

chat_config (ChatConfig) – The chat configuration to potentially truncate.
model (str) – The resolved model name used to look up the context-window limit.

Return type:

ChatConfig

Returns:

ChatConfig – A new ChatConfig instance with (possibly shorter) history. The user_message and all other fields are preserved verbatim.

estimate_tokens(text)[source]

Convenience wrapper around the configured token counter.

Return type:: int

class ractogateway.openai_developer_kit.TruncationConfig(**data)[source]

Bases: BaseModel

Configuration for TokenTruncator.

Parameters:

max_context_tokens (int | None) – Hard cap on total prompt tokens before calling the API. When None, the truncator looks up the model in MODEL_CONTEXT_LIMITS (falling back to 8 192).
keep_first_n (int) – Number of history messages to always preserve from the start of the conversation (anchors context). Defaults to 2.
keep_last_n (int) – Number of history messages to always preserve from the most recent end of the conversation. Defaults to 6.
token_counter (Callable[[str], int]) –
Callable (text: str) -> int. Defaults to the built-in approximate counter (len // 4). Swap for tiktoken for exact OpenAI token counts:
```
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
config = TruncationConfig(token_counter=lambda t: len(enc.encode(t)))
```
safety_margin (int) – Extra token budget reserved beyond the system prompt and user message. Defaults to 512.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

max_context_tokens: int | None

keep_first_n: int

keep_last_n: int

token_counter: Callable[[str], int]

safety_margin: int

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

resolve_limit(model)[source]

Return the effective token limit for model.

Priority: max_context_tokens → MODEL_CONTEXT_LIMITS lookup → _DEFAULT_CONTEXT.

Return type:: int

model_post_init(_TruncationConfig__context)[source]

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

Return type:: None