Native Thinking API Reference

ChatConfig fields

class ractogateway._models.chat.ChatConfig(**data)

Bases: BaseModel

Validated input for every chat / achat / stream / astream call.

Pass a single ChatConfig to any developer-kit method. Every field has a safe default so you only need to supply what you actually need.

Minimal example:

config = ChatConfig(user_message="Explain Python generators.")
response = kit.chat(config)

Vision / multimodal example:

from ractogateway.prompts.engine import RactoFile

config = ChatConfig(
    user_message="Describe this chart.",
    attachments=[RactoFile.from_path("sales_q4.png")],
)

Structured JSON output example:

from pydantic import BaseModel

class Sentiment(BaseModel):
    label: str
    score: float

config = ChatConfig(
    user_message="I love this library!",
    response_model=Sentiment,
)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

user_message: str

prompt: RactoPrompt | None

temperature: float

max_tokens: int

tools: ToolRegistry | None

auto_execute_tools: bool

max_tool_turns: int

response_model: type[BaseModel] | None

max_validation_retries: int

history: list[Message]

attachments: list[RactoFile] | None

chain_of_thought: bool

native_thinking: bool

thinking_budget: int

extra: dict[str, Any]

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

StreamDelta

class ractogateway._models.stream.StreamDelta(**data)

Bases: BaseModel

Incremental content produced by a single streaming event.

text: str

thinking: str

tool_call_id: str | None

tool_call_name: str | None

tool_call_args_fragment: str | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

| Field | Type | Description |
| --- | --- | --- |
| `text` | `str` | Answer token delta (behaviour unchanged by native thinking) |
| `thinking` | `str` | Reasoning token delta (non-empty only on thinking chunks) |
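
As a rough illustration of how a consumer might tell the three kinds of delta apart, the sketch below uses a stdlib stand-in that mirrors the field names listed above. The stand-in class, its empty-string defaults, and the `classify_delta` helper are assumptions made for this example and are not part of the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamDelta:
    """Stand-in mirroring the documented field names; NOT the library class."""
    text: str = ""                 # assumed default: empty when no answer text
    thinking: str = ""             # assumed default: empty when no reasoning text
    tool_call_id: Optional[str] = None
    tool_call_name: Optional[str] = None
    tool_call_args_fragment: Optional[str] = None


def classify_delta(delta: StreamDelta) -> str:
    """Hypothetical helper: report which kind of content a delta carries."""
    if delta.thinking:
        return "thinking"
    if delta.text:
        return "text"
    if delta.tool_call_name is not None or delta.tool_call_args_fragment is not None:
        return "tool_call"
    return "empty"


if __name__ == "__main__":
    print(classify_delta(StreamDelta(thinking="Let me check...")))
    print(classify_delta(StreamDelta(text="Hello")))
```

Checking `thinking` first reflects the table's statement that the field is non-empty only on thinking chunks.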

StreamChunk

class ractogateway._models.stream.StreamChunk(**data)

Bases: BaseModel

A single piece of a streaming response.

Consumers iterate over StreamChunk objects — they never touch raw provider events directly.

delta: The incremental content for this chunk.

accumulated_text: Running concatenation of all delta.text values so far.

finish_reason: None for intermediate chunks; set on the final chunk.

tool_calls: Empty until the final chunk (is_final=True).

usage: Token counts — populated on the final chunk only.

is_final: True only for the very last chunk in the stream.

raw: The underlying provider event (escape-hatch for advanced users).

delta: StreamDelta

accumulated_text: str

accumulated_thinking: str

is_thinking: bool

finish_reason: FinishReason | None

tool_calls: list[ToolCallResult]

usage: dict[str, int]

is_final: bool

parsed: dict[str, Any] | list[Any] | None

raw: Any

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

| Field | Type | Description |
| --- | --- | --- |
| `delta` | `StreamDelta` | Incremental content (`.text` or `.thinking`) |
| `accumulated_text` | `str` | All answer text received so far |
| `accumulated_thinking` | `str` | All reasoning text received so far |
| `is_thinking` | `bool` | `True` when this chunk carries only reasoning text |
| `is_final` | `bool` | `True` on the last event in the stream |
| `usage` | `dict[str, int]` | Token counts on the final chunk |
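
The relationship between `delta`, the two accumulators, and the final chunk can be sketched with a simulated stream. Everything here is a stdlib stand-in: `Delta`, `Chunk`, `fake_stream`, and `consume` are hypothetical names, the usage numbers are made up, and emitting the final chunk as a separate event with an empty delta is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass
class Delta:
    text: str = ""
    thinking: str = ""


@dataclass
class Chunk:
    """Stand-in with a subset of the documented StreamChunk fields."""
    delta: Delta
    accumulated_text: str = ""
    accumulated_thinking: str = ""
    is_thinking: bool = False
    is_final: bool = False
    usage: Dict[str, int] = field(default_factory=dict)


def fake_stream(events: Iterable[Tuple[str, str]]) -> Iterator[Chunk]:
    """Simulate a stream; each event is ("thinking" | "text", fragment)."""
    text, thinking = "", ""
    for kind, fragment in events:
        if kind == "thinking":
            thinking += fragment
            yield Chunk(Delta(thinking=fragment), text, thinking, is_thinking=True)
        else:
            text += fragment
            yield Chunk(Delta(text=fragment), text, thinking)
    # Assumption: the final chunk is a separate event that carries usage.
    yield Chunk(Delta(), text, thinking, is_final=True, usage={"total_tokens": 42})


def consume(stream: Iterable[Chunk]) -> Tuple[str, str, Optional[Chunk]]:
    """Collect reasoning and answer fragments, and keep the final chunk."""
    reasoning, answer, final = [], [], None
    for chunk in stream:
        if chunk.is_thinking:
            reasoning.append(chunk.delta.thinking)
        elif chunk.delta.text:
            answer.append(chunk.delta.text)
        if chunk.is_final:
            final = chunk
    return "".join(reasoning), "".join(answer), final


if __name__ == "__main__":
    events = [("thinking", "2+2 "), ("thinking", "is 4."), ("text", "The answer "), ("text", "is 4.")]
    reasoning, answer, final = consume(fake_stream(events))
    print(reasoning, "|", answer, "|", final.usage)
```

In this sketch the manually joined fragments match `accumulated_thinking` and `accumulated_text` on the final chunk, so a consumer that only needs the totals could read the accumulators instead of collecting deltas itself.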

LLMResponse

class ractogateway.adapters.base.LLMResponse(**data)

Bases: BaseModel

Unified, provider-agnostic response envelope.

Every adapter’s run() method returns one of these, regardless of whether the underlying provider is OpenAI, Gemini, or Anthropic.

content: str | None

thinking: str | None

parsed: dict[str, Any] | list[Any] | None

tool_calls: list[ToolCallResult]

finish_reason: FinishReason

usage: dict[str, int]

raw: Any

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

| Field | Type | Description |
| --- | --- | --- |
| `content` | `str \| None` | Final answer text |
| `thinking` | `str \| None` | Complete reasoning text (Anthropic / Google) |
| `usage` | `dict[str, int]` | Token counts; OpenAI o-series adds `reasoning_tokens` key |
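
Because `thinking` may be `None` and `reasoning_tokens` is only present for some providers, calling code probably wants to read both defensively. The following is a minimal sketch using a stdlib stand-in; `Response` and `summarize` are hypothetical names for this example, not library API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Response:
    """Stand-in with three of the documented LLMResponse fields."""
    content: Optional[str] = None
    thinking: Optional[str] = None
    usage: Dict[str, int] = field(default_factory=dict)


def summarize(response: Response) -> str:
    """Hypothetical helper: build a short report without assuming any optional field."""
    lines = []
    if response.thinking:
        # Reasoning text is only expected from providers that expose it.
        lines.append(f"[reasoning: {len(response.thinking)} chars]")
    reasoning_tokens = response.usage.get("reasoning_tokens")
    if reasoning_tokens is not None:
        # The key may be absent, so use .get() rather than indexing.
        lines.append(f"[reasoning tokens: {reasoning_tokens}]")
    lines.append(response.content or "")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize(Response(content="4", thinking="2+2")))
    print(summarize(Response(content="4", usage={"reasoning_tokens": 128})))
```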

Provider behaviour summary

Anthropic

Supported models: claude-3-7-sonnet-20250219 and later.
API param injected: thinking={"type": "enabled", "budget_tokens": N}.
Temperature is automatically forced to 1 (API requirement; user value is ignored).
Stream events: thinking_delta events carry delta.thinking; text_delta events carry delta.text. Both can interleave in the same stream.
Non-streaming: LLMResponse.thinking contains the complete reasoning block.
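
The parameter injection and temperature override described above could look roughly like this. `build_anthropic_kwargs` is a hypothetical helper written for illustration; it is not the adapter's actual code, and only the `thinking` payload shape and the forced temperature come from the notes above.

```python
from typing import Any, Dict


def build_anthropic_kwargs(
    native_thinking: bool, thinking_budget: int, temperature: float
) -> Dict[str, Any]:
    """Hypothetical sketch of how request kwargs might be assembled."""
    kwargs: Dict[str, Any] = {"temperature": temperature}
    if native_thinking:
        # Payload shape as documented: type "enabled" plus a token budget.
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
        # The user-supplied temperature is ignored when thinking is enabled.
        kwargs["temperature"] = 1
    return kwargs


if __name__ == "__main__":
    print(build_anthropic_kwargs(True, 2048, 0.2))
    print(build_anthropic_kwargs(False, 2048, 0.2))
```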

Google

Supported models: gemini-2.5-pro, gemini-2.0-flash-thinking-exp.
API param injected: ThinkingConfig(thinking_budget=N) in GenerateContentConfig.
Thought parts are identified by part.thought == True in the response candidates.
Streaming: thought parts arrive as is_thinking=True chunks before answer parts.
Non-streaming: LLMResponse.thinking contains all joined thought parts.
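
Separating thought parts from answer parts by the `thought` flag can be sketched as follows. `Part` is a stdlib stand-in for the provider's part object and `split_parts` is a hypothetical helper; joining parts with an empty string is an assumption, since the separator used by the adapter is not stated.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Part:
    """Stand-in for a response part; only the two fields used here."""
    text: str
    thought: bool = False


def split_parts(parts: Iterable[Part]) -> Tuple[str, str]:
    """Return (thinking, answer) by routing each part on its thought flag."""
    thinking, answer = [], []
    for part in parts:
        if part.thought:
            thinking.append(part.text)
        else:
            answer.append(part.text)
    # Assumption: parts are concatenated without a separator.
    return "".join(thinking), "".join(answer)


if __name__ == "__main__":
    parts = [Part("Compare both options. ", thought=True), Part("Option A is cheaper.")]
    print(split_parts(parts))
```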

OpenAI

Supported models: o1, o3, o3-mini.
native_thinking / thinking_budget are silently ignored (OpenAI does not expose reasoning text in the Chat Completions API).
Reasoning token count is exposed automatically in response.usage["reasoning_tokens"] whenever the model returns completion_tokens_details.reasoning_tokens.
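
A minimal sketch of the usage mapping described above, operating on a plain dict shaped like an OpenAI usage payload. `extract_usage` is a hypothetical helper, and the key names of the unified dict other than `reasoning_tokens` are assumptions; the numbers in the example are illustrative only.

```python
from typing import Any, Dict


def extract_usage(raw_usage: Dict[str, Any]) -> Dict[str, int]:
    """Hypothetical helper: flatten a raw usage payload into dict[str, int]."""
    usage = {
        "prompt_tokens": raw_usage.get("prompt_tokens", 0),
        "completion_tokens": raw_usage.get("completion_tokens", 0),
        "total_tokens": raw_usage.get("total_tokens", 0),
    }
    # completion_tokens_details may be missing or None on some responses.
    details = raw_usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens")
    if reasoning is not None:
        # Only add the key when the model actually reports it.
        usage["reasoning_tokens"] = reasoning
    return usage


if __name__ == "__main__":
    raw = {
        "prompt_tokens": 10,
        "completion_tokens": 50,
        "total_tokens": 60,
        "completion_tokens_details": {"reasoning_tokens": 32},
    }
    print(extract_usage(raw))
```

Callers can then test `"reasoning_tokens" in usage` to tell whether the model reported a reasoning count at all.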