Native Thinking API Reference

ChatConfig fields

class ractogateway._models.chat.ChatConfig(**data)[source]

Bases: BaseModel

Validated input for every chat / achat / stream / astream call.

Pass a single ChatConfig to any developer-kit method. Every field has a safe default so you only need to supply what you actually need.

Minimal example:

config = ChatConfig(user_message="Explain Python generators.")
response = kit.chat(config)

Vision / multimodal example:

from ractogateway.prompts.engine import RactoFile

config = ChatConfig(
    user_message="Describe this chart.",
    attachments=[RactoFile.from_path("sales_q4.png")],
)

Structured JSON output example:

from pydantic import BaseModel

class Sentiment(BaseModel):
    label: str
    score: float

config = ChatConfig(
    user_message="I love this library!",
    response_model=Sentiment,
)

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

user_message: str
prompt: RactoPrompt | None
temperature: float
max_tokens: int
tools: ToolRegistry | None
auto_execute_tools: bool
max_tool_turns: int
response_model: type[BaseModel] | None
max_validation_retries: int
history: list[Message]
attachments: list[RactoFile] | None
chain_of_thought: bool
native_thinking: bool
thinking_budget: int
extra: dict[str, Any]
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].


StreamDelta

class ractogateway._models.stream.StreamDelta(**data)[source]

Bases: BaseModel

Incremental content produced by a single streaming event.

text: str
thinking: str
tool_call_id: str | None
tool_call_name: str | None
tool_call_args_fragment: str | None
model_config: ClassVar[ConfigDict] = {}

Field     Type  Description
--------  ----  ---------------------------------------------------------
text      str   Answer token delta (behaviour unchanged by native thinking)
thinking  str   Reasoning token delta (non-empty only on thinking chunks)


StreamChunk

class ractogateway._models.stream.StreamChunk(**data)[source]

Bases: BaseModel

A single piece of a streaming response.

Consumers iterate over StreamChunk objects — they never touch raw provider events directly.

delta

The incremental content for this chunk.

accumulated_text

Running concatenation of all delta.text values so far.

finish_reason

None for intermediate chunks; set on the final chunk.

tool_calls

Empty until the final chunk (is_final=True).

usage

Token counts — populated on the final chunk only.

is_final

True only for the very last chunk in the stream.

raw

The underlying provider event (escape-hatch for advanced users).

delta: StreamDelta
accumulated_text: str
accumulated_thinking: str
is_thinking: bool
finish_reason: FinishReason | None
tool_calls: list[ToolCallResult]
usage: dict[str, int]
is_final: bool
parsed: dict[str, Any] | list[Any] | None
raw: Any
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Field                 Type            Description
--------------------  --------------  ------------------------------------------------
delta                 StreamDelta     Incremental content (.text or .thinking)
accumulated_text      str             All answer text received so far
accumulated_thinking  str             All reasoning text received so far
is_thinking           bool            True when this chunk carries only reasoning text
is_final              bool            True on the last event in the stream
usage                 dict[str, int]  Token counts on the final chunk
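
A consumer typically routes reasoning and answer deltas to different sinks. The sketch below is illustrative only: the two dataclasses are simplified stand-ins that mirror the fields documented above (they are not the real ractogateway models), and fake_stream, consume and the token count are hypothetical. With the real library the chunks would come from a stream / astream call instead.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class StreamDelta:  # stand-in mirroring the documented fields
    text: str = ""
    thinking: str = ""


@dataclass
class StreamChunk:  # stand-in mirroring the documented fields
    delta: StreamDelta
    accumulated_text: str = ""
    accumulated_thinking: str = ""
    is_thinking: bool = False
    is_final: bool = False
    usage: dict = field(default_factory=dict)


def fake_stream() -> Iterator[StreamChunk]:
    """Hypothetical stream: two reasoning chunks, then two answer chunks."""
    events = [
        ("thinking", "Recall the definition. "),
        ("thinking", "Pick an example."),
        ("text", "A generator "),
        ("text", "yields values lazily."),
    ]
    text, thinking = "", ""
    for i, (kind, piece) in enumerate(events):
        if kind == "thinking":
            thinking += piece
            delta = StreamDelta(thinking=piece)
        else:
            text += piece
            delta = StreamDelta(text=piece)
        last = i == len(events) - 1
        yield StreamChunk(
            delta=delta,
            accumulated_text=text,
            accumulated_thinking=thinking,
            is_thinking=(kind == "thinking"),
            is_final=last,
            usage={"total_tokens": 42} if last else {},  # made-up count
        )


def consume(stream):
    """Split a stream into reasoning text, answer text and the final chunk."""
    reasoning_parts, answer_parts, final = [], [], None
    for chunk in stream:
        if chunk.is_thinking:
            reasoning_parts.append(chunk.delta.thinking)
        elif chunk.delta.text:
            answer_parts.append(chunk.delta.text)
        if chunk.is_final:
            final = chunk  # usage is only populated here
    return "".join(reasoning_parts), "".join(answer_parts), final


reasoning, answer, final_chunk = consume(fake_stream())
```

Collecting the deltas by hand is shown only to make the routing explicit; on the final chunk, accumulated_thinking and accumulated_text already hold the same two strings.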


LLMResponse

class ractogateway.adapters.base.LLMResponse(**data)[source]

Bases: BaseModel

Unified, provider-agnostic response envelope.

Every adapter’s run() method returns one of these, regardless of whether the underlying provider is OpenAI, Gemini, or Anthropic.

content: str | None
thinking: str | None
parsed: dict[str, Any] | list[Any] | None
tool_calls: list[ToolCallResult]
finish_reason: FinishReason
usage: dict[str, int]
raw: Any
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Field     Type            Description
--------  --------------  ------------------------------------------------------
content   str | None      Final answer text
thinking  str | None      Complete reasoning text (Anthropic / Google)
usage     dict[str, int]  Token counts; OpenAI o-series adds reasoning_tokens key
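
Because thinking may be None and reasoning_tokens may be absent from usage, calling code should probably treat both as optional. The following is a minimal sketch under that assumption; the LLMResponse dataclass is a simplified stand-in for the documented model, and summarize and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LLMResponse:  # stand-in with only the fields from the table above
    content: Optional[str] = None
    thinking: Optional[str] = None
    usage: dict = field(default_factory=dict)


def summarize(response: LLMResponse) -> list:
    """Describe whatever reasoning information the response carries."""
    lines = []
    if response.thinking:  # reasoning text, when the provider exposes it
        lines.append(f"reasoning: {len(response.thinking)} chars")
    reasoning_tokens = response.usage.get("reasoning_tokens")
    if reasoning_tokens is not None:  # count only, no text
        lines.append(f"reasoning tokens: {reasoning_tokens}")
    lines.append(f"answer: {response.content or ''}")
    return lines


# Sample values are made up for illustration.
with_text = LLMResponse(content="42", thinking="6 * 7", usage={"total_tokens": 30})
count_only = LLMResponse(
    content="42", usage={"total_tokens": 30, "reasoning_tokens": 12}
)
```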


Provider behaviour summary

Anthropic

  • Supported models: claude-3-7-sonnet-20250219 and later.

  • API param injected: thinking={"type": "enabled", "budget_tokens": N}.

  • Temperature is automatically forced to 1 (API requirement; user value is ignored).

  • Stream events: thinking_delta events carry delta.thinking; text_delta events carry delta.text. Both can interleave in the same stream.

  • Non-streaming: LLMResponse.thinking contains the complete reasoning block.
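
The parameter injection and temperature override described above can be pictured as a small pure function. This is a sketch only: anthropic_thinking_kwargs is a hypothetical helper, not the adapter's actual code, and it covers just the two request parameters named in the list.

```python
def anthropic_thinking_kwargs(native_thinking, thinking_budget, temperature):
    """Return the request kwargs affected by native thinking (hypothetical helper)."""
    kwargs = {"temperature": temperature}
    if native_thinking:
        # Parameter shape as documented above.
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
        # The user-supplied temperature is overridden while thinking is enabled.
        kwargs["temperature"] = 1
    return kwargs


enabled = anthropic_thinking_kwargs(True, 2048, 0.2)
disabled = anthropic_thinking_kwargs(False, 2048, 0.2)
```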

Google

  • Supported models: gemini-2.5-pro, gemini-2.0-flash-thinking-exp.

  • API param injected: ThinkingConfig(thinking_budget=N) in GenerateContentConfig.

  • Thought parts are identified by part.thought == True in the response candidates.

  • Streaming: thought parts arrive as is_thinking=True chunks before answer parts.

  • Non-streaming: LLMResponse.thinking contains all joined thought parts.
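
Separating thought parts from answer parts comes down to checking the thought flag on each part. The sketch below assumes a simplified Part stand-in with only text and thought attributes and joins parts with an empty separator; the adapter's real join logic is not shown in this reference, so split_parts is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:  # stand-in for a response candidate part
    text: str
    thought: Optional[bool] = None  # True marks a reasoning part


def split_parts(parts):
    """Return (thinking, answer) by partitioning parts on the thought flag."""
    thinking = "".join(p.text for p in parts if p.thought)
    answer = "".join(p.text for p in parts if not p.thought)
    return thinking, answer


# Made-up parts for illustration.
parts = [
    Part("Compare both options. ", thought=True),
    Part("Option B is cheaper.", thought=True),
    Part("Choose option B."),
]
thinking, answer = split_parts(parts)
```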

OpenAI

  • Supported models: o1, o3, o3-mini.

  • native_thinking / thinking_budget are silently ignored (OpenAI does not expose reasoning text in the Chat Completions API).

  • Reasoning token count is exposed automatically in response.usage["reasoning_tokens"] whenever the model returns completion_tokens_details.reasoning_tokens.
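
Lifting the nested reasoning count into a flat usage dict might look like the sketch below. It assumes the provider usage payload has already been converted to a plain dict; normalise_usage is a hypothetical helper, not the adapter's actual code, and the token counts are made up.

```python
def normalise_usage(raw_usage: dict) -> dict:
    """Flatten a Chat Completions style usage payload into dict[str, int]."""
    usage = {
        key: raw_usage[key]
        for key in ("prompt_tokens", "completion_tokens", "total_tokens")
        if key in raw_usage
    }
    # The details object is only present on some models; tolerate None or absence.
    details = raw_usage.get("completion_tokens_details") or {}
    if details.get("reasoning_tokens") is not None:
        usage["reasoning_tokens"] = details["reasoning_tokens"]
    return usage


o_series = normalise_usage({
    "prompt_tokens": 20,
    "completion_tokens": 150,
    "total_tokens": 170,
    "completion_tokens_details": {"reasoning_tokens": 128},
})
plain = normalise_usage(
    {"prompt_tokens": 20, "completion_tokens": 30, "total_tokens": 50}
)
```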