# Native Thinking API Reference

## ChatConfig fields

```{eval-rst}
.. autoclass:: ractogateway._models.chat.ChatConfig
   :members: native_thinking, thinking_budget
   :noindex:
```

---

## StreamDelta

```{eval-rst}
.. autoclass:: ractogateway._models.stream.StreamDelta
   :members:
   :noindex:
```

| Field | Type | Description |
| --- | --- | --- |
| `text` | `str` | Answer token delta (same as always) |
| `thinking` | `str` | Reasoning token delta (non-empty only on thinking chunks) |

---

## StreamChunk

```{eval-rst}
.. autoclass:: ractogateway._models.stream.StreamChunk
   :members:
   :noindex:
```

| Field | Type | Description |
| --- | --- | --- |
| `delta` | `StreamDelta` | Incremental content (`.text` or `.thinking`) |
| `accumulated_text` | `str` | All answer text received so far |
| `accumulated_thinking` | `str` | All reasoning text received so far |
| `is_thinking` | `bool` | `True` when this chunk carries only reasoning text |
| `is_final` | `bool` | `True` on the last event in the stream |
| `usage` | `dict[str, int]` | Token counts on the final chunk |

---

## LLMResponse

```{eval-rst}
.. autoclass:: ractogateway.adapters.base.LLMResponse
   :members: thinking
   :noindex:
```

| Field | Type | Description |
| --- | --- | --- |
| `content` | `str \| None` | Final answer text |
| `thinking` | `str \| None` | Complete reasoning text (Anthropic / Google) |
| `usage` | `dict[str, int]` | Token counts; OpenAI o-series adds `reasoning_tokens` key |

---

## Provider behaviour summary

### Anthropic

- Supported models: `claude-3-7-sonnet-20250219` and later.
- API param injected: `thinking={"type": "enabled", "budget_tokens": N}`.
- Temperature is automatically forced to `1` (API requirement; user value is ignored).
- Stream events: `thinking_delta` events carry `delta.thinking`; `text_delta` events carry `delta.text`. Both can interleave in the same stream.
- Non-streaming: `LLMResponse.thinking` contains the complete reasoning block.

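
Because thinking and answer deltas can interleave, a consumer typically routes each chunk on `is_thinking`. The following is a minimal sketch of that routing, not the library's implementation: the `StreamDelta` and `StreamChunk` classes below are local stand-ins that only mirror the fields documented in the tables above, `split_stream` is a hypothetical helper, and the `demo` list is hand-built rather than captured from a provider.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class StreamDelta:
    """Stand-in mirroring the documented fields; not the real ractogateway class."""
    text: str = ""
    thinking: str = ""


@dataclass
class StreamChunk:
    """Stand-in mirroring the documented fields; not the real ractogateway class."""
    delta: StreamDelta
    accumulated_text: str = ""
    accumulated_thinking: str = ""
    is_thinking: bool = False
    is_final: bool = False
    usage: dict = field(default_factory=dict)


def split_stream(chunks: Iterable[StreamChunk]) -> tuple[str, str]:
    """Hypothetical helper: return (thinking, answer) by routing on is_thinking."""
    thinking_parts: list[str] = []
    answer_parts: list[str] = []
    for chunk in chunks:
        if chunk.is_thinking:
            # Reasoning-only chunk: the payload lives in delta.thinking.
            thinking_parts.append(chunk.delta.thinking)
        else:
            # Answer chunk: the payload lives in delta.text.
            answer_parts.append(chunk.delta.text)
    return "".join(thinking_parts), "".join(answer_parts)


# Hand-built interleaved stream, standing in for a provider response.
demo = [
    StreamChunk(StreamDelta(thinking="Check units. "), is_thinking=True),
    StreamChunk(StreamDelta(text="The answer ")),
    StreamChunk(StreamDelta(thinking="Confirm sign."), is_thinking=True),
    StreamChunk(StreamDelta(text="is 42."), is_final=True),
]
thinking, answer = split_stream(demo)
```

In real use, the `accumulated_thinking` and `accumulated_text` fields on the final chunk should make manual joining unnecessary; the sketch joins deltas by hand only to show how the two channels separate.
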
### Google

- Supported models: `gemini-2.5-pro`, `gemini-2.0-flash-thinking-exp`.
- API param injected: `ThinkingConfig(thinking_budget=N)` in `GenerateContentConfig`.
- Thought parts are identified by `part.thought == True` in the response candidates.
- Streaming: thought parts arrive as `is_thinking=True` chunks before answer parts.
- Non-streaming: `LLMResponse.thinking` contains all joined thought parts.

### OpenAI

- Supported models: `o1`, `o3`, `o3-mini`.
- `native_thinking` / `thinking_budget` are silently ignored (OpenAI does not expose reasoning text in the Chat Completions API).
- Reasoning token count is exposed automatically in `response.usage["reasoning_tokens"]` whenever the model returns `completion_tokens_details.reasoning_tokens`.
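
The conditional `reasoning_tokens` key described in the last bullet could be produced along the following lines. This is a sketch under assumptions, not the adapter's actual code: `flatten_usage` is a hypothetical helper, and the input is assumed to be the usage payload of a Chat Completions response already converted to a plain dict.

```python
from typing import Any


def flatten_usage(raw_usage: dict[str, Any]) -> dict[str, int]:
    """Hypothetical helper: copy token counts and surface reasoning_tokens if present."""
    usage = {
        key: raw_usage[key]
        for key in ("prompt_tokens", "completion_tokens", "total_tokens")
        if key in raw_usage
    }
    # completion_tokens_details may be missing or None on non-reasoning models.
    details = raw_usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens")
    if reasoning is not None:
        usage["reasoning_tokens"] = reasoning
    return usage


# Illustrative payloads with made-up counts.
with_reasoning = flatten_usage(
    {
        "prompt_tokens": 12,
        "completion_tokens": 340,
        "total_tokens": 352,
        "completion_tokens_details": {"reasoning_tokens": 256},
    }
)
without_reasoning = flatten_usage(
    {"prompt_tokens": 12, "completion_tokens": 40, "total_tokens": 52}
)
```

Since the key is only present when the model reports it, callers are safer reading it with `response.usage.get("reasoning_tokens", 0)` than with direct indexing.
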