Native Thinking

Set native_thinking=True on any ChatConfig to get the model’s actual internal reasoning process — not just a prompted instruction to “think step by step”, but the real tokens the model spends working through the problem before writing its answer.

This is different from chain_of_thought=True, a prompt-engineering technique that works on all models. Native thinking is a model-level feature supported only by specific models, and the thinking text appears in dedicated fields, separate from the final answer.

Supported models

| Provider | Models | Thinking visible? |
|---|---|---|
| Anthropic | claude-3-7-sonnet-20250219 and later | Yes: full text, streamable |
| Google | gemini-2.5-pro, gemini-2.0-flash-thinking-exp | Yes: full text, streamable |
| OpenAI | o1, o3, o3-mini | No: token count only (usage["reasoning_tokens"]) |


Quick start — Anthropic (streaming)

from ractogateway import anthropic_developer_kit as claude
from ractogateway.prompts.engine import RactoPrompt

prompt = RactoPrompt(
    role="Expert mathematician",
    aim="Solve the user's problem completely.",
    constraints=["Show the final answer clearly."],
    tone="Precise",
    output_format="text",
)
kit = claude.Chat(
    model="claude-3-7-sonnet-20250219",
    default_prompt=prompt,
)

print("=== THINKING ===")
answer_started = False
for chunk in kit.stream(claude.ChatConfig(
    user_message="How many trailing zeros does 100! have?",
    native_thinking=True,
    thinking_budget=8000,
)):
    if chunk.is_thinking:
        print(chunk.delta.thinking, end="", flush=True)
    elif chunk.delta.text:
        if not answer_started:
            # Print the header once, when the first answer text arrives.
            print("\n=== ANSWER ===")
            answer_started = True
        print(chunk.delta.text, end="", flush=True)

Expected output:

=== THINKING ===
I need to find the number of trailing zeros in 100!.
Trailing zeros come from factors of 10, and 10 = 2 × 5.
Since factors of 2 appear much more often than 5, I just need to count factors of 5.

Factors of 5 in 100!:
- Numbers divisible by 5:  100 ÷ 5  = 20
- Numbers divisible by 25: 100 ÷ 25 = 4
- Numbers divisible by 125: 100 ÷ 125 = 0 (125 > 100)

Total = 20 + 4 = 24

=== ANSWER ===
100! has **24 trailing zeros**.
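
The arithmetic in the sample reasoning above can be checked independently of any model. The following is a minimal sketch in plain Python (the function name is made up for this illustration and is not part of ractogateway): it counts the factors of 5 in n! and cross-checks the result against the digits of the factorial itself.

```python
import math


def trailing_zeros_of_factorial(n: int) -> int:
    """Count the factors of 5 in n!: floor(n/5) + floor(n/25) + ..."""
    count = 0
    power = 5
    while power <= n:
        count += n // power
        power *= 5
    return count


zeros = trailing_zeros_of_factorial(100)

# Brute-force cross-check: strip the zeros off the end of 100! and compare lengths.
digits = str(math.factorial(100))
brute = len(digits) - len(digits.rstrip("0"))

print(zeros, brute)  # 24 24
```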

Quick start — Anthropic (non-streaming)

response = kit.chat(claude.ChatConfig(
    user_message="How many trailing zeros does 100! have?",
    native_thinking=True,
    thinking_budget=8000,
))

print("THINKING:", response.thinking)
# THINKING: I need to find the number of trailing zeros in 100!.
# Trailing zeros come from factors of 10, and 10 = 2 × 5. ...

print("ANSWER:", response.content)
# ANSWER: 100! has **24 trailing zeros**.

Quick start — Google Gemini (streaming)

from ractogateway import google_developer_kit as gemini

kit = gemini.Chat(
    model="gemini-2.5-pro",
    default_prompt=prompt,
)

for chunk in kit.stream(gemini.ChatConfig(
    user_message="What is the probability of getting at least one 6 in four dice rolls?",
    native_thinking=True,
    thinking_budget=4096,
)):
    if chunk.is_thinking:
        print(chunk.delta.thinking, end="", flush=True)
    elif chunk.delta.text:
        print(chunk.delta.text, end="", flush=True)

Expected output (thinking):

The complement of "at least one 6" is "no 6 in any of the four rolls".
P(no 6 on a single roll) = 5/6
P(no 6 in four rolls)    = (5/6)^4 = 625/1296
P(at least one 6)        = 1 − 625/1296 = 671/1296 ≈ 0.5177

Expected output (answer):

The probability is **671/1296 ≈ 51.8%**.
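
To confirm the figures in this sample output, the complement calculation can be reproduced with exact fractions. This is only a verification sketch using the standard library; the variable names are illustrative.

```python
from fractions import Fraction

# Probability of not rolling a 6 on one die, then on four independent dice.
p_no_six_single = Fraction(5, 6)
p_no_six_four = p_no_six_single ** 4

# "At least one 6" is the complement of "no 6 at all".
p_at_least_one = 1 - p_no_six_four

print(p_no_six_four)                     # 625/1296
print(p_at_least_one)                    # 671/1296
print(round(float(p_at_least_one), 4))   # 0.5177
```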

Quick start — OpenAI o-series (reasoning token count)

OpenAI o1/o3 models reason internally and do not expose the reasoning text. native_thinking=True is accepted but has no effect on the API request; whenever the API reports a reasoning token count, it is added to usage["reasoning_tokens"] automatically.

from ractogateway import openai_developer_kit as gpt

kit = gpt.Chat(model="o3-mini", default_prompt=prompt)

response = kit.chat(gpt.ChatConfig(
    user_message="Solve x² − 5x + 6 = 0",
    native_thinking=True,   # optional flag; no-op for OpenAI
))

print(response.content)
# x = 2  or  x = 3

print(response.usage)
# {
#   "prompt_tokens": 142,
#   "completion_tokens": 89,
#   "total_tokens": 231,
#   "reasoning_tokens": 64      ← reasoning tokens consumed
# }
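
Since only a token count is available for these models, one practical use is to estimate how much of the completion went to hidden reasoning. The sketch below is a hypothetical helper, not part of ractogateway. It assumes that reasoning_tokens is counted inside completion_tokens, which is how OpenAI's API reports usage; whether the library preserves that relationship is an assumption. It returns None when the key is absent.

```python
from typing import Optional


def reasoning_share(usage: dict) -> Optional[float]:
    """Fraction of completion tokens spent on hidden reasoning, or None if unknown."""
    reasoning = usage.get("reasoning_tokens")
    completion = usage.get("completion_tokens")
    if reasoning is None or not completion:
        return None
    # Assumption: reasoning tokens are a subset of completion tokens.
    return reasoning / completion


usage = {
    "prompt_tokens": 142,
    "completion_tokens": 89,
    "total_tokens": 231,
    "reasoning_tokens": 64,
}
share = reasoning_share(usage)
if share is not None:
    print(f"{share:.0%} of completion tokens went to reasoning")
```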

Controlling the thinking budget

config = claude.ChatConfig(
    user_message="Explain why P ≠ NP is hard to prove.",
    native_thinking=True,
    thinking_budget=20000,   # more budget → deeper reasoning
    max_tokens=28000,        # must be > thinking_budget for Anthropic
)

| Parameter | Default | Notes |
|---|---|---|
| thinking_budget | 10000 | Max tokens the model may spend reasoning |
| max_tokens | 4096 | Anthropic: must be set higher than the budget |

Anthropic note: temperature is automatically forced to 1 — you do not need to set it. Passing any other value is silently overridden.
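
The two Anthropic rules above (max_tokens greater than thinking_budget, temperature fixed at 1) are easy to check before a request is sent. The following is a hedged sketch of such a pre-flight check; check_thinking_config is a hypothetical helper written for this illustration, not a ractogateway function, and its defaults simply mirror the table above.

```python
def check_thinking_config(provider, thinking_budget=10000, max_tokens=4096,
                          temperature=None):
    """Return a list of warnings; an empty list means nothing stands out."""
    warnings = []
    if provider == "anthropic":
        if max_tokens <= thinking_budget:
            warnings.append(
                f"max_tokens ({max_tokens}) must be greater than "
                f"thinking_budget ({thinking_budget})"
            )
        if temperature is not None and temperature != 1:
            warnings.append(f"temperature={temperature} will be overridden to 1")
    return warnings


# A budget larger than max_tokens is flagged.
print(check_thinking_config("anthropic", thinking_budget=20000, max_tokens=8192))

# Leaving headroom above the budget for the answer passes.
print(check_thinking_config("anthropic", thinking_budget=8000, max_tokens=12000))
```

Running a check like this in tests or at startup surfaces a mismatched budget early, rather than at request time.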


Reading the output

Streaming

for chunk in kit.stream(config):
    # --- thinking phase ---
    chunk.is_thinking            # True while thinking tokens stream
    chunk.delta.thinking         # the new reasoning text in this event
    chunk.accumulated_thinking   # all reasoning text so far

    # --- answer phase ---
    chunk.delta.text             # the new answer text in this event
    chunk.accumulated_text       # all answer text so far

    # --- final chunk ---
    chunk.is_final               # True on the last event
    chunk.accumulated_thinking   # complete reasoning
    chunk.accumulated_text       # complete answer
    chunk.usage                  # token counts
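
The fields listed above are enough to split a stream into its reasoning and its answer. The sketch below shows one way to do that, using stand-in chunk objects so that it runs without an API key. Both fake_chunk and collect_stream are hypothetical names for this illustration; real chunks would come from kit.stream(config).

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Split a chunk stream into (thinking, answer) strings."""
    thinking_parts, answer_parts = [], []
    for chunk in chunks:
        if chunk.is_thinking:
            thinking_parts.append(chunk.delta.thinking or "")
        elif chunk.delta.text:
            answer_parts.append(chunk.delta.text)
    return "".join(thinking_parts), "".join(answer_parts)


def fake_chunk(thinking=None, text=None, is_final=False):
    """Stand-in exposing only the attributes used by collect_stream."""
    return SimpleNamespace(
        is_thinking=thinking is not None,
        is_final=is_final,
        delta=SimpleNamespace(thinking=thinking, text=text),
    )


fake_stream = [
    fake_chunk(thinking="Count factors of 5. "),
    fake_chunk(thinking="20 + 4 = 24."),
    fake_chunk(text="100! has "),
    fake_chunk(text="24 trailing zeros."),
    fake_chunk(is_final=True),  # final event carries no new text here
]

thinking, answer = collect_stream(fake_stream)
print("THINKING:", thinking)
print("ANSWER:", answer)
```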

Non-streaming

response = kit.chat(config)

response.thinking   # str | None  — complete reasoning text
response.content    # str | None  — final answer
response.usage      # dict with prompt_tokens, completion_tokens, total_tokens
                    # (+ reasoning_tokens for OpenAI o-series)

Combining with chain_of_thought

native_thinking and chain_of_thought can be used together. chain_of_thought adds a prompt-level instruction; native_thinking activates the model's built-in reasoning mode. On Anthropic and Google models, the model reasons internally (native) and also tends to spell out its steps more explicitly in the final answer (prompted).

config = claude.ChatConfig(
    user_message="...",
    native_thinking=True,
    chain_of_thought=True,
)

Practical: render thinking in a terminal

previous_was_thinking = False

for chunk in kit.stream(config):
    if chunk.is_thinking:
        if not previous_was_thinking:
            print("\033[2m[thinking]\033[0m", flush=True)   # dim label
        print(f"\033[2m{chunk.delta.thinking}\033[0m", end="", flush=True)
        previous_was_thinking = True
    elif chunk.delta.text:   # skip events that carry no answer text
        if previous_was_thinking:
            print("\n\033[0m[answer]\033[0m", flush=True)   # reset
            previous_was_thinking = False
        print(chunk.delta.text, end="", flush=True)

print()

Tips

| Goal | Setting |
|---|---|
| Deep, multi-step proofs / code | Raise thinking_budget to 32000-100000 |
| Fast simple answers | Lower thinking_budget to 1024-2048 |
| Only show the final answer | Ignore chunk.delta.thinking; read chunk.accumulated_text |
| Log full reasoning for debugging | Save chunk.accumulated_thinking on chunk.is_final |
| Async streaming | Use astream(): same fields, same behaviour |
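
For the async case, a consumption loop might look like the sketch below. It assumes that astream() returns an async iterator consumed with async for, which the text above implies but does not state. The generator fake_astream is a hypothetical stand-in so the example runs without a provider; in real code it would be replaced by kit.astream(config).

```python
import asyncio
from types import SimpleNamespace


async def fake_astream():
    """Hypothetical stand-in for kit.astream(config)."""
    events = [("Let me think. ", None), (None, "Done.")]
    for thinking, text in events:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield SimpleNamespace(
            is_thinking=thinking is not None,
            delta=SimpleNamespace(thinking=thinking, text=text),
        )


async def consume(stream):
    """Gather reasoning and answer text from an async chunk stream."""
    thinking, answer = [], []
    async for chunk in stream:
        if chunk.is_thinking:
            thinking.append(chunk.delta.thinking)
        elif chunk.delta.text:
            answer.append(chunk.delta.text)
    return "".join(thinking), "".join(answer)


result = asyncio.run(consume(fake_astream()))
print(result)
```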